keywords: Graphics, Optimization at Deep Level


Documents - Comprehensive

The Visual Technology of Gears 5 | Unreal Dev Days 2019 | Unreal Engine
Improving visuals while achieving 60fps using Unreal Engine 4

The making of Gears 5: how the Coalition hit 60fps - and improved visual quality

Gears 5 Graphics Options Performance Breakdown

Performance Tips when Writing Shaders

AMD RDNA Performance Guide

GPU Optimization for GameDev (Recommended)

Nvidia Nsight Graphics: The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload (Recommended)

Documents - Drawcalls

The engine draws the things you see on the screen in batches and such a batch is called a drawcall.

The time a drawcall takes consists of two parts:

  • a static overhead, which is always the same no matter how many things get drawn;
  • a dynamic part, which depends on how many things there are to draw.

One basic thing is that each model in idTech4 gets drawn once per light that hits that model. And each of these draws of the model is split into drawcalls for each material.

So if you have a model with two different materials that is hit by 3 lights, the entire model is drawn 3 times, and the engine uses 2 * 3 = 6 drawcalls to do so.
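The arithmetic above can be sketched in a few lines. This is only an illustration of the counting rule described here (one drawcall per material per light); the function name and the list-based inputs are assumptions made for the example, not part of idTech4.

```python
def drawcalls_per_light_renderer(materials_per_model, lights_per_model):
    """Count drawcalls when every model is drawn once per light that hits it,
    and every such draw is split into one drawcall per material."""
    if len(materials_per_model) != len(lights_per_model):
        raise ValueError("need one light count per model")
    return sum(m * l for m, l in zip(materials_per_model, lights_per_model))


# One model with 2 materials, hit by 3 lights -> 2 * 3 = 6 drawcalls.
print(drawcalls_per_light_renderer([2], [3]))
```

Adding a second model with one material and one light raises the total by only one, which shows why multi-material models under many lights dominate the count.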

Reference: Drawcalls

Drawcalls = separate meshes × materials per mesh

Draw call discussion for Unreal Engine

Documents - Shadow map (Non-dynamic lighting)

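Q: Why does enabling shadows increase the number of triangles drawn and the number of draw calls?
A: The object is first rendered from the perspective of the light to determine its distance from the light, which is stored in the shadow map (or several times if you use cascaded shadow maps). This information is then used during the normal rendering pass. The extra pass or passes are where the additional triangles and draw calls come from.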
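As a rough sketch of that answer, the extra cost can be estimated by counting the shadow pass separately. The model is deliberately simplified and its assumptions are mine, not from the linked threads: one draw call per object, and every shadow caster rendered into every cascade, which makes it an upper bound.

```python
def draws_with_shadow_maps(visible_objects, shadow_casters, cascades=1):
    """Estimate draw calls for one shadow-casting light.

    Assumes one draw call per object and that every caster is drawn into
    every cascade (real engines cull casters per cascade)."""
    main_pass = visible_objects
    shadow_pass = shadow_casters * cascades
    return main_pass + shadow_pass


# 100 visible objects, 80 of them casting shadows, 4 cascades.
print(draws_with_shadow_maps(100, 80, cascades=4))  # 100 + 80 * 4 = 420
```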

Draw counts with hard shadows.
Deferred Shadow Mapping?
Enabling shadows adds more tris/vertices in the rendering

Documents - Unsynchronized Mapping

Approaching Zero Driver Overhead
GDC 2014: Approaching Zero Driver Overhead (AZDO) in OpenGL
Best Practices for Working with Vertex Data
Map Buffer Range Super Slow?

Documents - Vulkan

Arm Best Practice warnings in the Vulkan SDK (Arm Mali GPUs)

GPU Rendering and Multi-Draw Indirect
Vulkan Samples of Performance

Documents - Shader

Branches in mobile shaders

Quoted from Shader optimization - Reddit:
It mostly depends on the number of VGPRs (vector registers) the shader uses. As the number of registers used in a shader increases, the GPU has to reduce the number of warps/wavegroups running in parallel. The number of registers you can use without reducing parallelism is hardware dependent, but 16 is a useful metric. Beyond that, as you increase the register count of your shader, the number of warps that can run in parallel slowly drops.
My advice is to use a hybrid approach: allow the size of your shader to increase until it starts to affect performance. A larger, more flexible shader allows you to reduce the number of state changes necessary to draw a scene. In general, binding a new pipeline state object is more expensive than updating the constants to select new draw options.
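The relationship between register count and parallelism described in the quote can be sketched as a simple division. The default budget of 256 registers per lane and the cap of 10 resident waves are illustrative assumptions chosen for this example, not values taken from the quoted post; real limits and allocation granularity differ between GPUs.

```python
def resident_waves(vgprs_per_thread, vgpr_budget=256, max_waves=10):
    """Estimate how many waves fit on one SIMD for a given register count.

    vgpr_budget and max_waves are hypothetical hardware limits; allocation
    granularity and other limiters (shared memory, etc.) are ignored."""
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    return min(max_waves, vgpr_budget // vgprs_per_thread)


for regs in (16, 32, 64, 128):
    print(regs, "VGPRs ->", resident_waves(regs), "waves")
```

Under these assumed limits, parallelism stays at the cap for small shaders and then halves each time the register count doubles, which matches the gradual falloff the quote describes.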

Documents - Textures

Art production for games: Best practices and optimization

Documents - Draw / Dispatch

The Draw/Dispatch/Copy/Present functions are the only ones that actually queue up GPU work. The other functions just configure what the next Draw/Dispatch will do.
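A toy model may make the distinction concrete. `CommandRecorder` and its methods are hypothetical names invented for this sketch and do not correspond to any real graphics API; the point is only that state setters change what the next draw will do, while draw and dispatch are what add entries to the work queue.

```python
class CommandRecorder:
    """Toy recorder: setters only mutate state, draw/dispatch queue work."""

    def __init__(self):
        self.state = {}   # current pipeline configuration
        self.queue = []   # work actually submitted to the (imaginary) GPU

    def set_state(self, key, value):
        # Configures the next draw/dispatch; queues nothing by itself.
        self.state[key] = value

    def draw(self, vertex_count):
        # Snapshot the current state together with the work item.
        self.queue.append(("draw", vertex_count, dict(self.state)))

    def dispatch(self, x, y, z):
        self.queue.append(("dispatch", (x, y, z), dict(self.state)))


rec = CommandRecorder()
rec.set_state("shader", "opaque")
rec.set_state("blend", "off")
rec.set_state("blend", "alpha")   # overwrites; still no GPU work
rec.draw(36)
print(len(rec.queue))             # 1: only the draw was queued
```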



Papers - Ambient Occlusion Visualization

Ambient Occlusion Opacity Mapping for Visualization of Internal Molecular Structure


Blogs - Shader

The Shader Permutation Problem - Part 1: How Did We Get Here?
The Shader Permutation Problem - Part 2: How Do We Fix It?

Blogs - Texture

The top mip level of a texture is responsible for about 75% of its memory usage, so each dropped mip level cuts memory roughly fourfold. Being able to drop one or two mip levels should give plenty of texture detail range for any game. For example, starting with 2GB of textures and dropping two mip levels from all of them would take you down to 128MB of textures.
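The 75% figure and the 2GB to 128MB example can be checked with a short calculation. This is a sketch that assumes a full mip chain down to 1x1 and an uncompressed format where memory is proportional to texel count; the function name is made up for the example.

```python
def mip_chain_texels(width, height):
    """Texel count of every level in a full mip chain, largest level first."""
    levels = []
    while True:
        levels.append(width * height)
        if width == 1 and height == 1:
            break
        width, height = max(1, width // 2), max(1, height // 2)
    return levels


levels = mip_chain_texels(2048, 2048)
total = sum(levels)
top_share = levels[0] / total               # share held by the top mip
after_drop_two = sum(levels[2:]) / total    # what remains without mips 0 and 1

print(round(top_share, 3))                  # about 0.75
print(round(2048 * after_drop_two))         # 2048 MB shrinks to about 128 MB
```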

Also note that if you’re running out of texture memory, D3DFMT_DXT1 is one-eighth the size of D3DFMT_X8R8G8B8 and should be less of a quality reduction than dropping a mip level.
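The factor of eight follows from the storage layouts: DXT1 packs each 4x4 block of texels into 8 bytes (4 bits per texel), while X8R8G8B8 stores 4 bytes per texel. A minimal sketch of that comparison for a single mip level, with helper names invented for the example:

```python
def dxt1_bytes(width, height):
    """DXT1 stores each 4x4 texel block in 8 bytes."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * 8


def x8r8g8b8_bytes(width, height):
    """X8R8G8B8 stores 4 bytes (32 bits) per texel."""
    return width * height * 4


w, h = 1024, 1024
print(x8r8g8b8_bytes(w, h) // dxt1_bytes(w, h))  # 8
```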


Blogs - GPU Memory (VRAM)

GPU Memory Pools in D3D12

Blogs - Frame Analysis

GTA V - Graphics Study

Behind the Pretty Frames: God of War


Books - Lighting Issues Visualization

Computer Graphics and Imaging
Computer Graphics and Imaging (October 23, 2019)

Tools & Frameworks

Tools - Lighting Issues Visualization

Quoted from Dynamic analytic indirect light:
Triangulating the direct light enables analytic integrals for diffuse geometry with correct occlusion. The occlusion step uses the Sutherland–Hodgman algorithm to determine overlap between triangles. Complexity is O(n^4) and the shader is found here:
The faster version skips indirect occlusion and is O(n^3):
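For orientation, the Sutherland–Hodgman step mentioned in the quote can be sketched on the CPU as follows. This is a generic 2D version of the algorithm, not the referenced shader: it assumes the clip polygon is convex with counter-clockwise vertices, and the `polygon_area` helper is added only to measure the resulting overlap.

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip `subject` against a convex, CCW `clip` polygon.
    Both polygons are lists of (x, y) tuples."""

    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of segment p-q with the infinite line through a-b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, output = output, []
        if not inp:
            break  # nothing left to clip
        prev = inp[-1]
        for cur in inp:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))  # entering
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))      # leaving
            prev = cur
    return output


def polygon_area(poly):
    """Shoelace formula; returns 0 for fewer than 3 vertices."""
    n = len(poly)
    if n < 3:
        return 0.0
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
window = [(1, 1), (3, 1), (3, 3), (1, 3)]
overlap = clip_polygon(square, window)
print(polygon_area(overlap))  # the two squares share a 1 x 1 region
```

A non-empty result with positive area indicates that the two shapes overlap, which is the test the occlusion step relies on.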
