[Graphics]Optimization & Profiling Notes
keywords: Graphics, Optimization at Deep Level
Documents - Comprehensive
The Visual Technology of Gears 5 | Unreal Dev Days 2019 | Unreal Engine
Improving visuals while achieving 60fps using Unreal Engine 4
The making of Gears 5: how the Coalition hit 60fps - and improved visual quality
Gears 5 Graphics Options Performance Breakdown
Performance Tips when Writing Shaders
AMD RDNA Performance Guide
GPU Optimization for GameDev (Recommended)
Nvidia Nsight Graphics: The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload (Recommended)
Documents - Drawcalls
The engine draws what you see on screen in batches; each batch is called a drawcall.
The time a drawcall takes consists of two parts:
- a static overhead that is always the same, no matter how many things get drawn;
- a dynamic part that depends on how many things there are to draw.
One basic thing is that each model in idTech4 gets drawn once per light that hits that model. And each of these draws of the model is split into drawcalls for each material.
So if you have a model with two different materials that is hit by 3 lights, the entire model is drawn 3 times, and the engine uses 2 * 3 = 6 drawcalls to do so.
Ignoring lights, a rough general estimate is: Drawcalls = separate meshes × materials per mesh
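The counting rules above can be sketched as a small calculation. This is only an estimate under the assumptions stated in the text (one drawcall per material per light, no batching or culling); the function names are hypothetical and not taken from any engine.

```python
def idtech4_style_drawcalls(models):
    """Estimate drawcalls when every model is drawn once per light.

    `models` is a list of (material_count, lights_hitting_model) pairs.
    Assumes one drawcall per material per light, as described in the text.
    """
    return sum(materials * lights for materials, lights in models)


def simple_drawcalls(separate_meshes, materials_per_mesh):
    """Light-independent estimate: separate meshes times materials per mesh."""
    return separate_meshes * materials_per_mesh


# The example from the text: one model, two materials, three lights.
print(idtech4_style_drawcalls([(2, 3)]))
# A hypothetical scene of 50 meshes with 2 materials each.
print(simple_drawcalls(50, 2))
```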
Documents - Shadow map (Non-dynamic lighting)
Q: Why does enabling shadows increase the number of triangles drawn and the number of draw calls?
A: The object is first rendered from the perspective of the light to record its distance from the light (stored in the shadow map), or rendered several times if you use cascaded shadow maps. That information is then used during the normal rendering pass.
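A rough way to see the extra cost is to count the passes. The sketch below assumes one draw per object per pass, with no batching, instancing, or per-cascade culling, so real numbers will usually be lower; the function and parameter names are hypothetical.

```python
def draws_with_shadows(shadow_casters, visible_objects, shadow_lights=1, cascades=1):
    """Estimate draws per frame when shadow mapping is enabled.

    Each shadow-casting light renders every caster once per cascade into
    its shadow map, and then the visible objects are drawn in the normal pass.
    """
    shadow_pass_draws = shadow_casters * shadow_lights * cascades
    main_pass_draws = visible_objects
    return shadow_pass_draws + main_pass_draws


# 100 objects, shadows off versus one light with 1 or 4 cascades.
print(draws_with_shadows(0, 100))
print(draws_with_shadows(100, 100, shadow_lights=1, cascades=1))
print(draws_with_shadows(100, 100, shadow_lights=1, cascades=4))
```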
Draw counts with hard shadows.
Deferred Shadow Mapping?
Enabling shadows adds more tris/vertices in the rendering
Documents - Unsynchronized Mapping
Approaching Zero Driver Overhead
GDC 2014: Approaching Zero Driver Overhead (AZDO) in OpenGL
Best Practices for Working with Vertex Data
Map Buffer Range Super Slow?
Documents - Vulkan
Arm Best Practice warnings in the Vulkan SDK (Arm Mali GPUs)
GPU Rendering and Multi-Draw Indirect
Vulkan Samples of Performance
Documents - Shader
Branches in mobile shaders
Quoted from Shader optimization - Reddit
It mostly depends on the number of VGPRs (registers) the shader uses. As the number of registers used in a shader increases, the GPU has to reduce the number of warps/wavegroups running in parallel. The number of registers you can use without reducing parallelism is hardware dependent, but 16 is a useful metric. Beyond that, each increase in the shader's register count gradually reduces the number that can run in parallel.
My advice is to use a hybrid approach: allow the size of your shader to increase until it starts to affect performance. A larger, more flexible shader lets you reduce the number of state changes necessary to draw a scene. In general, binding a new pipeline state object is more expensive than updating the constants to select new draw options.
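The register/parallelism trade-off in the quote can be illustrated with a toy occupancy model. The default limits below (a budget of 256 registers per lane and at most 10 groups in flight) are assumptions chosen for illustration, not figures from the quote; real hardware also allocates registers in granules and has other limits such as shared memory.

```python
def waves_in_flight(vgprs_per_thread, register_budget=256, max_waves=10):
    """Toy occupancy estimate for a hypothetical GPU.

    register_budget: registers available per lane (assumed value).
    max_waves: hardware cap on warps/wavegroups in flight (assumed value).
    """
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    return min(max_waves, register_budget // vgprs_per_thread)


# Parallelism stays at the cap for small shaders, then falls off.
for vgprs in (16, 32, 64, 128):
    print(vgprs, waves_in_flight(vgprs))
```

Under these assumed limits, a shader can grow for a while without losing any parallelism, which is the window the hybrid approach in the quote tries to use.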
Documents - Textures
Art production for games: Best practices and optimization
Documents - Draw / Dispatch
The Draw/Dispatch/Copy/Present functions are the only ones that actually queue up GPU work. The other functions just configure what the next Draw/Dispatch will do.
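A toy model makes the distinction concrete. The class below is a hypothetical stand-in for a graphics context, not a real API: state setters only record configuration, while draw and dispatch snapshot that configuration and append work to a queue.

```python
class ToyContext:
    """Hypothetical context: only draw/dispatch enqueue work."""

    def __init__(self):
        self.state = {}   # current pipeline configuration
        self.queue = []   # work that would be submitted to the GPU

    def set_state(self, key, value):
        # Configures the next draw/dispatch; queues nothing.
        self.state[key] = value

    def draw(self, vertex_count):
        # Snapshot the state so later changes do not affect this draw.
        self.queue.append(("draw", vertex_count, dict(self.state)))

    def dispatch(self, x, y, z):
        self.queue.append(("dispatch", (x, y, z), dict(self.state)))


ctx = ToyContext()
ctx.set_state("shader", "lit")
ctx.set_state("texture", "brick")
print(len(ctx.queue))   # still empty: nothing has been queued yet
ctx.draw(36)
ctx.set_state("shader", "blur")
ctx.dispatch(8, 8, 1)
print(len(ctx.queue))
```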
Papers - Ambient Occlusion Visualization
Ambient Occlusion Opacity Mapping for Visualization of Internal Molecular Structure
Blogs - Shader
The Shader Permutation Problem - Part 1: How Did We Get Here?
The Shader Permutation Problem - Part 2: How Do We Fix It?
Blogs - Texture
The top mip level of a texture is responsible for about 75% of the memory usage, so dropping one or two mip levels should be plenty of texture detail range for any game. For example starting with 2GB of textures and dropping two mip levels from all of them would take you down to 128MB of textures.
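The arithmetic behind the 75% figure and the 2GB to 128MB example can be checked with a short sketch. It assumes square power-of-two textures with a full mip chain and an uncompressed, fixed bytes-per-texel format; the function names are hypothetical.

```python
def top_mip_share(size):
    """Fraction of a full mip chain's memory used by the top level.

    `size` is the edge length of a square power-of-two texture.
    """
    top = size * size
    total = 0
    level = size
    while level >= 1:
        total += level * level   # each level has a quarter of the texels
        level //= 2
    return top / total


def memory_after_dropping_mips(total_mb, levels_dropped):
    """Each dropped top level leaves roughly a quarter of the memory."""
    return total_mb / (4 ** levels_dropped)


print(round(top_mip_share(1024), 3))
print(memory_after_dropping_mips(2048, 2))   # 2GB expressed in MB
```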
Also note that if you’re running out of texture memory, D3DFMT_DXT1 is 8 times smaller than D3DFMT_X8R8G8B8 and should be less of a quality reduction than dropping a mip level.
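The factor of 8 follows from the storage layouts: D3DFMT_X8R8G8B8 uses 4 bytes per texel, while D3DFMT_DXT1 stores each 4x4 block of texels in 8 bytes. A minimal sketch for a single mip level (the helper names are hypothetical):

```python
import math


def bytes_x8r8g8b8(width, height):
    """Uncompressed 32-bit format: 4 bytes per texel."""
    return width * height * 4


def bytes_dxt1(width, height):
    """DXT1: 8 bytes per 4x4 block, dimensions rounded up to whole blocks."""
    blocks = math.ceil(width / 4) * math.ceil(height / 4)
    return blocks * 8


w, h = 1024, 1024
print(bytes_x8r8g8b8(w, h), bytes_dxt1(w, h))
print(bytes_x8r8g8b8(w, h) // bytes_dxt1(w, h))
```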
Blogs - GPU Memory (VRAM)
GPU Memory Pools in D3D12
Blogs - Frame Analysis
GTA V - Graphics Study
Behind the Pretty Frames: God of War
Books - Lighting Issues Visualization
Computer Graphics and Imaging
Computer Graphics and Imaging (October 23, 2019)
Tools & Frameworks
Tools - Lighting Issues Visualization
Quoted from Dynamic analytic indirect light:
Triangulating the direct light enables analytic integrals for diffuse geometry with correct occlusion. The occlusion step uses the Sutherland–Hodgman algorithm to determine overlap between triangles. Complexity is O(n^4) and the shader is found here:
The faster version skips indirect occlusion and is O(n^3):
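For reference, the Sutherland–Hodgman step named in the quote can be sketched in 2D as follows. This is a generic textbook version, not the shader the quote points to; it assumes the clip polygon is convex with counter-clockwise vertices, and the function names are hypothetical.

```python
def clip_polygon(subject, clip):
    """Clip `subject` against a convex, counter-clockwise `clip` polygon."""

    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a-b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        input_list, output = output, []
        if not input_list:
            break   # nothing left to clip: the polygons do not overlap
        prev = input_list[-1]
        for cur in input_list:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def polygon_area(poly):
    """Shoelace formula; returns 0 for an empty polygon."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


big = [(0, 0), (4, 0), (0, 4)]
small = [(0, 0), (2, 0), (0, 2)]
print(polygon_area(clip_polygon(big, small)))
```

The overlap area is what an occlusion test between two triangles would consume; each clip costs time proportional to the product of the two vertex counts.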