[Graphics] Optimization & Profiling Notes

keywords: Graphics, Optimization at Deep Level

Documents

Documents - Comprehensive

AMD RDNA Performance Guide (Recommended)
https://gpuopen.com/learn/rdna-performance-guide/

Intel® Processor Graphics Xᵉ-LP API Developer and Optimization Guide (Recommended)
https://www.intel.com/content/www/us/en/developer/articles/guide/lp-api-developer-optimization-guide.html

The Visual Technology of Gears 5 | Unreal Dev Days 2019 | Unreal Engine
https://www.youtube.com/watch?v=KL3rSV-IJ20
Improving visuals while achieving 60fps using Unreal Engine 4

The making of Gears 5: how the Coalition hit 60fps - and improved visual quality
https://www.eurogamer.net/articles/digitalfoundry-2019-gears-5-tech-interview

Gears 5 Graphics Options Performance Breakdown
https://www.game-debate.com/news/27684/gears-5-most-important-graphics-options-every-setting-benchmarked

Rendering of Call of Duty: Infinite Warfare
https://research.activision.com/publications/archives/rendering-of-call-of-dutyinfinite-warfare

Performance Tips when Writing Shaders
https://dev.rbcafe.com/unity/unity-5.3.3/en/Manual/SL-ShaderPerformance.html

GPU Optimization for GameDev (Recommended)
https://gist.github.com/silvesthu/505cf0cbf284bb4b971f6834b8fec93d

Nvidia Nsight Graphics: The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload (Recommended)
https://developer.nvidia.com/blog/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload/

How Northlight makes Alan Wake 2 shine
https://www.remedygames.com/article/how-northlight-makes-alan-wake-2-shine

PowerVR Performance Recommendations - The Golden Rules

Documents - Drawcalls

The engine draws what you see on screen in batches; each such batch is called a drawcall.

The time a drawcall takes consists of two parts:

a static overhead that is always the same, no matter how many things get drawn;
a dynamic part that depends on how many things there are to draw.
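A minimal sketch of that two-part cost model follows. The two constants are made-up illustrative values, not measurements from any engine or GPU, and `frame_cost_ns` is a hypothetical helper. It only shows why merging many small drawcalls into fewer large ones reduces the static share of the cost.

```python
# Hypothetical costs, chosen only to make the arithmetic easy to follow.
STATIC_OVERHEAD_NS = 20_000    # assumed fixed cost paid once per drawcall
COST_PER_TRIANGLE_NS = 10      # assumed cost per triangle drawn


def frame_cost_ns(draw_calls, triangles_per_call):
    """Total cost = static part (per call) + dynamic part (per triangle)."""
    static_part = draw_calls * STATIC_OVERHEAD_NS
    dynamic_part = draw_calls * triangles_per_call * COST_PER_TRIANGLE_NS
    return static_part + dynamic_part


# Same 100,000 triangles, submitted as 1000 small calls or 10 large ones.
unbatched = frame_cost_ns(draw_calls=1000, triangles_per_call=100)
batched = frame_cost_ns(draw_calls=10, triangles_per_call=10_000)
print(unbatched / 1e6, "ms unbatched vs", batched / 1e6, "ms batched")
```

Under these assumed numbers the dynamic part is identical in both cases; only the static overhead changes.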

In idTech4, each model is drawn once per light that hits it, and each of those draws is split into one drawcall per material.

So if you have a model with two different materials that is hit by 3 lights, the entire model is drawn 3 times, and the engine uses 2 * 3 = 6 drawcalls to do so.
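The counting rule above can be written down directly. This is a sketch of the arithmetic only; `idtech4_draw_calls` and `scene_draw_calls` are hypothetical helper names, not engine functions.

```python
def idtech4_draw_calls(material_count, light_count):
    """Drawcalls for one model: one per material, repeated for every light."""
    return material_count * light_count


def scene_draw_calls(models):
    """Sum over (material_count, light_count) pairs, one pair per model."""
    return sum(idtech4_draw_calls(m, l) for m, l in models)


# The example from the text: 2 materials, hit by 3 lights.
print(idtech4_draw_calls(2, 3))              # 6
# A small scene: the model above plus a single-material model lit by 1 light.
print(scene_draw_calls([(2, 3), (1, 1)]))    # 7
```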

Reference: Drawcalls
https://wiki.thedarkmod.com/index.php?title=Drawcalls

Drawcalls = separate meshes × materials per mesh

Draw call discussion for Unreal Engine
https://forums.unrealengine.com/t/draw-call-discussion/12067/3

Documents - Shadow map (Non-dynamic lighting)

Q: Why does enabling shadows increase the number of triangles drawn and the number of draw calls?
A: Each object is first rendered from the light's point of view to record its distance from the light, which is stored in the shadow map (several times over if you use cascaded shadow maps). That information is then used during the normal rendering pass.
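As a rough sketch of the answer above, the extra work can be estimated as one additional draw per shadow caster per shadow pass. The function below is hypothetical and gives an upper bound: it assumes every caster is drawn into every cascade and ignores culling and batching, which real engines apply.

```python
def draw_calls_with_shadows(visible_objects, shadow_casters, shadow_passes):
    """Upper-bound estimate: main pass draws plus one draw per caster per pass.

    shadow_passes is 0 with shadows off, 1 for a single shadow map,
    or the cascade count for cascaded shadow maps.
    """
    return visible_objects + shadow_casters * shadow_passes


print(draw_calls_with_shadows(100, 100, 0))  # shadows off: 100
print(draw_calls_with_shadows(100, 100, 1))  # one shadow map: 200
print(draw_calls_with_shadows(100, 100, 4))  # four cascades: 500
```

The triangle count grows the same way, since each shadow pass rasterizes the caster geometry again.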

References:
Draw counts with hard shadows.
https://forum.unity.com/threads/draw-counts-with-hard-shadows.290706/
Deferred Shadow Mapping?
https://gamedev.net/forums/topic/647657-deferred-shadow-mapping/5093058/
Enabling shadows ads more tris/vertices in the rendering
https://forum.unity.com/threads/enabling-shadows-ads-more-tris-vertices-in-the-rendering.507325/

Quoted from Shader optimization - Reddit
It mostly depends on the number of VGPRs (vector registers) the shader uses. As the number of registers used in a shader increases, the GPU has to reduce the number of warps/wavefronts running in parallel. The number of registers you can use without reducing parallelism is hardware dependent, but 16 is a useful metric. As you increase the register count beyond that, the number of warps that can run in parallel slowly drops.
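The relationship between register count and parallelism can be sketched as a simple division. The register budget and wave limit below are made-up illustrative defaults, not the specification of any real GPU, and real hardware also rounds allocations to a granularity that this sketch ignores.

```python
def max_waves_in_flight(vgprs_per_thread, vgpr_budget=256, wave_limit=16):
    """Waves that fit at once on one SIMD under an assumed register budget.

    vgpr_budget and wave_limit are hypothetical values for illustration.
    """
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    return min(wave_limit, vgpr_budget // vgprs_per_thread)


for vgprs in (16, 24, 32, 64, 128):
    print(vgprs, "VGPRs ->", max_waves_in_flight(vgprs), "waves")
```

With these assumed defaults, 16 registers still allows the full 16 waves, and each doubling of the register count halves the number of waves in flight.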
My advice is to use a hybrid approach: allow the size of your shader to increase until it starts to affect performance. A larger, more flexible shader lets you reduce the number of state changes needed to draw a scene. In general, binding a new pipeline state object is more expensive than updating the constants to select new draw options.
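To illustrate the state-change argument, the sketch below counts how often a new pipeline state has to be bound for a given draw order. The pipeline names are hypothetical labels and `count_pipeline_binds` is not part of any graphics API; it only models the idea that a bind happens whenever consecutive draws use different pipelines.

```python
def count_pipeline_binds(draw_order):
    """Count binds needed when each draw names the pipeline it requires."""
    binds = 0
    current = None
    for pipeline in draw_order:
        if pipeline != current:   # a different pipeline forces a new bind
            binds += 1
            current = pipeline
    return binds


# Specialized shaders: the pipeline changes on almost every draw.
specialized = ["opaque", "skinned", "opaque", "alpha_test", "opaque"]
# One larger shader: options are selected through constants instead.
uber = ["uber"] * 5

print(count_pipeline_binds(specialized))  # 5
print(count_pipeline_binds(uber))         # 1
```

Sorting draws by pipeline would also reduce binds in the specialized case; the larger shader trades that sorting constraint for higher register pressure.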

Documents - Textures

Art production for games: Best practices and optimization
https://medium.com/ironsource-levelup/art-production-for-games-best-practices-and-optimization-5b651a167be8

Documents - Draw / Dispatch

The Draw/Dispatch/Copy/Present functions are the only ones that actually queue up GPU work. The other functions just configure what the next Draw/Dispatch will do.
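A toy model of that behaviour is sketched below. `ToyCommandList` is a hypothetical class, not the Direct3D API: state setters only update a dictionary, and work is queued only when `draw` is called, capturing whatever state is current at that moment.

```python
class ToyCommandList:
    """Toy model: setters change state, only draw() queues work."""

    def __init__(self):
        self.state = {}
        self.queued_work = []

    def set_state(self, name, value):
        # Configuration only; nothing is queued for the GPU here.
        self.state[name] = value

    def draw(self, vertex_count):
        # The draw snapshots the current state and queues one work item.
        self.queued_work.append((dict(self.state), vertex_count))


cmd = ToyCommandList()
cmd.set_state("shader", "lit")
cmd.set_state("texture", "brick")
cmd.set_state("texture", "stone")   # overrides the previous setting
cmd.draw(36)
print(len(cmd.queued_work))         # 1
print(cmd.queued_work[0][0])        # {'shader': 'lit', 'texture': 'stone'}
```

This is one reason a profiler may attribute a long time to a draw call even when the expensive part is the state that was configured before it.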

Origin:
https://gamedev.net/forums/topic/693698-drawindexed-and-drawindexedinstance-takes-very-long/5364051/

Papers

Papers - Ambient Occlusion Visualization

Ambient Occlusion Opacity Mapping for Visualization of Internal Molecular Structure
https://www.cs.unc.edu/~taylorr/Comp715/papers/AOOM_WSCG20111.pdf

Blogs

Blogs - Shader

The Shader Permutation Problem - Part 1: How Did We Get Here?
https://therealmjp.github.io/posts/shader-permutations-part1/
The Shader Permutation Problem - Part 2: How Do We Fix It?
https://therealmjp.github.io/posts/shader-permutations-part2/

Blogs - Texture

The top mip level of a texture is responsible for about 75% of the memory usage, so dropping one or two mip levels should leave plenty of texture detail range for any game. For example, starting with 2 GB of textures and dropping two mip levels from all of them would take you down to 128 MB of textures.
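The arithmetic behind those figures can be checked with a short sketch. It assumes uncompressed square textures with a full mip chain, where each level has a quarter of the texels of the one above it; the helper names are hypothetical.

```python
def mip_chain_texels(width, height):
    """Texel count of every mip level, from the top level down to 1x1."""
    sizes = []
    while True:
        sizes.append(width * height)
        if width == 1 and height == 1:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
    return sizes


def size_after_dropping(total_mb, dropped_levels):
    """Approximate size once the top levels are dropped (4x per level)."""
    return total_mb / 4 ** dropped_levels


chain = mip_chain_texels(1024, 1024)
print(round(chain[0] / sum(chain), 3))   # share of the top level: 0.75
print(size_after_dropping(2048, 2))      # 2 GB minus two levels: 128.0 MB
```

The factor of four per level is exact only in the limit of a long chain, but the error is negligible for textures of practical size.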

Also note that if you're running out of texture memory, D3DFMT_DXT1 is 8 times smaller than D3DFMT_X8R8G8B8 and should be less of a quality reduction than dropping a mip level.
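The 8x figure follows from the storage layouts: D3DFMT_X8R8G8B8 uses 4 bytes per texel, while D3DFMT_DXT1 stores each 4x4 texel block in 8 bytes. A sketch for a single mip level, with hypothetical helper names:

```python
def x8r8g8b8_bytes(width, height):
    """Uncompressed 32-bit format: 4 bytes per texel."""
    return width * height * 4


def dxt1_bytes(width, height):
    """DXT1: 8 bytes per 4x4 block, dimensions rounded up to whole blocks."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * 8


w, h = 1024, 1024
print(x8r8g8b8_bytes(w, h))                      # 4194304 bytes (4 MiB)
print(dxt1_bytes(w, h))                          # 524288 bytes (0.5 MiB)
print(x8r8g8b8_bytes(w, h) // dxt1_bytes(w, h))  # 8
```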

Origin:
https://www.gamedev.net/forums/topic/605090-texture-quality-settings/

Blogs - GPU Memory (VRAM)

GPU Memory Pools in D3D12
https://therealmjp.github.io/posts/gpu-memory-pool/

Blogs - Frame Analysis

GTA V - Graphics Study
https://www.adriancourreges.com/blog/2015/11/02/gta-v-graphics-study/

Behind the Pretty Frames: God of War
http://www.mamoniem.com/behind-the-pretty-frames-god-of-war/

Blogs - Pipeline

PowerVR GPU Architecture and Optimization Recommendations (in Chinese)
https://yemi.me/2018/09/17/powervr-architecture-overview/

A Brief Discussion of Mobile GPU Architecture (in Chinese)
https://zhuanlan.zhihu.com/p/656933750

Lessons Learned from Disabling Alpha Test on Mobile CPUs (in Chinese)
https://gwb.tencent.com/community/detail/123042

Rendering Notes: What Exactly Are Early-Z, Z-Culling, Hi-Z, and Z-Prepass? (in Chinese)
https://juejin.cn/post/6844904132852072462

Alpha Testing vs. Depth Testing
https://www.gamedev.net/forums/topic/391007-alpha-testing-vs-depth-testing/

Early-Z and Late-Z Rules (in Chinese)
https://www.lfzxb.top/early-z-test-and-late-z-test/

Early-Z
https://developer.arm.com/documentation/102224/0200/Early-Z

Books

Books - Lighting Issues Visualization

Computer Graphics and Imaging
https://www.intechopen.com/books/7435
Computer Graphics and Imaging (October 23, 2019)
https://www.amazon.com/Computer-Graphics-Imaging-Branislav-Sobota/dp/1839622822

Tools & Frameworks

Tools - Graphical Debug

RAD Debugger. A native, user-mode, multi-process, graphical debugger. (Recommended)
https://github.com/EpicGames/raddebugger

Tools - Lighting Issues Visualization

Quoted from Dynamic analytic indirect light:
Triangulating the direct light enables analytic integrals for diffuse geometry with correct occlusion. The occlusion step uses the Sutherland–Hodgman algorithm to determine overlap between triangles. Complexity is O(n^4) and the shader is found here:
https://www.shadertoy.com/view/st3BW4
The faster version skips indirect occlusion and is O(n^3):
https://www.shadertoy.com/view/NlVfWy
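For readers unfamiliar with the Sutherland–Hodgman algorithm mentioned in the quote, here is a minimal 2D sketch of it. It is not taken from the linked shaders: it clips a subject polygon against a convex, counter-clockwise clip polygon and uses the area of the result as an overlap measure. The function names are hypothetical.

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip `subject` against convex CCW polygon `clip`."""

    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of segment p -> q with the infinite line through a, b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        points, output = output, []
        if not points:
            break                      # nothing left to clip
        prev = points[-1]
        for cur in points:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def polygon_area(points):
    """Unsigned area via the shoelace formula; 0.0 for an empty polygon."""
    n = len(points)
    twice = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice) / 2.0


tri_a = [(0, 0), (4, 0), (0, 4)]
tri_b = [(1, 1), (5, 1), (1, 5)]
print(polygon_area(clip_polygon(tri_b, tri_a)))   # overlap area: 2.0
```

Two triangles overlap exactly when the clipped polygon has a non-zero area.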

Tools - GPU Crash Debugging

GPU Crash Debugging in Unreal Engine: Tools, Techniques, and Best Practices | Unreal Fest 2023 (Recommended)
https://www.youtube.com/watch?v=CyrGLMmVUAI
