[Graphics]Optimization & Profiling Notes
keywords: Graphics, Optimization at Deep Level
Documents
Documents - Comprehensive
AMD RDNA Performance Guide (Recommended)
https://gpuopen.com/learn/rdna-performance-guide/
Intel® Processor Graphics Xᵉ-LP API Developer and Optimization Guide (Recommended)
https://www.intel.com/content/www/us/en/developer/articles/guide/lp-api-developer-optimization-guide.html
The Visual Technology of Gears 5 | Unreal Dev Days 2019 | Unreal Engine
https://www.youtube.com/watch?v=KL3rSV-IJ20
Improving visuals while achieving 60fps using Unreal Engine 4
The making of Gears 5: how the Coalition hit 60fps - and improved visual quality
https://www.eurogamer.net/articles/digitalfoundry-2019-gears-5-tech-interview
Gears 5 Graphics Options Performance Breakdown
https://www.game-debate.com/news/27684/gears-5-most-important-graphics-options-every-setting-benchmarked
Rendering of Call of Duty: Infinite Warfare
https://research.activision.com/publications/archives/rendering-of-call-of-dutyinfinite-warfare
Performance Tips when Writing Shaders
https://dev.rbcafe.com/unity/unity-5.3.3/en/Manual/SL-ShaderPerformance.html
GPU Optimization for GameDev (Recommended)
https://gist.github.com/silvesthu/505cf0cbf284bb4b971f6834b8fec93d
Nvidia Nsight Graphics: The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload (Recommended)
https://developer.nvidia.com/blog/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload/
How Northlight makes Alan Wake 2 shine
https://www.remedygames.com/article/how-northlight-makes-alan-wake-2-shine
PowerVR Performance Recommendations - The Golden Rules
Documents - Drawcalls
The engine draws the things you see on the screen in batches and such a batch is called a drawcall.
The time a drawcall takes consists of two parts:
- a static overhead, that is always the same no matter how many things get drawn;
- a dynamic part that depends on how many things there are to draw;
One basic thing is that each model in idTech4 gets drawn once per light that hits that model. And each of these draws of the model is split into drawcalls for each material.
So if you have a model with two different materials that is hit by 3 lights, the entire model is drawn 3 times, and the engine uses 2 * 3 = 6 drawcalls to do so.
Reference: Drawcalls
https://wiki.thedarkmod.com/index.php?title=Drawcalls
Drawcalls = (separate meshes × materials per mesh)
https://forums.unrealengine.com/t/draw-call-discussion/12067/3
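The two counting rules above can be sketched in a few lines. This is only a back-of-the-envelope model, not engine code: the function names are made up for illustration, and real engines batch, instance, and cull draws, so actual counts will differ.

```python
def idtech4_drawcalls(materials: int, lights: int) -> int:
    """Draw calls for one model under the idTech4 rule quoted above:
    the model is drawn once per light, and each draw is split per material."""
    return materials * lights


def mesh_material_drawcalls(materials_per_mesh: list[int]) -> int:
    """Draw calls for a scene under the 'meshes x materials per mesh' rule.
    Each list entry is the material count of one separate mesh (hypothetical input)."""
    return sum(materials_per_mesh)


if __name__ == "__main__":
    # The example from the text: 2 materials, 3 lights.
    print(idtech4_drawcalls(materials=2, lights=3))
    # Three separate meshes with 1, 2, and 3 materials.
    print(mesh_material_drawcalls([1, 2, 3]))
```

Summing per mesh instead of multiplying two numbers covers the common case where meshes do not all have the same number of materials.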
Documents - Shadow map (Non-dynamic lighting)
Q: Why does enabling shadows increase the number of triangles drawn and the number of draw calls?
A: Each shadow-casting object is first rendered from the light's point of view to record its distance from the light, which is stored in the shadow map. With cascaded shadow maps this happens once per cascade. The normal rendering pass then uses this information.
References:
Draw counts with hard shadows.
https://forum.unity.com/threads/draw-counts-with-hard-shadows.290706/
Deferred Shadow Mapping?
https://gamedev.net/forums/topic/647657-deferred-shadow-mapping/5093058/
Enabling shadows ads more tris/vertices in the rendering
https://forum.unity.com/threads/enabling-shadows-ads-more-tris-vertices-in-the-rendering.507325/
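A rough way to see where the extra draws come from is to count the passes. The sketch below is a simplified model under stated assumptions: every shadow caster is drawn once per cascade, nothing is culled or batched, and the function and parameter names are hypothetical.

```python
def shadow_pass_draws(casters: int, cascades: int = 1) -> int:
    """Draws issued while filling the shadow map(s): every caster is
    rendered from the light's point of view once per cascade."""
    return casters * cascades


def total_draws(visible_objects: int, casters: int, cascades: int = 1) -> int:
    """Main-pass draws plus the extra shadow-map draws."""
    return visible_objects + shadow_pass_draws(casters, cascades)


if __name__ == "__main__":
    # 100 visible objects, 80 of them cast shadows, 4 cascades.
    print(total_draws(visible_objects=100, casters=80, cascades=4))
```

The triangle count grows the same way, because each shadow draw submits the caster's geometry again.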
Documents - Unsynchronized Mapping
Approaching Zero Driver Overhead
https://www.reddit.com/r/gamedev/comments/21mbo8/we_are_the_authors_of_approaching_zero_driver/
GDC 2014: Approaching Zero Driver Overhead (AZDO) in OpenGL
https://www.youtube.com/watch?v=K70QbvzB6II
Best Practices for Working with Vertex Data
https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf
Map Buffer Range Super Slow?
https://www.gamedev.net/forums/topic/666461-map-buffer-range-super-slow/
Documents - Vulkan
Arm Best Practice warnings in the Vulkan SDK (Arm Mali GPUs)
https://community.arm.com/arm-community-blogs/b/graphics-gaming-and-vr-blog/posts/arm-best-practice-warnings-in-vulkan-sdk
GPU Rendering and Multi-Draw Indirect
https://github.com/KhronosGroup/Vulkan-Samples/blob/master/samples/performance/multi_draw_indirect/multi_draw_indirect_tutorial.md
Vulkan Samples of Performance
https://github.com/KhronosGroup/Vulkan-Samples/tree/master/samples/performance
Documents - Shader
Branches in mobile shaders
https://solidpixel.github.io/2021/12/09/branches_in_shaders.html
Quoted from Shader optimization - Reddit
It mostly depends on the number of VGPRs (registers) the shader uses. As the number of registers used in a shader increases, the GPU has to reduce the number of warps/wavegroups (warp) running in parallel. The number of registers you can use without reducing parallelism is hardware dependent, but 16 is a useful metric. Then, as you increase the register count for your shader, it will slowly reduce the number that can run in parallel.
My advice is to use a hybrid approach: allow the size of your shader to increase until it starts to affect performance. A larger shader that is more flexible will allow you to reduce the number of state changes necessary to draw a scene. In general, binding a new pipeline state object is more expensive than updating the constants to select new draw options.
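The relationship between register count and parallelism described in the quote can be illustrated with a toy model. The numbers here are assumptions, not vendor data: a hypothetical budget of 256 registers per lane and a hypothetical cap of 16 warps were picked only so that the knee lands at the 16 registers mentioned above. Real limits vary by GPU and should be read from the vendor's documentation or profiler.

```python
def warps_in_flight(vgprs_per_thread: int,
                    vgpr_budget: int = 256,   # hypothetical registers per lane
                    max_warps: int = 16) -> int:  # hypothetical scheduler cap
    """Toy occupancy model: warps that fit are limited either by the
    scheduler cap or by how many register sets fit in the budget."""
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    # A result of 0 means the shader does not fit in this toy model;
    # a real compiler would spill registers to memory instead.
    return min(max_warps, vgpr_budget // vgprs_per_thread)


if __name__ == "__main__":
    for regs in (8, 16, 32, 64, 128):
        print(regs, warps_in_flight(regs))
```

In this model anything up to 16 registers keeps full parallelism, and each doubling beyond that halves the number of warps that can run at once.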
Documents - Textures
Art production for games: Best practices and optimization
https://medium.com/ironsource-levelup/art-production-for-games-best-practices-and-optimization-5b651a167be8
Documents - Draw / Dispatch
The Draw/Dispatch/Copy/Present functions are the only ones that actually queue up GPU work. The other functions just configure what the next Draw/Dispatch will do.
Origin:
https://gamedev.net/forums/topic/693698-drawindexed-and-drawindexedinstance-takes-very-long/5364051/
Papers
Papers - Ambient Occlusion Visualization
Ambient Occlusion Opacity Mapping for Visualization of Internal Molecular Structure
https://www.cs.unc.edu/~taylorr/Comp715/papers/AOOM_WSCG20111.pdf
Blogs
Blogs - Shader
The Shader Permutation Problem - Part 1: How Did We Get Here?
https://therealmjp.github.io/posts/shader-permutations-part1/
The Shader Permutation Problem - Part 2: How Do We Fix It?
https://therealmjp.github.io/posts/shader-permutations-part2/
Blogs - Texture
The top mip level of a texture is responsible for about 75% of the memory usage, so dropping one or two mip levels should be plenty of texture detail range for any game. For example starting with 2GB of textures and dropping two mip levels from all of them would take you down to 128MB of textures.
Also note that if you’re running out of texture memory D3DFMT_DXT1 is 8 times smaller than D3DFMT_X8R8G8B8 and should be less of a quality reduction than dropping a mip level.
Origin:
https://www.gamedev.net/forums/topic/605090-texture-quality-settings/
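The figures in the quote above can be checked with a short calculation. This is a sketch under simplifying assumptions: square power-of-two textures, a full mip chain down to 1x1, 4 bytes per pixel for D3DFMT_X8R8G8B8 and 0.5 bytes per pixel for D3DFMT_DXT1, ignoring the 4x4 block minimum that makes the smallest DXT1 mips slightly larger. The function names are illustrative only.

```python
def mip_sizes(width: int, height: int, bytes_per_pixel: float = 4.0) -> list[float]:
    """Byte size of every mip level, from the top level down to 1x1."""
    sizes = []
    while True:
        sizes.append(width * height * bytes_per_pixel)
        if width == 1 and height == 1:
            break
        width, height = max(1, width // 2), max(1, height // 2)
    return sizes


def chain_bytes(width: int, height: int, bytes_per_pixel: float = 4.0,
                dropped_levels: int = 0) -> float:
    """Total size of the mip chain after dropping the largest levels."""
    return sum(mip_sizes(width, height, bytes_per_pixel)[dropped_levels:])


if __name__ == "__main__":
    full = chain_bytes(1024, 1024)
    top = mip_sizes(1024, 1024)[0]
    print(f"top mip share of chain: {top / full:.3f}")
    print(f"size after dropping two levels: {chain_bytes(1024, 1024, dropped_levels=2) / full:.4f} of original")
    print(f"X8R8G8B8 / DXT1 ratio: {full / chain_bytes(1024, 1024, bytes_per_pixel=0.5):.1f}")
```

Each dropped level divides the memory by roughly four, so two levels give about 1/16, which is how 2GB (2048MB) becomes 128MB.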
Blogs - GPU Memory (VRAM)
GPU Memory Pools in D3D12
https://therealmjp.github.io/posts/gpu-memory-pool/
Blogs - Frame Analysis
GTA V - Graphics Study
https://www.adriancourreges.com/blog/2015/11/02/gta-v-graphics-study/
Behind the Pretty Frames: God of War
http://www.mamoniem.com/behind-the-pretty-frames-god-of-war/
Blogs - Pipeline
PowerVR GPU Architecture and Optimization Recommendations
https://yemi.me/2018/09/17/powervr-architecture-overview/
A Brief Look at Mobile GPU Architecture
https://zhuanlan.zhihu.com/p/656933750
Lessons Learned from Disabling Alpha Test on Mobile CPUs
https://gwb.tencent.com/community/detail/123042
Rendering Notes: What Exactly Are Early-Z, Z-Culling, Hi-Z, and Z-Prepass?
https://juejin.cn/post/6844904132852072462
Alpha Testing vs. Depth Testing
https://www.gamedev.net/forums/topic/391007-alpha-testing-vs-depth-testing/
Early-Z and Late-Z Rules
https://www.lfzxb.top/early-z-test-and-late-z-test/
Early-Z
https://developer.arm.com/documentation/102224/0200/Early-Z
Books
Books - Lighting Issues Visualization
Computer Graphics and Imaging
https://www.intechopen.com/books/7435
Computer Graphics and Imaging (October 23, 2019)
https://www.amazon.com/Computer-Graphics-Imaging-Branislav-Sobota/dp/1839622822
Tools & Frameworks
Tools - Graphical Debug
RAD Debugger. A native, user-mode, multi-process, graphical debugger. (Recommended)
https://github.com/EpicGames/raddebugger
Tools - Lighting Issues Visualization
Quoted from Dynamic analytic indirect light:
Triangulating the direct light enables analytic integrals for diffuse geometry with correct occlusion. The occlusion step uses the Sutherland–Hodgman algorithm to determine overlap between triangles. Complexity is O(n^4) and the shader is found here:
https://www.shadertoy.com/view/st3BW4
The faster version skips indirect occlusion and is O(n^3):
https://www.shadertoy.com/view/NlVfWy
Tools - GPU Crash Debugging
GPU Crash Debugging in Unreal Engine: Tools, Techniques, and Best Practices | Unreal Fest 2023 (Recommended)
https://www.youtube.com/watch?v=CyrGLMmVUAI