keywords: OpenGL, buffer data, glBufferData, memory type

Memory Type
  • system memory: CPU-side memory, usually called RAM.
  • video memory: memory on the video card (now attached via PCI Express), usually called VRAM.
  • AGP memory: system memory made accessible to the GPU over the Accelerated Graphics Port, the now-obsolete predecessor of PCI Express on PCs.
  • shared memory:
    • on integrated GPUs, shared memory is just the system memory;
    • in OpenGL, call glBufferStorage with GL_CLIENT_STORAGE_BIT to hint that the buffer should live in CPU-side memory;
    • in D3D12, a D3D12_HEAP_TYPE_CUSTOM heap serves as shared memory (on a NUMA platform, set MemoryPoolPreference to D3D12_MEMORY_POOL_L0);
    • in Vulkan, fill a VmaAllocationCreateInfo and set VK_MEMORY_PROPERTY_HOST_CACHED_BIT in its preferredFlags field.
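For the OpenGL and Vulkan cases, the allocations above look roughly like this (a sketch, not a complete program; `size` is assumed defined, the buffer is assumed bound, and the Vulkan part assumes the VulkanMemoryAllocator library):

```cpp
// OpenGL (4.4+): hint that the buffer should live in CPU-side memory.
glBufferStorage(GL_ARRAY_BUFFER, size, nullptr,
                GL_CLIENT_STORAGE_BIT | GL_MAP_WRITE_BIT | GL_MAP_READ_BIT);

// Vulkan + VMA: VmaAllocationCreateInfo is a struct, not a function;
// request host-cached memory through its preferredFlags field.
VmaAllocationCreateInfo alloc_info = {};
alloc_info.usage = VMA_MEMORY_USAGE_AUTO;
alloc_info.preferredFlags = VK_MEMORY_PROPERTY_HOST_CACHED_BIT;
```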

System memory, AGP memory and video memory


Difference (credits to redditor: RowYourUpboat):

  • GL_STATIC_DRAW: basically means “I will load this vertex data once and then never change it.” This would include any static props or level geometry, but also animated models/particles if you are doing all the animation with vertex shaders on the GPU (modern engines with skeletal animation do this, for example).
  • GL_STREAM_DRAW: basically means “I am planning to change this vertex data basically every frame.” If you are manipulating the vertices a lot on the CPU, and it’s not feasible to use shaders instead, you probably want to use this one. Sprites or particles with complex behavior are often best served as STREAM vertices. While STATIC+shaders is preferable for animated geometry, modern hardware can spew incredible amounts of vertex data from the CPU to the GPU every frame without breaking a sweat, so you will generally not notice the performance impact.
  • GL_DYNAMIC_DRAW: basically means “I may need to occasionally update this vertex data, but not every frame.” This is the least common one. It’s not really suited for most forms of animation since those usually require very frequent updates. Animations where the vertex shader interpolates between occasional keyframe updates are one possible case. A game with Minecraft-style dynamic terrain might try using DYNAMIC, since the terrain changes occur less frequently than every frame. DYNAMIC also tends to be useful in more obscure scenarios, such as if you’re batching different chunks of model data in the same vertex buffer, and you occasionally need to move them around.
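As a rule of thumb, the three bullets above collapse into a tiny decision helper (purely illustrative; the function is hypothetical and not part of any GL API):

```cpp
#include <string>

// Hypothetical rule-of-thumb helper mirroring the bullets above:
// write-once -> STATIC, rewritten every frame -> STREAM,
// occasional updates -> DYNAMIC.
std::string suggest_usage_hint(int updates, int frames_alive) {
    if (updates <= 1)            return "GL_STATIC_DRAW";  // load once, never change
    if (updates >= frames_alive) return "GL_STREAM_DRAW";  // changed every frame
    return "GL_DYNAMIC_DRAW";                              // changed now and then
}
```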

Quoted from glBufferData -

  • GL_STATIC_DRAW: The data store contents will be modified once and used many times as the source for GL drawing commands.
  • GL_DYNAMIC_DRAW: The data store contents will be modified repeatedly and used many times as the source for GL drawing commands.
  • GL_STREAM_DRAW: The data store contents will be modified once and used at most a few times.

OpenGL buffer objects allow the usage hint to take the following 9 values (credits to kunluo - CSDN):



  • “static"意味着buffer的数据不会被改变(一次修改,多次使用);
  • “dynamic"意味着数据可以被频繁修改(多次修改,多次使用);
  • “stream"意味着数据每帧都不同(一次修改,一次使用);
  • “Draw"意味着数据将会被送往GPU进行绘制,“read"意味着数据会被用户的应用读取,“copy"意味着数据会被用于绘制和读取。


How to choose between GL_STREAM_DRAW and GL_DYNAMIC_DRAW


Credits to redditor:

  • Use glBufferStorage with flags = 0 to allocate local GPU memory. You can then call glBufferStorage with GL_CLIENT_STORAGE_BIT to allocate CPU-side memory (with the appropriate map read/write bits), and use glCopyBufferSubData to copy from one to the other. You can also copy from GPU buffer to GPU buffer, in case your data is generated by, e.g., a compute shader that writes to a buffer.
  • If you use glBufferData, then what the implementation does with your memory is under-specified. It is certainly not guaranteed to avoid a local CPU copy of your data.
  • glMapBuffer does not allow you to directly modify GPU memory. It either causes the buffer to be stored on the CPU, or it gives you a temporary copy of the buffer (on the CPU) which it copies back to the GPU at the end. Also, if it did give you a direct GPU pointer, it would either hard-sync the GPU or force you to be very careful not to overwrite parts of the buffer currently being processed. If you're explicitly implementing CPU/GPU copies yourself, then you have to implement this double-buffering yourself, which means having 1 GPU buffer and N CPU buffers (for N-buffering).
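The N-buffering bookkeeping from the last bullet can be sketched with a small helper (hypothetical, not a GL API): each frame the CPU writes into the next staging slot, and glCopyBufferSubData then copies that slot's buffer into the single GPU-local buffer.

```cpp
#include <cstddef>

// Round-robin slot allocator for N CPU staging buffers feeding one GPU
// buffer. A slot is only rewritten after the GPU has had N - 1 frames
// to finish copying from it.
struct StagingRing {
    std::size_t slot_count;   // N CPU-side buffers
    std::size_t current = 0;
    // Returns the slot safe to overwrite this frame, then advances.
    std::size_t acquire() {
        std::size_t s = current;
        current = (current + 1) % slot_count;
        return s;
    }
};
```

With N = 3, acquire() yields slots 0, 1, 2, 0, ..., so each frame's upload goes into a buffer the GPU is no longer reading.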

OpenGL Performance Tips: Atomic Counter Buffers versus Shader Storage Buffer Objects

glMapBuffer vs. glBufferSubData

Quoted from Is it better glBufferSubData or glMapBuffer - StackOverflow:
The good thing about glMapBuffer is that you don't need to copy the data into an array first and then use glBufferSubData to fill the OpenGL buffer. With glMapBuffer, you can copy the data directly into the part of memory that OpenGL will fetch to the GPU when necessary. From my point of view, glMapBuffer should be faster when you want to fill a big buffer that is updated frequently. How you copy the data into the buffer between glMapBuffer and glUnmapBuffer also matters.

If you show us the code where you use glMapBuffer, and how big the data is, it will be easier to judge. In any case, in the end measurements will show you which one is better.
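The two update paths can be sketched as follows (assuming the buffer already exists and is bound to GL_ARRAY_BUFFER; `size`, `cpu_array`, and `fill_vertices` are illustrative placeholders):

```cpp
// (a) glBufferSubData: fill a CPU-side array first, then let GL copy it.
fill_vertices(cpu_array);                         // hypothetical data producer
glBufferSubData(GL_ARRAY_BUFFER, 0, size, cpu_array);

// (b) glMapBuffer: write straight into the memory GL hands back,
// skipping the intermediate array.
void* ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
fill_vertices(ptr);                               // generate the data in place
glUnmapBuffer(GL_ARRAY_BUFFER);                   // GL may now transfer it
```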

Implicit synchronization -


This has mostly historic reasons. Back when there were no VBOs, the pointers specified with glVertexPointer and similar were not "associated" with an OpenGL object of any kind. When VBOs were introduced, this behavior carried over into the semantics of VBOs, which required a different buffer target for indices and attributes.

With the introduction of generic vertex attributes, such association functionality has been added.

Today it is mostly a hint that lets the OpenGL implementation know in which way the data is going to be addressed, so it can optimize the data flow accordingly. But it also works well as a mental reminder to the programmer of what is currently being dealt with.


How to upload vertices addressed by a VAO to GPU memory

Quoted from Mesh-Voxelization:

struct {
    Eigen::Matrix<float, -1, -1> V;     // vertex buffer (assumed 3 x #vertices, column-major)
    Eigen::Matrix<uint32_t, -1, -1> F;  // triangle index buffer (3 x #triangles)
} mesh;

struct {
    GLuint program;
    GLuint id_vao;
    GLuint id_vbo_position;
    GLuint id_ebo;
} vao_voxelization;

void init_vao()
{
    /* load the vertex attributes and indices into 'mesh'
       using a geometry library such as Assimp or glTF: */
    load_mesh("C:/monkey.obj", mesh);

    // vs, gs and fs are previously compiled shader objects
    GLuint shaders[3] = { vs, gs, fs };
    create_program(vao_voxelization.program, shaders, 3);
    glGenVertexArrays(1, &vao_voxelization.id_vao);
    glBindVertexArray(vao_voxelization.id_vao);

    // position (buffer object)
    glGenBuffers(1, &vao_voxelization.id_vbo_position);
    glBindBuffer(GL_ARRAY_BUFFER, vao_voxelization.id_vbo_position);
    glBufferData(GL_ARRAY_BUFFER, mesh.V.cols() * 3 * sizeof(GLfloat),
                 mesh.V.data(), GL_STATIC_DRAW);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glEnableVertexAttribArray(0);
    glBindBuffer(GL_ARRAY_BUFFER, 0);

    // elements (buffer object)
    glGenBuffers(1, &vao_voxelization.id_ebo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, vao_voxelization.id_ebo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, mesh.F.cols() * 3 * sizeof(GLuint),
                 mesh.F.data(), GL_STATIC_DRAW);
    glBindVertexArray(0);
}

VAO (Vertex Array Objects) vs VBO (Vertex Buffer Objects)


  • Vertex Array Objects (VAOs) are conceptually nothing but thin state wrappers.
  • Vertex Buffer Objects (VBOs) store actual data.

Origin: Use of Vertex Array Objects and Vertex Buffer Objects - StackOverflow
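A minimal sketch of that division of labor (assuming `verts`, `size`, and `vertex_count` are defined elsewhere):

```cpp
// VAO records attribute state; VBO holds the bytes.
GLuint vao, vbo;
glGenVertexArrays(1, &vao);
glGenBuffers(1, &vbo);

glBindVertexArray(vao);                        // start recording state
glBindBuffer(GL_ARRAY_BUFFER, vbo);            // the actual data store
glBufferData(GL_ARRAY_BUFFER, size, verts, GL_STATIC_DRAW);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)0); // captured by VAO
glEnableVertexAttribArray(0);                  // captured by VAO
glBindVertexArray(0);

// Later: a single bind restores everything the VAO captured.
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, vertex_count);
```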

glActiveTexture vs. glBindTexture

Imagine the GPU like some paint processing plant.

There are a number of tanks which deliver dye to a painting machine. In the painting machine the dye is then applied to the object. Those tanks are the texture units.

Those tanks can be equipped with different kinds of dye. Each kind of dye requires a different kind of solvent. The “solvent” is the texture target. For convenience each tank is connected to some solvent supply, but only one kind of solvent can be used at a time in each tank. So there’s a valve/switch for TEXTURE_CUBE_MAP, TEXTURE_3D, TEXTURE_2D, and TEXTURE_1D. You can fill all the dye types into the tank at the same time, but since only one kind of solvent goes in, it will “dilute” only the matching kind of dye. So you can have each kind of texture bound, but the binding with the “most important” solvent will actually go into the tank and mix with the kind of dye it belongs to.

And then there’s the dye itself, which comes from a warehouse and is filled into the tank by “binding” it. That’s your texture.

Origin: Differences and relationship between glActiveTexture and glBindTexture
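In API terms, glActiveTexture selects the tank (texture unit) and glBindTexture pours a dye (texture object) into one of its targets (solvents). A sketch, where `diffuse`, `normals`, and `prog` are assumed to exist:

```cpp
glActiveTexture(GL_TEXTURE0);            // select tank (unit) 0
glBindTexture(GL_TEXTURE_2D, diffuse);   // bind a texture to its 2D target
glActiveTexture(GL_TEXTURE1);            // select tank (unit) 1
glBindTexture(GL_TEXTURE_2D, normals);

// Point each sampler uniform at its texture unit.
glUniform1i(glGetUniformLocation(prog, "u_diffuse"), 0);
glUniform1i(glGetUniformLocation(prog, "u_normals"), 1);
```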

glBindImageTexture vs. glBindTexture
layout(binding = 0) uniform sampler2D img_input;

That declares a sampler, which gets its data from a texture object. The binding of 0 (you can set that in the shader in GLSL 4.20) says that the 2D texture bound to texture image unit 0 (via glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, ...);) is the texture that will be used for this sampler.

Samplers use the entire texture, including all mipmap levels and array layers. Most texture sampling functions use normalized texture coordinates ([0, 1] map to the size of the texture). Most texture sampling functions also respect filtering properties and other sampling parameters.

layout (binding = 0, rgba32f) uniform image2D img_input;

This declares an image, which represents a single image from a texture. Textures can have multiple images: mipmap levels, array layers, etc. When you use glBindImageTexture, you are binding a single image from a texture.

Images and samplers are completely separate. They have their own set of binding indices; it’s perfectly valid to bind a texture to GL_TEXTURE0 and an image from a different texture to image binding 0. Using texture functions for the associated sampler will read from what is bound to GL_TEXTURE0, while image functions on the associated image variable will read from the image bound to image binding 0.

Image access ignores all sampling parameters. Image accessing functions always use integer texel coordinates.

Samplers can only read data from textures; image variables can read and/or write data, as well as perform atomic operations on them. Of course, writing data from shaders requires special care, specifically when someone goes to read that data.

Origin: What is the difference between glBindImageTexture() and glBindTexture()?
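A host-side sketch of the distinction: the same texture can feed a sampler2D through a texture unit and an image2D through an image unit (assuming `tex` is an immutable-format GL_RGBA32F texture):

```cpp
// Feed the sampler2D declared at binding 0: texture unit 0.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex);

// Feed the image2D declared at binding 0: image unit 0, a single level.
glBindImageTexture(0, tex, 0 /*level*/, GL_FALSE /*layered*/, 0 /*layer*/,
                   GL_READ_WRITE, GL_RGBA32F);
```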

The Common Usage of Uniform Buffer

Most commonly used scenarios:
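One common scenario is sharing per-frame data (camera matrices and the like) across many shader programs through a single uniform block, updated once per frame from the CPU. A sketch; the block name and members are illustrative:

```glsl
layout(std140, binding = 0) uniform PerFrame {
    mat4 view;        // updated once per frame with glBufferSubData
    mat4 projection;
    vec4 camera_pos;  // xyz = position; vec4 keeps std140 alignment simple
};
```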

UBO vs SSBO (Uniform Buffer Objects, Shader Storage Buffer Objects)


  • Size limit: a UBO is limited to the KB range (64 KB or 128 KB, hardware-dependent). An SSBO is far more flexible: it can be as large as video memory allows (GBs), so it is better suited to storing large amounts of data. Moreover, an SSBO's size can be determined at shader run time rather than at compile time.
  • Access speed: a UBO lives in the GPU's constant memory, while an SSBO lives in general video memory, so UBO access is faster than SSBO access.
  • Read/write: to a shader, a UBO is a read-only constant block that can only be updated by the host application. An SSBO can be both read and written by shaders (for example, updating vertex positions in a compute shader).
  • Another property of SSBOs is that any other kind of OpenGL buffer can be bound to the GL_SHADER_STORAGE_BUFFER target, which makes them very flexible.
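The contrast can be seen directly in GLSL (a compute-shader sketch; binding points and names are illustrative):

```glsl
layout(local_size_x = 64) in;

// UBO: size fixed at compile time, read-only in the shader.
layout(std140, binding = 0) uniform Params {
    vec4 wind;         // e.g. xyz = direction, w = strength
};

// SSBO: may end with a runtime-sized array and may be written by the shader.
layout(std430, binding = 0) buffer Positions {
    vec4 position[];   // actual size decided by the buffer bound at run time
};

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i < position.length()) {
        position[i].xyz += wind.xyz * wind.w;   // shaders can write SSBOs
    }
}
```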

Both an RBO (Render Buffer Object) and a texture can be bound as attachments of an FBO (Frame Buffer Object), and they serve similar purposes. An RBO is more efficient, however, because its data is kept in an OpenGL internal format and needs no format conversion during operations, whereas a texture must be converted to the texture color format.

Shader Storage Buffer Object

UBO vs TBO (Uniform Buffer Objects, Texture Buffer Objects)

Uniform Buffers VS Texture Buffers

PBO vs FBO (Pixel Buffer Objects, Frame Buffer Objects)

Quoted from: Pixel Buffer Object
PBOs have nothing to do with Framebuffer Objects. Note the capitalization; “framebuffer” is one word. FBOs are not buffer objects; PBOs are. FBOs are about rendering to off-screen images; PBOs are about pixel transfers to/from the user from/to images in OpenGL. They are not alike.
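As a sketch of the "pixel transfer" role, a pack PBO lets glReadPixels return without stalling the CPU (`width` and `height` are assumed defined):

```cpp
// PBO as a transfer staging buffer: asynchronous framebuffer readback.
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);

// With a pack PBO bound, the data pointer is an offset into the PBO and
// glReadPixels returns immediately; the copy happens on the GPU timeline.
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

// Later (ideally a frame later, to avoid a sync), map and read on the CPU.
void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// ... use pixels ...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```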

NDK OpenGL ES 3.0 开发(二十二):PBO
熟悉 OpenGL VAO、VBO、FBO、PBO 等对象,看这一篇就够了

He who thinks too much about every step he takes will always stay on one leg. -Chinese Proverbs