An Introduction to the New Generation of Graphics APIs (such as Vulkan and DirectX 12)
Over the years, OpenGL and DirectX have greatly improved in functionality, from their earliest versions up to OpenGL 4.x and DirectX 11.
One of the most significant updates was probably the introduction of the programmable pipeline to replace the old fixed-function pipeline.
While the fixed-function pipeline limited the application to configuring a predefined set of states (color, texture, blending, etc.), the programmable pipeline uses shaders, small programs that run on the GPU and can perform almost any kind of mathematical operation:
- Transforming vertices by matrices in the vertex shader
- Per-pixel coloring and texture blending in the pixel shader
- Primitive subdivision in the tessellation shaders
- Primitive transformation in the geometry shader
- And more
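The first bullet can be illustrated with a CPU-side Python sketch of the per-vertex work a minimal vertex shader performs: multiplying the vertex position, in homogeneous coordinates, by a 4×4 transformation matrix. The function name `vertex_shader` and the row-major matrix layout are assumptions for illustration only; a real vertex shader would be written in a shading language such as GLSL or HLSL and would run on the GPU for many vertices in parallel.

```python
from typing import List, Sequence

Matrix4 = Sequence[Sequence[float]]  # 4 rows of 4 floats, row-major (assumed layout)


def vertex_shader(mvp: Matrix4, position: Sequence[float]) -> List[float]:
    """Hypothetical CPU stand-in for a vertex shader: returns mvp * (x, y, z, 1)."""
    x, y, z = position
    v = (x, y, z, 1.0)  # promote the 3D position to homogeneous coordinates
    # Each output component is the dot product of one matrix row with v.
    return [sum(mvp[row][col] * v[col] for col in range(4)) for row in range(4)]


# A translation by (2, 0, -5), written as a row-major 4x4 matrix.
translate = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]

print(vertex_shader(translate, (1.0, 1.0, 1.0)))  # [3.0, 1.0, -4.0, 1.0]
```

On the GPU, the same function body is executed once per vertex, which is exactly the kind of work the fixed-function pipeline could not let the application customize.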
However, because these APIs were designed before multi-core CPUs were widespread, applications can typically send work to the GPU from only a single CPU thread.
As a result, the more GPU state changes an application makes, the more work that single thread has to do (validating the new states and performing other hidden driver work), so the CPU becomes the bottleneck while the GPU sits idle.
This is why the new-generation APIs have been designed with CPU multi-threading in mind.
They allow applications to create and populate command buffers (objects that contain GPU rendering commands) from multiple CPU threads simultaneously; the finished command buffers are then queued up to the GPU by another CPU thread.
Once generated, command buffers can be executed at any time, with little CPU overhead and no need for the drivers to validate or compile anything inside the render loop. This reduces CPU usage compared to the older APIs, where GPU states have to be changed each time a different texture, blend mode or shader is used.
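The recording model described above can be sketched in Python, assuming a toy `CommandBuffer` class that simply stores command strings. Everything here (`CommandBuffer`, `record_mesh`, `submit`, the command names) is hypothetical and stands in for real API objects such as Vulkan command buffers or Direct3D 12 command lists. Several worker threads each record their own buffer in parallel, and a single submission step hands the finished buffers to a simulated GPU queue.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandBuffer:
    """Hypothetical stand-in for an API command buffer: a list of recorded commands."""
    name: str
    commands: List[str] = field(default_factory=list)

    def record(self, command: str) -> None:
        # Recording only stores the command; nothing is sent to the GPU yet.
        self.commands.append(command)


def record_mesh(mesh: str, shader: str, texture: str) -> CommandBuffer:
    """Runs on a worker thread and touches only its own buffer, so no locking is needed."""
    cb = CommandBuffer(name=mesh)
    cb.record(f"bind_shader {shader}")
    cb.record(f"bind_texture {texture}")
    cb.record(f"draw {mesh}")
    return cb


def submit(queue: List[str], buffers: List[CommandBuffer]) -> None:
    """Single submission step: the only place where work reaches the (simulated) GPU queue."""
    for cb in buffers:
        queue.extend(cb.commands)


meshes = [("mesh0", "shaderA", "tex0"), ("mesh1", "shaderB", "tex1"), ("mesh2", "shaderC", "tex2")]

# Record the three command buffers in parallel; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    buffers = list(pool.map(lambda args: record_mesh(*args), meshes))

gpu_queue: List[str] = []
submit(gpu_queue, buffers)  # done from one thread, in a deterministic order
print(len(gpu_queue))  # 9
```

Keeping each buffer private to the thread that records it is the design choice that makes the parallel part lock-free; only the submission step needs to be serialized.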
Example: Rendering 3 different meshes using different shaders and textures.
Prerequisite: the data for the meshes, shaders and textures has been pre-generated/preloaded.
Initializing the graphics objects
- Old APIs
Graphics operations are executed sequentially.
The more buffers, effects or textures are created, the longer it takes to complete all the operations.
- New APIs
Each command buffer can be created independently of the others; only the Send operation requires all commands to be finalized.
The time to complete all the operations will be roughly the time needed to build the longest command buffer plus the time of the Send operation.
Initialization will therefore probably be faster with the new APIs, since the workload is divided among all available CPU threads. Also, with the new APIs, CPU threads 0 to 2 in this example do not deal with the GPU; they simply create buffers that store commands and memory pointers. Only CPU thread 3 actually communicates with the GPU, sending it commands for processing.
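The timing argument above reduces to simple arithmetic: with the old APIs the total is the sum of all operations, while with the new APIs it is the slowest command buffer plus the Send operation. The Python sketch below uses made-up durations purely for illustration, and it assumes an idealized case with one free CPU thread per command buffer and no scheduling or contention overhead.

```python
from typing import Sequence


def sequential_time(op_times_ms: Sequence[float]) -> float:
    """Old APIs: operations run one after another on a single thread."""
    return sum(op_times_ms)


def parallel_time(buffer_times_ms: Sequence[float], send_ms: float) -> float:
    """New APIs (idealized): buffers are built concurrently, one per thread,
    so the wait is the slowest buffer followed by the Send operation."""
    return max(buffer_times_ms) + send_ms


# Hypothetical per-mesh preparation costs in milliseconds (illustrative numbers only).
costs = [4.0, 6.0, 5.0]
print(sequential_time(costs))              # 15.0
print(parallel_time(costs, send_ms=1.0))   # 7.0
```

In practice the gain depends on how evenly the work is split: if one command buffer dominates, the parallel time approaches that buffer's cost plus the Send operation.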
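Render loop
- Old APIs
For every mesh, the shader and texture are bound (each binding changes the GPU state) before the draw call is issued.
- New APIs
The pre-recorded command buffers are submitted to the GPU; no per-mesh state changes are made inside the loop.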
The render loop becomes simpler with the new APIs, since it has fewer operations to execute.
As a reminder, under the old APIs the GPU state was changed for every binding operation, which often forced the driver to validate that state synchronously and ultimately wasted both CPU and GPU cycles.
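The difference between the two render loops can be made concrete with a toy per-frame counter in Python. This is a model, not a measurement: `FrameStats`, `old_api_frame` and `new_api_frame` are hypothetical names, each bind is assumed to trigger one driver validation in the old API, and the new API is assumed to submit all pre-recorded buffers with a single call, in line with the claim that no validation happens inside the render loop.

```python
from dataclasses import dataclass
from typing import List, Tuple

Mesh = Tuple[str, str, str]  # (mesh, shader, texture) names, illustrative only


@dataclass
class FrameStats:
    cpu_api_calls: int = 0      # calls the render thread makes into the API per frame
    state_validations: int = 0  # simulated driver validation triggered by state changes
    gpu_commands: int = 0       # commands the GPU ends up executing


def old_api_frame(meshes: List[Mesh]) -> FrameStats:
    stats = FrameStats()
    for _ in meshes:
        # bind_shader and bind_texture each change GPU state, which the driver validates.
        stats.cpu_api_calls += 2
        stats.state_validations += 2
        stats.cpu_api_calls += 1  # the draw call itself
        stats.gpu_commands += 3
    return stats


def new_api_frame(prerecorded: List[List[str]]) -> FrameStats:
    stats = FrameStats()
    # The buffers were recorded once, before the loop; a frame only submits them.
    stats.cpu_api_calls += 1  # modeled as one submit call covering every buffer
    stats.gpu_commands += sum(len(cb) for cb in prerecorded)
    return stats


meshes = [("mesh0", "shaderA", "tex0"), ("mesh1", "shaderB", "tex1"), ("mesh2", "shaderC", "tex2")]
prerecorded = [[f"bind_shader {s}", f"bind_texture {t}", f"draw {m}"] for m, s, t in meshes]

print(old_api_frame(meshes))
# FrameStats(cpu_api_calls=9, state_validations=6, gpu_commands=9)
print(new_api_frame(prerecorded))
# FrameStats(cpu_api_calls=1, state_validations=0, gpu_commands=9)
```

The GPU executes the same nine commands in both cases; what changes is how much work the CPU render thread does each frame to get them there.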
Finally, the new APIs take shaders only in a pre-compiled bytecode format (SPIR-V for Vulkan, DXBC or DXIL for DirectX 12). This might sound cumbersome, but it means that shaders load faster at runtime, because no compilation is needed, and, better yet, they are less prone to errors, since they are pre-baked and pre-validated.
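As an illustration of how little work remains at load time, the Python sketch below accepts a SPIR-V module as raw bytes and only sanity-checks its header: every SPIR-V module is a sequence of 32-bit words beginning with a five-word header whose first word is the magic number `0x07230203`. The `load_spirv` function is hypothetical; in a real Vulkan application the validated bytes would then be handed to `vkCreateShaderModule`. The test input is a header-only byte string, not a usable shader.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first 32-bit word of every SPIR-V module
HEADER_WORDS = 5          # magic, version, generator, bound, schema


def load_spirv(data: bytes) -> bytes:
    """Hypothetical loader: accepts pre-compiled SPIR-V as-is after a header check.

    No shader source is parsed or compiled here; that work was done offline.
    """
    if len(data) < HEADER_WORDS * 4 or len(data) % 4 != 0:
        raise ValueError("SPIR-V must be whole 32-bit words and include a 5-word header")
    little = struct.unpack("<I", data[:4])[0]
    big = struct.unpack(">I", data[:4])[0]
    # The magic number also reveals the module's byte order.
    if SPIRV_MAGIC not in (little, big):
        raise ValueError("not a SPIR-V module (bad magic number)")
    return data


# Header-only example: magic, version 1.0 (0x00010000), generator 0, bound 1, schema 0.
header = struct.pack("<5I", SPIRV_MAGIC, 0x00010000, 0, 1, 0)
print(len(load_spirv(header)))  # 20
```

Compare this with the old model, where shader source text had to be compiled by the driver at runtime before the first draw that used it.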