What is GPU virtualization?

As you probably know, virtualization relies on a hypervisor that runs on the host machine and carves out its resources for the virtual machines. There is also a related term, GPU virtualization, which is very important and may not mean quite what you think.

What is GPU virtualization?

As you know, virtual machines are assigned hardware resources such as vCPUs, vRAM, and so on in order to do their work. A vGPU, or virtual GPU, is simply a graphics virtualization solution that gives virtual machines shared, simultaneous access to one of the host's physical GPUs.

GPU virtualization refers to technology that lets a GPU accelerate graphics or GPGPU applications running inside a virtual machine. Several techniques are used to achieve this, such as device emulation, remote API forwarding, pass-through, and so on.

This also brings significant efficiency benefits, since hardware resources can be shared more effectively between virtual machines. If the host alone ran the workload and that workload never reached its peak, much of the hardware would simply sit idle. It is the same principle we already see with VPS hosting.

With GPU virtualization, letting the virtual machine use the physical GPU also reduces CPU usage, because graphics no longer have to be rendered in software. In other words, GPU virtualization does for the GPU what was already done for other resources: the physical GPU can render or perform other tasks on behalf of the virtual machines, giving them true hardware graphics acceleration.
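
To make this concrete, here is a minimal sketch (assuming a Linux guest with mesa-utils installed; nothing in it is specific to any particular hypervisor) that checks whether the guest is really getting hardware-accelerated OpenGL or is falling back to Mesa's software rasterizers:

```python
#!/usr/bin/env python3
"""Minimal sketch: check whether a Linux guest uses hardware or software
OpenGL rendering. Assumes mesa-utils (glxinfo) is installed and a graphical
session is available; the heuristics here are illustrative only."""
import subprocess

# Known Mesa software rasterizers; seeing one of these usually means
# the guest is falling back to CPU-based rendering.
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")

def guest_renderer() -> str:
    """Return the OpenGL renderer string reported inside the guest."""
    out = subprocess.run(["glxinfo"], capture_output=True, text=True, check=True)
    for line in out.stdout.splitlines():
        if "OpenGL renderer string" in line:
            return line.split(":", 1)[1].strip()
    return "unknown"

if __name__ == "__main__":
    renderer = guest_renderer()
    accelerated = not any(s in renderer.lower() for s in SOFTWARE_RENDERERS)
    print(f"Renderer: {renderer}")
    print("Hardware acceleration:", "yes" if accelerated else "no (software rendering)")
```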

Advantages of GPU virtualization

GPU virtualization not only improves efficiency and performance, it also brings other advantages, especially for data centers and VPS-style servers. The following stand out:

  • Performance: GPU virtualization improves virtual machine performance, especially for graphics, and also speeds up workloads such as AI, ML, and GPGPU compute (a quick way to check that a guest actually sees a GPU is sketched after this list). On top of that, by removing the need for software rendering it offloads work from the CPU, so overall performance improves as well.
  • Reduces bottlenecks: As mentioned, lightening the load on the CPU gives the system more headroom at peak demand. Virtual machines can distribute work between the CPU and GPU more effectively for stronger performance.
  • Greater richness: Being able to run applications that rely on GPU capabilities makes the virtualized system far more versatile: games, AI software, GPU rendering, and even tasks such as GPU-accelerated decryption instead of slower CPU decryption.
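
As a quick sanity check for the points above, the following sketch (assuming a Linux guest with pciutils installed) lists the GPU-class PCI devices the guest can actually see; if none appear, graphics and GPGPU workloads will fall back to the CPU:

```python
#!/usr/bin/env python3
"""Minimal sketch: list the display/3D controllers a guest actually sees,
to confirm that a vGPU or passed-through GPU is exposed for graphics or
GPGPU work. Assumes a Linux guest with pciutils (lspci) installed."""
import subprocess

def visible_gpus() -> list[str]:
    """Return the lspci lines describing VGA/3D/display-class devices."""
    out = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    keywords = ("VGA compatible controller", "3D controller", "Display controller")
    return [line for line in out.stdout.splitlines()
            if any(k in line for k in keywords)]

if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("GPU devices visible to this guest:")
        for gpu in gpus:
            print(" ", gpu)
    else:
        print("No GPU exposed; workloads will fall back to the CPU.")
```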

Techniques used in GPU virtualization

As I mentioned earlier, there are several techniques used for GPU virtualization, and the most important ones are:

  • Remote API: A technique based on forwarding graphics API calls. When an application in the virtual machine uses graphics, its calls to the graphics API are forwarded to the host system's graphics API so they can be processed by the physical GPU (a toy sketch of this call-forwarding idea appears after this list). The technique is not perfect: performance suffers from the call forwarding, and the virtual machine is no longer fully isolated from the host. There is also third-party software that adds support for specific APIs, such as VMGL for OpenGL, rCUDA for CUDA, and so on.
  • Fixed pass-through or GPU pass-through: A single virtual machine accesses a GPU directly, exclusively, and permanently. This greatly improves performance and fidelity, reaching roughly 96 to 100% of native performance, well above the approximately 86% achieved by earlier techniques. The disadvantage is that the GPU cannot be shared between several virtual machines; each one needs its own additional physical GPU, which raises the hardware cost. A host-side check of whether a GPU is detached and ready for pass-through is sketched after this list.
  • Mediated pass-through: Here the GPU hardware provides each virtual machine with its own context and virtual memory ranges through the IOMMU, and the hypervisor forwards the virtual machines' graphics commands to the GPU. In effect, the GPU's resources are partitioned so that they serve the virtual machines directly. It sits between the two previous techniques: no API call forwarding is needed, yet all the virtual machines can share the same GPU. It is supported by technologies such as NVIDIA vGPU, AMD MxGPU, and Intel GVT-g (a sketch that lists the vGPU types a GPU exposes appears after this list).
  • Device emulation: GPU architectures are very complex, change constantly, and are often kept secret, so it is not feasible for hypervisor developers to emulate current GPU generations. It can only be done for some older, simpler models, such as the 3dfx Voodoo2 or the S3 ViRGE/DX. Even without 3D acceleration, this gives the virtual machine at least the minimum functionality needed to access it through a graphical terminal.
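
To illustrate the remote API idea from the list above, here is a toy sketch of call forwarding. It is not the real protocol of VMGL or rCUDA; the endpoint and the "gpu_add" operation are invented purely to show the guest-stub/host-server split:

```python
#!/usr/bin/env python3
"""Toy sketch of API remoting: a guest-side stub serializes a call and
forwards it over a socket, and a host-side handler executes it where the
real GPU stack lives. Endpoint and operation names are illustrative only."""
import json
import socket
import threading

HOST, PORT = "127.0.0.1", 9900  # illustrative endpoint only

def host_handler(srv: socket.socket) -> None:
    """Host side: accept one forwarded call, 'execute' it, return the result."""
    conn, _ = srv.accept()
    with conn:
        call = json.loads(conn.recv(4096).decode())
        # A real implementation would hand the call to the host's GPU driver/API.
        if call["op"] == "gpu_add":
            result = [a + b for a, b in zip(call["a"], call["b"])]
        else:
            result = None
        conn.sendall(json.dumps({"result": result}).encode())

def guest_stub(op: str, **args):
    """Guest side: the 'API library' replacement that forwards the call."""
    with socket.create_connection((HOST, PORT)) as conn:
        conn.sendall(json.dumps({"op": op, **args}).encode())
        return json.loads(conn.recv(4096).decode())["result"]

if __name__ == "__main__":
    # Host and guest run as threads here only so the sketch works in one process.
    with socket.create_server((HOST, PORT)) as srv:
        threading.Thread(target=host_handler, args=(srv,), daemon=True).start()
        print(guest_stub("gpu_add", a=[1, 2, 3], b=[10, 20, 30]))  # -> [11, 22, 33]
```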
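For fixed pass-through, the following host-side sketch checks which driver a GPU is bound to and which IOMMU group it belongs to; a GPU that is ready to be handed to a virtual machine is typically bound to vfio-pci. The PCI address is an example only, so replace it with the one your host reports via lspci -D:

```python
#!/usr/bin/env python3
"""Minimal host-side sketch for fixed GPU pass-through: report the driver
binding and IOMMU group of a PCI GPU via standard sysfs paths. The PCI
address below is an example placeholder."""
from pathlib import Path

GPU_ADDR = "0000:01:00.0"  # illustrative PCI address; use your GPU's address

def passthrough_status(addr: str) -> dict:
    """Return driver and IOMMU group information for one PCI device."""
    dev = Path("/sys/bus/pci/devices") / addr
    driver = (dev / "driver").resolve().name if (dev / "driver").exists() else "none"
    group = (dev / "iommu_group").resolve().name if (dev / "iommu_group").exists() else "n/a"
    return {"device": addr, "driver": driver, "iommu_group": group}

if __name__ == "__main__":
    status = passthrough_status(GPU_ADDR)
    print(status)
    if status["driver"] == "vfio-pci":
        print("GPU is detached from the host and ready to be assigned to a VM.")
    else:
        print("GPU is still bound to a host driver; pass-through is not configured.")
```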
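For mediated pass-through, the Linux kernel exposes the available vGPU ("mdev") types through sysfs, the interface used by technologies such as Intel GVT-g and NVIDIA vGPU. This sketch simply lists them; the device address is illustrative:

```python
#!/usr/bin/env python3
"""Minimal host-side sketch for mediated pass-through: list the mdev (vGPU)
types a GPU exposes through the kernel's mediated-device sysfs interface.
The PCI address below is an example placeholder."""
from pathlib import Path

GPU_ADDR = "0000:00:02.0"  # illustrative, e.g. an Intel integrated GPU

def mdev_types(addr: str) -> dict[str, str]:
    """Map each supported mdev type ID to its human-readable name, if any."""
    base = Path("/sys/bus/pci/devices") / addr / "mdev_supported_types"
    types = {}
    if base.exists():
        for t in sorted(base.iterdir()):
            name_file = t / "name"
            types[t.name] = name_file.read_text().strip() if name_file.exists() else ""
    return types

if __name__ == "__main__":
    types = mdev_types(GPU_ADDR)
    if not types:
        print("No mediated device types exposed; mdev/vGPU support is not enabled.")
    for type_id, name in types.items():
        print(f"{type_id}: {name}")
        # A vGPU instance would be created by writing a UUID to
        # mdev_supported_types/<type_id>/create (requires root privileges).
```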