What is AVX-512?

Instruction set extensions are common on many architectures. The latest microprocessors in the x86 family support many of them, including AVX-512, which works on 512 bits of data at a time. That is, instead of operating on 64-bit values like the general-purpose part of the CPU, these instructions use a dedicated set of 512-bit vector registers, each holding several values side by side, and run on dedicated vector execution units.

Thanks to this, a single instruction can carry out several operations at once, instead of working in a scalar way, that is, one value at a time. With the AVX extensions you work with vectors of data, and the same operation is applied to every element. With 512-bit registers, that means 8 64-bit values at the same time, or 16 32-bit values, and so on. However, although these extensions can speed up many workloads, such as scientific ones, AVX-512 does not bring only advantages.
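The difference between scalar and vector processing can be sketched in a few lines of Python. This is only a model for illustration, not real AVX-512 code: the names lanes, scalar_add and vector_add are hypothetical, and the "instruction count" simply counts loop steps, so it says nothing about actual speed on real hardware.

```python
REGISTER_BITS = 512  # width of one AVX-512 vector register


def lanes(element_bits):
    """How many elements of this width fit in one 512-bit register."""
    if element_bits <= 0 or REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide 512")
    return REGISTER_BITS // element_bits


def scalar_add(a, b):
    """Scalar model: one addition per step. Returns (result, steps)."""
    out, steps = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1
    return out, steps


def vector_add(a, b, element_bits):
    """Vector model: each step adds one register's worth of elements."""
    width = lanes(element_bits)
    out, steps = [], 0
    for i in range(0, len(a), width):
        # The same operation is applied to every element of the chunk.
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
        steps += 1
    return out, steps


a = list(range(64))
b = [1] * 64
print(lanes(64), lanes(32))      # 8 16
print(scalar_add(a, b)[1])       # 64 steps, one per element
print(vector_add(a, b, 32)[1])   # 4 steps, 16 elements per step
```

In this model, 64 additions on 32-bit values take 64 scalar steps but only 4 vector steps. Real gains are smaller, since memory bandwidth, clock speeds and the rest of the program also matter.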

In addition to the base ISA, with the basic instructions of the AMD64, EM64T or x86-64 architecture, whatever you want to call it, there are also many extensions, that is, additional sets of instructions that complete the ISA and speed up certain workloads. Libraries such as TensorFlow, for example, can take advantage of them. Among these extensions is the AVX-512 instruction set.

AVX stands for Advanced Vector Extensions. AVX-512 follows AVX2, the second iteration of AVX, and was announced by Intel in 2013. It was first implemented in the Intel Xeon Phi (Knights Landing), and later reached the Skylake generation, both in Intel Xeon servers (Skylake-SP) and in high-end desktop processors (Skylake-X).

The main purpose of this instruction set was to speed up tasks related to data compression, image processing, and cryptographic calculations. With twice the vector width of the 256-bit AVX2, AVX-512 offered significant performance improvements, but despite adding roughly twice the complexity, it did not deliver nearly twice the performance.

AVX-512 was both a good and a bad idea. Intel went ahead even though there was no software to justify its implementation on the client side, although there was for HPC. AMD was smarter in this regard and chose not to adopt AVX-512 until there was more software that could take advantage of it. That moment came with Zen 4, in the current Ryzen 7000 Series.

Intel, for its part, now seems somewhat lost: it was the promoter of AVX-512, yet it has blocked the instructions from Alder Lake onwards. It is true that the first Alder Lake chips allowed AVX-512 processing on the Golden Cove-based P-cores, but not on the Gracemont-based E-cores. This made scheduling complex, because a thread using AVX-512 could not be moved to an E-core, so Intel opted to disable the instructions, even though the P-cores could physically use them.
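Because support differs between chips, and even between cores on the same chip, software cannot assume that AVX-512 is available and has to check first. A minimal sketch of such a check follows. It assumes a Linux system, where /proc/cpuinfo lists one "flags" line per logical CPU and reports the AVX-512 foundation instructions as avx512f. The function names and the sample text are hypothetical and only serve the illustration.

```python
def flags_per_cpu(cpuinfo_text):
    """Return one set of feature flags per 'flags' line (one per logical CPU)."""
    result = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            result.append(set(value.split()))
    return result


def avx512f_on_all_cpus(cpuinfo_text):
    """True only if every listed logical CPU reports the avx512f flag."""
    per_cpu = flags_per_cpu(cpuinfo_text)
    return bool(per_cpu) and all("avx512f" in flags for flags in per_cpu)


# Hypothetical sample in /proc/cpuinfo format: the second CPU lacks avx512f.
SAMPLE = """processor : 0
flags : fpu sse2 avx avx2 avx512f
processor : 1
flags : fpu sse2 avx avx2
"""

print(avx512f_on_all_cpus(SAMPLE))  # False

# On a real Linux machine the text would come from the file itself:
# with open("/proc/cpuinfo") as f:
#     print(avx512f_on_all_cpus(f.read()))
```

Requiring the flag on every logical CPU is a deliberately conservative choice: it mirrors the mixed P-core and E-core situation, where a program that finds AVX-512 on one core cannot count on it after being moved to another.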