Does CPU cache size matter?

The CPU cache comes up constantly in hardware discussions, but not everyone knows what it actually is, or whether a system needs a minimum amount of it to run smoothly or to meet the requirements of games and other demanding software. In this article we'll look at what this type of memory does and why it matters so much for performance.

What does the word cache refer to?

A cache (sometimes called a buffer) is an intermediate memory used to speed up access to information, generally located between two units. Cache memories are fast, but their capacity is small because they are expensive to build.

A CPU cache is temporary memory placed close to the processing cores. It holds data and instructions so that the CPU can reach them quickly, paying a much smaller penalty in clock cycles than it would by fetching the same information from RAM (main memory).

This memory is usually SRAM, and it is faster than the DRAM used for main memory: it can be accessed in a few nanoseconds. That makes a considerable difference to performance, because without a cache the CPU would have to fetch the data and instructions of every running process from RAM, which means waiting more clock cycles (greater latency).

Cache vs. RAM

As you probably know, a program is nothing more than a set of data and instructions. Instructions are operations applied to data. For example, if the CPU needs to compute 2 + 3, the ALU takes the operands (2 and 3), and when the control unit decodes the ADD instruction it sends the signals that put the ALU into add mode.
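The 2 + 3 example can be sketched as a toy model. This is only an illustration: the names `ALU_MODES`, `alu`, and `control_unit` are made up for this sketch and do not correspond to any real instruction set or hardware interface.

```python
import operator

# Operations the toy ALU can be switched into (hypothetical, not a real ISA).
ALU_MODES = {"ADD": operator.add, "SUB": operator.sub}


def alu(mode, a, b):
    """Apply the operation selected by the control signal to two operands."""
    return ALU_MODES[mode](a, b)


def control_unit(instruction):
    """Decode an (opcode, operand, operand) tuple and drive the ALU."""
    opcode, a, b = instruction
    if opcode not in ALU_MODES:
        raise ValueError(f"unknown opcode: {opcode}")
    return alu(opcode, a, b)


result = control_unit(("ADD", 2, 3))
print(result)  # 5
```

Both the instruction ("ADD") and the data (2 and 3) have to come from memory before anything can be computed, which is why the speed of that memory matters so much.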

Before CPU caches existed, the CPU had to fetch all of that information from RAM, and because RAM is slower than the CPU, this created significant bottlenecks. The cache was created to solve this: it acts as a buffer that speeds up accesses with a smaller clock cycle penalty, since its latency is much lower.

In fact, the fastest cache level can be up to around 100 times faster than standard RAM in terms of access latency. You may wonder: if it is so fast, why not increase its capacity and do without RAM? The reason is simple. This memory is very expensive to manufacture, so chips with high capacities would be prohibitively expensive for most users.

How cache affects performance

So how does the cache make processing faster? As you know by now, this memory has very low latency, so it takes very few clock cycles to access. Every access served by the cache instead of a slower memory saves clock cycles that would otherwise be spent waiting, and those savings add up to better performance.

When the CPU needs a piece of data or an instruction, it first looks in the cache. If the item is there, a hit occurs and the CPU can access it very quickly. If it is not there, a miss occurs, which forces the CPU to take a longer route with more latency: the next cache levels, then RAM, and in the worst case the hard disk.
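To get a feel for how much the hit rate matters, here is a minimal sketch of the usual "average access time" arithmetic: every access pays the cost of checking the cache, and misses pay an extra penalty on top. The function name and the cycle counts (4 cycles for a hit, 200 extra cycles for a miss) are assumptions chosen for illustration, not measurements of any particular processor.

```python
def average_access_cycles(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average cycles per access: hit cost plus the miss share of the penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


# Assumed, illustrative numbers: 4-cycle hit, 200-cycle penalty to reach RAM.
for rate in (0.50, 0.90, 0.99):
    cycles = average_access_cycles(rate, 4, 200)
    print(f"hit rate {rate:.0%}: {cycles:.1f} cycles on average")
```

With these assumed numbers, going from a 90% to a 99% hit rate cuts the average cost per access to roughly a quarter, which is why even a small improvement in hit rate can be noticeable.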

Cache levels

Before ending this article, it is worth covering the cache levels a CPU or GPU may have, since they help make sense of the technical specifications when you are buying a processor:

L1 or Level 1 cache

The L1 cache is the fastest of all. It is a small memory located very close to the control and execution units, and the penalty in clock cycles to access it is minimal. In many current architectures it is split into a separate L1 for data and another L1 for instructions.

L2 or Level 2 cache

The level 2 cache has more capacity than L1, but it usually costs more clock cycles to access. It is still very fast, just not as fast as L1. For this reason the CPU searches L1 first: if a hit occurs it gets what it is looking for quickly, and if a miss occurs it moves on to L2. This memory is usually unified, so it can store both data and instructions.

L3 or Level 3 cache

If a miss also occurs in L2, the CPU searches the next level, L3. This memory is also unified, holding both data and instructions, and has more capacity than L2. As you can imagine, it is also somewhat slower to access than L2.

In other words, access speed goes L1 > L2 > L3. The CPU starts its search in L1; only if a miss occurs does it go to L2, then to L3 if that misses too, and finally to RAM. If the information is not in RAM either, the next stop is the part of the hard drive that virtual memory uses as an extension of RAM (the swap space or page file).
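The search order described above can be sketched as a simple loop over the levels. This is a simplified model under stated assumptions: the cycle costs in `LEVELS` are invented for illustration, the cost of each level probed is simply added up, and RAM is treated as always holding the data (virtual memory is left out).

```python
# Assumed access costs in clock cycles; real values vary by processor.
LEVELS = [("L1", 4), ("L2", 12), ("L3", 40), ("RAM", 200)]


def lookup(address, contents):
    """Search each level in order; return (level that hit, total cycles spent).

    `contents` maps a cache level name to the set of addresses it holds.
    RAM is assumed to always contain the address in this sketch.
    """
    cycles = 0
    for name, cost in LEVELS:
        cycles += cost  # each level probed adds its access cost
        if name == "RAM" or address in contents.get(name, set()):
            return name, cycles
    raise AssertionError("unreachable: RAM always hits in this model")


contents = {"L1": {0x10}, "L2": {0x10, 0x20}, "L3": {0x10, 0x20, 0x30}}
for addr in (0x10, 0x20, 0x30, 0x40):
    print(hex(addr), lookup(addr, contents))
```

In this model, an address found in L1 costs 4 cycles, while one that misses every cache level costs 256, which shows how quickly the penalty grows as the search moves away from the cores.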

L3 is usually the last level (LLC, or Last Level Cache), although some systems have an L4, or a level below L1 such as L0. This is rarer, however.

Finally, what gets stored in these caches and what gets evicted from them is decided by replacement algorithms, which try to give priority to the information that is most likely to be accessed in the future or that belongs to the processes with the highest priority.
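The article does not name a specific algorithm, so as an illustration here is a minimal sketch of one well-known policy, LRU (least recently used), which evicts the entry that has gone longest without being accessed. The `LRUCache` class is hypothetical and written in software for clarity; real processors typically implement cheaper hardware approximations of this kind of policy.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest access first, newest last

    def access(self, address):
        """Return True on a hit, False on a miss (the address is then loaded)."""
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        self.entries[address] = True
        return False


cache = LRUCache(capacity=2)
results = [cache.access(a) for a in ("A", "B", "A", "C", "B")]
print(results)  # [False, False, True, False, False]
```

In this run the second access to "A" is a hit, loading "C" evicts "B" (the least recently used entry at that point), and so the final access to "B" misses again.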