Part 5 - Cache memory

Posted on Apr 13, 2023
(Last updated: May 26, 2024)

In this part we will cover one of the most important ideas in computer science - the cache memory - but also memory in general.

Memory types

Before we dive in - let’s quickly recap the different types of memory there are:

  • Static RAM (SRAM)
    • 0.5 ns - 2.5 ns | $2000 - $5000 per GB
  • Dynamic RAM (DRAM)
    • 50 ns - 70 ns | $20 - $75 per GB
  • Magnetic storage
    • 5 ms - 20 ms | $0.20 - $2 per GB

An ideal memory would therefore have the speed of SRAM at the cost of a hard drive.

Locality of reference

Locality of reference is the tendency of programs, when accessing memory, to access addresses that are the same as, or very close to, ones they have accessed recently.

This tendency is crucial - if we keep only the subset of data and instructions that is likely to be accessed in a small, fast memory, we can use that memory instead.

This is exactly what we can define as a cache memory. Just to clearly define the two different localities (illustrated in the small snippet after this list):

  • Spatial Locality
    • Accessing a memory address that is adjacent to a recently accessed one.
  • Temporal Locality
    • Accessing the exact same address again.
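As a tiny, hypothetical illustration (the snippet only shows the access pattern conceptually), the loop below reuses the same variables on every iteration (temporal locality) while stepping through adjacent elements (spatial locality):

```python
data = list(range(1024))
total = 0                  # "total" and "i" are touched on every iteration -> temporal locality
for i in range(len(data)):
    total += data[i]       # data[0], data[1], ... are accessed in adjacent order -> spatial locality
print(total)
```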

Cache memory

So, as we defined, the cache memory is a small but fast memory that stores data we access often (and data adjacent to it).

Let’s define some terminology: $$ h_x = \textbf{Hit rate} = \text{Percentage of hits in memory x} \newline (1 - h_x) = m_x = \textbf{Miss rate} = \text{Percentage of misses in memory x} $$

$$ T_x = \textbf{Hit time} = \text{Access time for memory x} \newline (T_2 - T_1) = \textbf{Miss penalty} = \text{Miss penalty for a miss in memory 1} $$

With these defined, we can finally define the quantity we will use the most, the Average Memory Access Time (AMAT): $$ \text{AMAT} = h_1\ T_1 + m_1\ T_2 = h_1\ T_1 + (1 - h_1)T_2 $$
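As a quick sanity check, here is the formula in a few lines of Python (the hit rate and access times are made-up numbers, just for illustration):

```python
def amat(h1: float, t1: float, t2: float) -> float:
    # Hits are served in T1, misses end up costing T2 (the next level's access time).
    return h1 * t1 + (1 - h1) * t2

# Hypothetical numbers: 95% hit rate, 1 ns cache hit, 50 ns DRAM access.
print(amat(0.95, 1.0, 50.0))  # 0.95*1 + 0.05*50 = 3.45 ns
```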

Memory hierarchy

When dealing with memory, we’ll often hear the word hierarchy - it refers to memory being organized into different “levels”.

Our cache memory is the closest and fastest; after that we have our primary memory, and in some cases even a “secondary” memory (a hard drive).

You may also see the word level being thrown around. Our cache memory is the L1 memory. Between it and the primary memory we can add bigger L2-Lx caches.

We’ll cover this in more detail later. The reason this works is, just as before, the locality of reference.

If we keep our memory in “blocks” we can easily keep track of where “groups” of memory are located.

Cache implementation

So, in reality, we keep track of memory blocks in our caches.

Let’s say we have 128 bytes of memory, which we divide into blocks of 16 bytes, giving 8 blocks in total.

In our cache memory we only have space for four blocks. How can we map all 8?

Well, the most obvious answer is that we keep blocks 0 - 3 at cache “places” 0 - 3. But what about blocks 4 - 7?

We use modulo for that!

But then we need a way to differentiate block 0 from block 4, and so on.

We’ll keep a “tag” to indicate which block is currently stored!

A rule of thumb is that: $$ \frac{\text{Memory size}}{\text{Cache size}} = N $$

$$ \text{Tag bits} = \log_2(N) $$
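Applying this to the 128-byte example above (a small Python sketch; the helper name split_address is just for illustration): with 16-byte blocks and a 4-block cache we get 4 offset bits, 2 index bits and 1 tag bit.

```python
from math import log2

MEMORY_SIZE = 128    # bytes
BLOCK_SIZE = 16      # bytes per block
CACHE_BLOCKS = 4     # block places in the cache

OFFSET_BITS = int(log2(BLOCK_SIZE))    # 4: byte within a block
INDEX_BITS = int(log2(CACHE_BLOCKS))   # 2: which cache place
N = MEMORY_SIZE // (CACHE_BLOCKS * BLOCK_SIZE)
TAG_BITS = int(log2(N))                # 1: memory size / cache size = 128 / 64 = 2

def split_address(addr: int):
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (CACHE_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Block 1 (address 16) and block 5 (address 80) share index 1 but differ in tag.
print(split_address(16))   # (0, 1, 0)
print(split_address(80))   # (1, 1, 0)
```

The tag is what lets us tell block 1 and block 5 apart, even though they land in the same cache place.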

Therefore, we always have to keep these four questions in mind:

  • Where shall the block be placed?
  • How do we find a specific block?
  • Which block shall be replaced when its place is needed?
  • How do we handle writes ("store")?

So, to solve these problems, we have to do the following (sketched in code after this list):

  • Have some index bits that indicate which index we have to look at, along with the tag and offset bits.
  • If we miss:
    • A block that is already stored at that position gets thrown out. If it differs from when it was read (it is “dirty”), we need to write it back to the next level.
    • The requested block is read into the cache, stored at that index, and the required word is delivered to the CPU.
    • This, not surprisingly, takes a lot of time.
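A minimal sketch of such a lookup, assuming the direct-mapped, write-back cache from the running example (the class, the helper names and the plain dict standing in for the next memory level are all hypothetical):

```python
BLOCK_SIZE, CACHE_BLOCKS = 16, 4
OFFSET_BITS, INDEX_BITS = 4, 2

class Line:
    def __init__(self):
        self.valid = self.dirty = False
        self.tag = 0
        self.data = bytearray(BLOCK_SIZE)

lines = [Line() for _ in range(CACHE_BLOCKS)]

def access(addr, next_level, write=False, value=0):
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (CACHE_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    line = lines[index]
    if not (line.valid and line.tag == tag):                          # miss
        if line.valid and line.dirty:                                 # the old block was modified,
            next_level[(line.tag << INDEX_BITS) | index] = line.data  # so write it back first
        block = addr >> OFFSET_BITS
        line.data = bytearray(next_level.get(block, bytes(BLOCK_SIZE)))  # read in the new block
        line.tag, line.valid, line.dirty = tag, True, False
    if write:
        line.data[offset] = value
        line.dirty = True                                             # write back: cache only
    return line.data[offset]

memory = {}
access(16, memory, write=True, value=0xAB)  # store into block 1 (index 1)
access(80, memory)                          # block 5 maps to the same index -> eviction
print(memory[1][0] == 0xAB)                 # True: the dirty block was written back
```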

Mapping types

We’ve lightly covered one type of mapping for caches - the direct mapping.

Index in cache = (Block address) mod (Number of blocks in the cache)

But we also have associative caches.

  • Each index points at a set of block places (the set index).
  • A block can be placed anywhere within a set.
  • More blocks with the same index can now be in the cache simultaneously.
  • Set size = associativity

Note: This is more costly than direct mapping!

So, a two-way set-associative cache is equivalent to two parallel direct-mapped caches.
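In a sketch (same four block places as before; the function name is illustrative), only the index computation and the number of candidate places change:

```python
CACHE_BLOCKS, WAYS = 4, 2
NUM_SETS = CACHE_BLOCKS // WAYS                 # 2 sets of 2 places each

def candidate_places(block_address: int):
    set_index = block_address % NUM_SETS        # which set the block maps to
    return [set_index * WAYS + way for way in range(WAYS)]  # any way within that set

# Blocks 1, 3, 5 and 7 all map to set 1, but two of them can now
# be in the cache at the same time (places 2 and 3).
print([candidate_places(b) for b in (1, 3, 5, 7)])
```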

Handling writes in caches

We have two main strategies for this:

  • Write back (aka copy back)
    • Write only in the cache.
    • When a block needs to be thrown out, we write (copy) it to primary memory only if it has been updated (this requires one extra “dirty” bit).
  • Write through
    • The write goes to primary memory (PM) as well (normally via a so-called “store buffer”).

But what do we do when we have a so-called “store miss”? (Both cases are sketched after the list below.)

  • For a write-back cache:
    • Read in the block to the cache.
  • For a write-through cache:
    • Two strategies:
      • “Allocate on miss”: Read in the block to the cache.
      • “Write around”: Do not read in the block (since programs often write whole blocks before reading them, for example when initializing data).
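A minimal sketch of the two store-miss policies for a write-through cache (plain dicts stand in for the cache and primary memory; all names are illustrative):

```python
BLOCK_SIZE = 16

def store(addr, value, cache, memory, policy="write-around"):
    block, offset = addr // BLOCK_SIZE, addr % BLOCK_SIZE
    memory.setdefault(block, bytearray(BLOCK_SIZE))
    memory[block][offset] = value               # write through: PM is always updated
    if block in cache:                          # store hit: keep the cached copy in sync
        cache[block][offset] = value
    elif policy == "allocate-on-miss":          # store miss, option 1: read the block in
        cache[block] = bytearray(memory[block])
    # "write-around": a store miss leaves the cache untouched
```

With “write around”, a program that initializes a big array never fills the cache with blocks it will not read back soon.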

Cache misses affect CPI

Let’s look at an example:

Given:

  • Instruction cache miss rate = 2%
  • Data cache miss rate = 4%
  • Miss penalty = 100 clock cycles
  • Base CPI = 2
  • Loads & stores make up 36% of all instructions

We can calculate:

  • Miss cycles per instruction ($\Delta CPI_{cache}$)
    • I-cache misses: $\Delta CPI_{instruction} = 0.02 * 100 = 2$
    • D-cache misses: $\Delta CPI_{data} = 0.36 * 0.04 * 100 = 1.44$

Therefore, our total CPI = 2 + 2 + 1.44 = 5.44!
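The same calculation in a few lines of Python, using the numbers given above:

```python
base_cpi = 2.0
i_miss_rate, d_miss_rate = 0.02, 0.04
miss_penalty = 100            # clock cycles
mem_fraction = 0.36           # share of instructions that are loads/stores

delta_cpi_instruction = i_miss_rate * miss_penalty               # 2.0
delta_cpi_data = mem_fraction * d_miss_rate * miss_penalty       # 1.44
print(base_cpi + delta_cpi_instruction + delta_cpi_data)         # 5.44
```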

Multiple-level cache

  • Primary cache (L1)
    • Focuses on minimal hit time.
  • L2 Cache
    • Focuses on low miss rate to avoid accessing PM.
  • Therefore:
    • In a multi-level hierarchy, the L1 cache is often much smaller than a single-level cache would be.
    • L1’s block size is often smaller than L2’s.

Floating-point

$$ x = (-1)^S \cdot (1 + \text{Fraction}) \cdot 2^{\text{Exponent} - \text{Bias}} $$
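As a small sanity check of the formula, the sketch below pulls the sign, exponent and fraction fields out of an IEEE 754 single-precision number and plugs them back in (normalized numbers only; the bias is 127 for single precision):

```python
import struct

def decode_float32(x: float):
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = (bits >> 31) & 0x1             # S
    exponent = (bits >> 23) & 0xFF        # biased exponent (8 bits)
    fraction = (bits & 0x7FFFFF) / 2**23  # fraction field as a value in [0, 1)
    bias = 127
    value = (-1) ** sign * (1 + fraction) * 2 ** (exponent - bias)
    return sign, exponent, fraction, value

print(decode_float32(-0.75))  # (1, 126, 0.5, -0.75)
```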