DRAM performance has improved far more slowly than CPU performance, so the performance gap between DRAM and the CPU keeps widening. Chances are your CPU has already finished its work and is idling, waiting for data from DRAM before it can continue; here DRAM speed is bottlenecking CPU speed. To solve this, we put smaller but faster (and more expensive) memory on the CPU. Chunks of memory (DRAM) are loaded into it so that the CPU can access them immediately. This memory is known as cache memory.
The memory hierarchy looks something like this:

                   |   DRAM
Size and           |   L3 (shared between cores)
latency decrease   |   L2
                   v   L1 (Data), L1 (Instruction)
The hierarchy is shown as a formality; we don't have to keep the levels of cache in mind when programming.
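Even so, the cache's effect on performance is easy to observe. Here is a minimal sketch (the 64 MiB size and 4096-byte stride are assumptions, not tuned to any particular CPU; compile with low optimization, e.g. -O0, so the loops aren't elided) that performs the same number of memory accesses twice, once sequentially and once with a large stride:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N (64 * 1024 * 1024) /* 64 MiB, bigger than any cache level */

int main(void) {
    char *buf = malloc(N);
    if (!buf) return 1;
    memset(buf, 0, N); /* warm-up: fault the pages in before timing */

    /* Sequential pass: consecutive bytes share a cache block,
       so most accesses are hits. */
    clock_t t0 = clock();
    for (size_t i = 0; i < N; i++)
        buf[i]++;
    clock_t t1 = clock();

    /* Strided pass: same number of increments, but each access
       jumps 4096 bytes, landing in a different cache block. */
    clock_t t2 = clock();
    for (size_t s = 0; s < 4096; s++)
        for (size_t i = s; i < N; i += 4096)
            buf[i]++;
    clock_t t3 = clock();

    printf("sequential: %.2fs  strided: %.2fs\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t3 - t2) / CLOCKS_PER_SEC);
    free(buf);
    return 0;
}

Both loops increment every byte exactly once, but the strided pass should run noticeably slower: sequential accesses reuse the cache block that was just loaded, while each strided access lands in a different block.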
The associativity of a cache describes how copies of main memory are mapped into the cache. You can think of the cache as an N-byte 3D array, though in hardware the 2D arrays are laid out linearly one after another rather than stacked. This is what it looks like:
{                                     // 3rd dimension: sets
    {                                 // 2nd dimension: blocks (ways) in a set
        { a, b, c, d, e, f, g, h },  // 1st dimension: bytes in a block
        { i, j, k, l, m, n, o, p }
    },
    {
        { q, r, s, t, u, v, w, x },
        { y, z, !, @, #, $, %, ^ }
    }
}
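To make the 3D-array picture concrete: hardware splits an address into a block offset (which byte inside a block), a set index (which set it maps to), and a tag (which memory block is currently cached). A minimal sketch using the toy geometry above (2 sets, 2 blocks per set, 8-byte blocks; the variable names are mine, not a standard API):

#include <stdint.h>
#include <stdio.h>

/* Toy geometry matching the 3D-array picture above. */
#define BLOCK_SIZE 8  /* bytes per block -> 3 offset bits */
#define NUM_SETS   2  /* sets            -> 1 index bit   */

int main(void) {
    uint64_t addr = 0x1234;

    uint64_t offset = addr % BLOCK_SIZE;               /* byte within the block */
    uint64_t set    = (addr / BLOCK_SIZE) % NUM_SETS;  /* which set it maps to  */
    uint64_t tag    = addr / (BLOCK_SIZE * NUM_SETS);  /* identifies the block  */

    printf("addr=0x%llx -> tag=0x%llx set=%llu offset=%llu\n",
           (unsigned long long)addr, (unsigned long long)tag,
           (unsigned long long)set, (unsigned long long)offset);
    return 0;
}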
In a fully associative cache, a chunk of memory the size of a cache block can be placed anywhere in the cache (in other words, there is effectively a single set). For example, if main memory looks like:
{ a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x }
and you have a fully associative cache with 4 cache blocks of 4 bytes each, then your cache contents might look like:
{
    a, b, c, d,
    i, j, k, l,
     ,  ,  ,  ,   // Empty cache block
    u, v, w, x
}
That is, a cache block can be placed in any free slot, and when the cache is full one block is evicted to make room (not randomly: there are various eviction and replacement policies, such as LRU or FIFO, that choose the victim). A minimal sketch of this follows.
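Here is a sketch of a fully associative cache with the 4-block, 4-byte geometry from the example above, using round-robin (FIFO-style) replacement; real hardware more often uses LRU or pseudo-LRU, and all names here are hypothetical:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 4
#define BLOCK_SIZE 4

typedef struct {
    bool     valid;
    uint64_t tag;    /* addr / BLOCK_SIZE: identifies the cached memory block */
} CacheBlock;

static CacheBlock cache[NUM_BLOCKS];
static int next_victim = 0; /* round-robin (FIFO-style) replacement */

/* Returns true on hit; on miss, "loads" the block by evicting a victim. */
bool access_addr(uint64_t addr) {
    uint64_t tag = addr / BLOCK_SIZE;

    /* Fully associative: the block may sit in ANY slot, so search them all. */
    for (int i = 0; i < NUM_BLOCKS; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return true;

    /* Miss: replace the next victim in round-robin order. */
    cache[next_victim].valid = true;
    cache[next_victim].tag   = tag;
    next_victim = (next_victim + 1) % NUM_BLOCKS;
    return false;
}

int main(void) {
    uint64_t addrs[] = { 0, 8, 20, 0, 32, 8 };
    for (int i = 0; i < 6; i++)
        printf("addr %2llu -> %s\n", (unsigned long long)addrs[i],
               access_addr(addrs[i]) ? "hit" : "miss");
    return 0;
}

The defining property of full associativity shows up in the lookup: every slot must be searched on every access, which is one reason real fully associative structures (like TLBs) are kept small.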
TBD
TBD
Agenda: