This article is written for beginner systems programmers. Its purpose is not to provide a checklist of optimizations, but to build a durable mental model — one you can return to whenever performance feels mysterious.
If you read this carefully, you should walk away with the ability to visualize memory, not just talk about it.
A Computer Is Not Smart — It Is Fast
Before we talk about memory, we need to clear a common misconception.
A computer is not intelligent. It does not understand variables, objects, or even numbers in the way humans do. What it does have is an extraordinary ability to perform very small, very simple operations at incredible speed.
Everything else — variables, structs, arrays, functions — is something we project onto the machine.
This matters because memory is the first place where this projection breaks down.
Memory as the CPU Sees It
At the hardware level, memory is best understood as a long strip of boxes, laid out one after another.
Each box:
- has an address
- holds exactly one byte
Nothing more.
You can imagine memory like this:
[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ] ...
 0  1  2  3  4  5  6  7  8  9
Each position is a byte address.
The CPU does not know that four of these boxes together might represent an integer. It only knows how to read and write individual boxes.
This is byte-addressable memory.
How Meaning Appears: Types Are a Software Agreement
If memory is just bytes, where do types come from?
They come from agreement.
When you write:
x := int32(42)
You are really saying:
Please treat the 4 bytes starting at x's address as a 32-bit signed integer.
The compiler and CPU cooperate to uphold this agreement, but the memory itself is unaware of it.
If those same 4 bytes were interpreted as:
- an integer
- a float
- part of a struct
memory would not notice.
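To make this concrete, here is a minimal Go sketch (the constant is arbitrary) that hands the same four bytes to two different agreements:

package main

import (
    "fmt"
    "math"
)

func main() {
    bits := uint32(0x42280000) // four bytes with no inherent meaning

    // Read as a 32-bit unsigned integer:
    fmt.Println(bits) // 1109917696

    // The very same bit pattern read as a 32-bit float:
    fmt.Println(math.Float32frombits(bits)) // 42
}

Nothing about the bytes changes between the two lines; only the interpretation does.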

Contiguity: Why Arrays Feel Natural to Hardware
Now that we can visualize memory as a strip of bytes, arrays become easy to understand.
An array is simply a promise:
These values will live next to each other in memory.
If you have:
int arr[4];
And each int is 4 bytes, memory looks like:
[a][a][a][a][b][b][b][b][c][c][c][c][d][d][d][d]
This is not an abstraction. This is literally how the bytes are laid out.
Arrays are fast because they match the physical reality of memory.
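You can see the contiguity for yourself by printing addresses. Here is a small Go sketch, using int32 so each element is exactly 4 bytes:

package main

import "fmt"

func main() {
    var arr [4]int32 // each element occupies 4 bytes

    // Neighbouring elements sit at addresses exactly 4 bytes apart.
    for i := range arr {
        fmt.Printf("arr[%d] is at %p\n", i, &arr[i])
    }
}

On a typical run, each printed address is 4 bytes higher than the previous one.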
Structs: Still Bytes, Just Grouped Differently
Structs feel more complex, but at the memory level they are still just contiguous bytes.
Consider:
type Struct struct {
    a int32 // 4 bytes
    b int32 // 4 bytes
    c byte  // 1 byte
}
The compiler decides:
- where each field starts
- where padding is needed
- how everything is aligned
The CPU sees one thing:
[ ][ ][ ][ ][ ][ ][ ][ ][ ]
A single block of bytes.
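If you want to inspect that block yourself, unsafe.Offsetof and unsafe.Sizeof report where each field starts and how big the whole block ends up, trailing padding included. A quick sketch using the struct above:

package main

import (
    "fmt"
    "unsafe"
)

type Struct struct {
    a int32 // 4 bytes
    b int32 // 4 bytes
    c byte  // 1 byte
}

func main() {
    var s Struct

    // Byte offset of each field within the block:
    fmt.Println(unsafe.Offsetof(s.a)) // 0
    fmt.Println(unsafe.Offsetof(s.b)) // 4
    fmt.Println(unsafe.Offsetof(s.c)) // 8

    // Total size, including any trailing padding the compiler adds:
    fmt.Println(unsafe.Sizeof(s)) // typically 12 on 64-bit platforms
}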

Why Memory Needs Help: The Speed Gap
Here is the fundamental problem modern computers must solve:
- CPUs are incredibly fast
- Main memory is comparatively slow
If the CPU had to wait for main memory on every operation, most of its time would be spent idle.
The solution is the memory hierarchy.
The Memory Hierarchy as a Story of Distance
Think of the memory hierarchy as a story about distance.
The closer data is to the CPU, the faster it can be accessed — but the less of it there is.
From closest to farthest:
- Registers
- L1 Cache
- L2 Cache
- L3 Cache
- Main Memory (RAM)
Each step away from the CPU increases:
- access time
- capacity

Cache Lines: How Memory Actually Moves
Here is the most important rule to internalize:
The CPU never fetches a single byte from memory.
Instead, memory moves in fixed-size chunks called cache lines.
On most modern systems:
- one cache line = 64 bytes
When the CPU needs any byte from memory:
- the entire 64-byte cache line containing that byte is fetched
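On Linux you can ask the kernel for this number directly. The sysfs path below is Linux-specific and may vary between systems:

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    // Linux exposes the L1 data cache line size through sysfs.
    data, err := os.ReadFile("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size")
    if err != nil {
        fmt.Println("could not read cache line size:", err)
        return
    }
    fmt.Println("cache line size (bytes):", strings.TrimSpace(string(data))) // usually 64
}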

Spatial Locality: Why Nearby Data Is Cheap
Cache lines make one thing cheap: nearby access.
If you access one element in an array, the next several elements are already in cache.
This is why code like this is fast:
for (int i = 0; i < n; i++) {
    sum += arr[i];
}
You are walking through memory in the same order it is laid out.
Stride: When You Stop Walking and Start Jumping
Stride describes how far you move through memory between accesses.
Stride = 1 means:
- walk to the next element
Stride = 16 means:
- jump ahead by 16 elements
When stride stays within a cache line, performance barely changes.
When stride jumps across cache lines, performance changes dramatically.
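Here is a rough, deliberately unscientific Go sketch of that difference. Both runs perform the same number of reads; only the stride changes. The sizes are arbitrary, and a real measurement would use proper benchmarking tools:

package main

import (
    "fmt"
    "time"
)

// touch performs the same number of reads for every stride, wrapping
// around the slice, so only the access pattern differs between runs.
func touch(arr []int32, stride, accesses int) int32 {
    var sum int32
    idx := 0
    for i := 0; i < accesses; i++ {
        sum += arr[idx]
        idx += stride
        if idx >= len(arr) {
            idx -= len(arr)
        }
    }
    return sum
}

func main() {
    arr := make([]int32, 1<<26) // 256 MB, far larger than the caches
    const accesses = 1 << 24

    for _, stride := range []int{1, 16} {
        start := time.Now()
        sum := touch(arr, stride, accesses)
        fmt.Printf("stride %2d: %v (sum=%d)\n", stride, time.Since(start), sum)
    }
}

With stride 1, sixteen consecutive int32 reads come from one cache line; with stride 16, every read lands on a new line.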

Alignment: When a Structure Fits Cleanly
Alignment is about where a structure starts in memory.
If a structure fits entirely inside one cache line:
- it can be accessed with one memory fetch

If it crosses a cache line boundary:
- it requires two fetches
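In Go, unsafe.Alignof reports the boundary the compiler guarantees for a type, and the address itself tells you where a value landed relative to a cache line. A small sketch, assuming 64-byte lines:

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    var x int32
    var y float64

    // The start-address boundary the compiler guarantees for each type:
    fmt.Println(unsafe.Alignof(x)) // 4
    fmt.Println(unsafe.Alignof(y)) // 8

    // Where y starts relative to an assumed 64-byte cache line:
    fmt.Println(uintptr(unsafe.Pointer(&y)) % 64)
}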

Padding: Wasting Space to Save Time
Padding exists to prevent misalignment.
By adding unused bytes, the compiler can ensure that frequently accessed data:
- starts at a good boundary
- stays within one cache line
This increases memory usage, but often improves performance.
Padding is not wasteful. It is intentional.
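A small sketch of how much the compiler actually inserts. The two structs below (the names are made up for illustration) hold exactly the same fields, only in a different order:

package main

import (
    "fmt"
    "unsafe"
)

type Spread struct {
    a byte  // 1 byte, then 7 bytes of padding so b starts on an 8-byte boundary
    b int64 // 8 bytes
    c byte  // 1 byte, then 7 bytes of trailing padding
}

type Compact struct {
    b int64 // 8 bytes
    a byte  // 1 byte
    c byte  // 1 byte, then 6 bytes of trailing padding
}

func main() {
    fmt.Println(unsafe.Sizeof(Spread{}))  // 24 on typical 64-bit platforms
    fmt.Println(unsafe.Sizeof(Compact{})) // 16 on typical 64-bit platforms
}

Same fields, eight bytes of difference, purely from where the padding goes.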
CPU Components That Matter for Memory
You do not need to understand every CPU component to reason about memory.
Only a few matter here:
- Registers: where computation happens
- Load/Store unit: moves data between memory and registers
- Instruction fetch: instructions themselves pass through caches
Everything we discussed funnels through these components.
The Final Mental Model
If you remember nothing else, remember this:
- Memory is bytes
- Bytes move in cache lines
- Performance is about movement, not computation
When performance feels mysterious, ask:
- How is this data laid out?
- How many cache lines am I touching?
- Am I working with memory, or fighting it?
Once you can visualize memory, systems programming stops being magic.
It becomes engineering.
