This article is written for beginner systems programmers. Its purpose is not to provide a checklist of optimizations, but to build a durable mental model — one you can return to whenever performance feels mysterious.
If you read this carefully, you should walk away with the ability to visualize memory, not just talk about it.
A Computer Is Not Smart — It Is Fast
Before we talk about memory, we need to clear a common misconception.
A computer is not intelligent. It does not understand variables, objects, or even numbers in the way humans do. What it does have is an extraordinary ability to perform very small, very simple operations at incredible speed.
Everything else — variables, structs, arrays, functions — is something we project onto the machine.
This matters because memory is the first place where this projection breaks down.
Memory as the CPU Sees It
At the hardware level, memory is best understood as a long strip of boxes, laid out one after another.
Each box:
- has an address
- holds exactly one byte
Nothing more.
You can imagine memory like this:
[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ] ...
 0  1  2  3  4  5  6  7  8  9
Each position is a byte address.
The CPU does not know that four of these boxes together might represent an integer. It only knows how to read and write individual boxes.
This is byte-addressable memory.
How Meaning Appears: Types Are a Software Agreement
If memory is just bytes, where do types come from?
They come from agreement.
When you write:
x := int32(42)
You are really saying:
Please treat the 4 bytes starting at x's address as a 32-bit signed integer.
The compiler and CPU cooperate to uphold this agreement, but the memory itself is unaware of it.
If those same 4 bytes were interpreted as:
- an integer
- a float
- part of a struct
memory would not notice.
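To make this concrete, here is a minimal Go sketch (the constant is arbitrary) that hands the same four bytes to two different agreements:

package main

import (
    "fmt"
    "math"
)

func main() {
    bits := uint32(0x42280000) // four bytes with no inherent meaning

    // Read as a 32-bit unsigned integer:
    fmt.Println(bits) // 1109917696

    // The very same bit pattern read as a 32-bit float:
    fmt.Println(math.Float32frombits(bits)) // 42
}

Nothing about the bytes changes between the two lines; only the interpretation does.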

Contiguity: Why Arrays Feel Natural to Hardware
Now that we can visualize memory as a strip of bytes, arrays become easy to understand.
An array is simply a promise:
These values will live next to each other in memory.
If you have:
int arr[4];
And each int is 4 bytes, memory looks like:
[a][a][a][a][b][b][b][b][c][c][c][c][d][d][d][d]
This is not an abstraction. This is literally how the bytes are laid out.
Arrays are fast because they match the physical reality of memory.
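You can see the contiguity for yourself by printing addresses. Here is a small Go sketch, using int32 so each element is exactly 4 bytes:

package main

import "fmt"

func main() {
    var arr [4]int32 // each element occupies 4 bytes

    // Neighbouring elements sit at addresses exactly 4 bytes apart.
    for i := range arr {
        fmt.Printf("arr[%d] is at %p\n", i, &arr[i])
    }
}

On a typical run, each printed address is 4 bytes higher than the previous one.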
Structs: Still Bytes, Just Grouped Differently
Structs feel more complex, but at the memory level they are still just contiguous bytes.
Consider:
type Struct struct {
    a int32 // 4 bytes
    b int32 // 4 bytes
    c byte  // 1 byte
}
The compiler decides:
- where each field starts
- where padding is needed
- how everything is aligned
The CPU sees one thing:
[ ][ ][ ][ ][ ][ ][ ][ ][ ]
A single block of bytes.
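If you want to inspect that block yourself, unsafe.Offsetof and unsafe.Sizeof report where each field starts and how big the whole block ends up, trailing padding included. A quick sketch using the struct above:

package main

import (
    "fmt"
    "unsafe"
)

type Struct struct {
    a int32 // 4 bytes
    b int32 // 4 bytes
    c byte  // 1 byte
}

func main() {
    var s Struct

    // Byte offset of each field within the block:
    fmt.Println(unsafe.Offsetof(s.a)) // 0
    fmt.Println(unsafe.Offsetof(s.b)) // 4
    fmt.Println(unsafe.Offsetof(s.c)) // 8

    // Total size, including any trailing padding the compiler adds:
    fmt.Println(unsafe.Sizeof(s)) // typically 12 on 64-bit platforms
}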

Why Memory Needs Help: The Speed Gap
Here is the fundamental problem modern computers must solve:
- CPUs are incredibly fast
- Main memory is comparatively slow
If the CPU had to wait for main memory on every operation, most of its time would be spent idle.
The solution is the memory hierarchy.
The Memory Hierarchy as a Story of Distance
Think of the memory hierarchy as a story about distance.
The closer data is to the CPU, the faster it can be accessed — but the less of it there is.
From closest to farthest:
- Registers
- L1 Cache
- L2 Cache
- L3 Cache
- Main Memory (RAM)
Each step away from the CPU increases:
- access time
- capacity

Cache Lines: How Memory Actually Moves
Here is the most important rule to internalize:
The CPU never fetches a single byte from memory.
Instead, memory moves in fixed-size chunks called cache lines.
On most modern systems:
- one cache line = 64 bytes
When the CPU needs any byte from memory:
- the entire 64-byte cache line containing that byte is fetched
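On Linux you can ask the kernel for this number directly. The sysfs path below is Linux-specific and may vary between systems:

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    // Linux exposes the L1 data cache line size through sysfs.
    data, err := os.ReadFile("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size")
    if err != nil {
        fmt.Println("could not read cache line size:", err)
        return
    }
    fmt.Println("cache line size (bytes):", strings.TrimSpace(string(data))) // usually 64
}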

Spatial Locality: Why Nearby Data Is Cheap
Cache lines make one thing cheap: nearby access.
If you access one element in an array, the next several elements are already in cache.
This is why code like this is fast:
for (int i = 0; i < n; i++) {
    sum += arr[i];
}
You are walking through memory in the same order it is laid out.
Stride: When You Stop Walking and Start Jumping
Stride describes how far you move through memory between accesses.
Stride = 1 means:
- walk to the next element
Stride = 16 means:
- jump ahead by 16 elements
When stride stays within a cache line, performance barely changes.
When stride jumps across cache lines, performance changes dramatically.
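Here is a rough, deliberately unscientific Go sketch of that difference. Both runs perform the same number of reads; only the stride changes. The sizes are arbitrary, and a real measurement would use proper benchmarking tools:

package main

import (
    "fmt"
    "time"
)

// touch performs the same number of reads for every stride, wrapping
// around the slice, so only the access pattern differs between runs.
func touch(arr []int32, stride, accesses int) int32 {
    var sum int32
    idx := 0
    for i := 0; i < accesses; i++ {
        sum += arr[idx]
        idx += stride
        if idx >= len(arr) {
            idx -= len(arr)
        }
    }
    return sum
}

func main() {
    arr := make([]int32, 1<<26) // 256 MB, far larger than the caches
    const accesses = 1 << 24

    for _, stride := range []int{1, 16} {
        start := time.Now()
        sum := touch(arr, stride, accesses)
        fmt.Printf("stride %2d: %v (sum=%d)\n", stride, time.Since(start), sum)
    }
}

With stride 1, sixteen consecutive int32 reads come from one cache line; with stride 16, every read lands on a new line.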

Alignment: When a Structure Fits Cleanly
Alignment is about where a structure starts in memory.
If a structure fits entirely inside one cache line:
- it can be accessed with one memory fetch

If it crosses a cache line boundary:
- it requires two fetches
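In Go, unsafe.Alignof reports the boundary the compiler guarantees for a type, and the address itself tells you where a value landed relative to a cache line. A small sketch, assuming 64-byte lines:

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    var x int32
    var y float64

    // The start-address boundary the compiler guarantees for each type:
    fmt.Println(unsafe.Alignof(x)) // 4
    fmt.Println(unsafe.Alignof(y)) // 8

    // Where y starts relative to an assumed 64-byte cache line:
    fmt.Println(uintptr(unsafe.Pointer(&y)) % 64)
}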

Padding: Wasting Space to Save Time
Padding exists to prevent misalignment.
By adding unused bytes, the compiler can ensure that frequently accessed data:
- starts at a good boundary
- stays within one cache line
This increases memory usage, but often improves performance.
Padding is not wasteful. It is intentional.
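A small sketch of how much the compiler actually inserts. The two structs below (the names are made up for illustration) hold exactly the same fields, only in a different order:

package main

import (
    "fmt"
    "unsafe"
)

type Spread struct {
    a byte  // 1 byte, then 7 bytes of padding so b starts on an 8-byte boundary
    b int64 // 8 bytes
    c byte  // 1 byte, then 7 bytes of trailing padding
}

type Compact struct {
    b int64 // 8 bytes
    a byte  // 1 byte
    c byte  // 1 byte, then 6 bytes of trailing padding
}

func main() {
    fmt.Println(unsafe.Sizeof(Spread{}))  // 24 on typical 64-bit platforms
    fmt.Println(unsafe.Sizeof(Compact{})) // 16 on typical 64-bit platforms
}

Same fields, eight bytes of difference, purely from where the padding goes.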
CPU Components That Matter for Memory
You do not need to understand every CPU component to reason about memory.
Only a few matter here:
- Registers: where computation happens
- Load/Store unit: moves data between memory and registers
- Instruction fetch: instructions themselves pass through caches
Everything we discussed funnels through these components.
The Final Mental Model
If you remember nothing else, remember this:
- Memory is bytes
- Bytes move in cache lines
- Performance is about movement, not computation
When performance feels mysterious, ask:
- How is this data laid out?
- How many cache lines am I touching?
- Am I working with memory, or fighting it?
Once you can visualize memory, systems programming stops being magic.
It becomes engineering.
