Don't Overthink It: Designing a Large-Scale System Is All About Reads and Writes

Someone asked me recently: are you in tech for passion or for money?

I won’t answer that directly. But if you stick to the end of this article, you might figure it out yourself.

Whichever camp you’re in right now, you have to admit that the entire tech landscape has shifted under our feet. AI is not coming. It is already here, and the executives at big companies have found it to be a very convenient scapegoat for layoffs that were probably going to happen anyway.

But the layoffs are only half the story.

The other half is quieter and more important: companies have fundamentally changed how they hire. Back in 2020, the path was simple. Attend a six-month bootcamp, learn your MERN stack, sprinkle in some Vue or Angular, and you were one of the most hireable people in any room. If you were serious about your search, you could land a role in three months. With a strong network? One month.

Then ChatGPT launched in late 2022, and everything changed.

Since then, model after model has shipped with the same marketing strategy built on fear: “Our AI will replace software engineers in six months.” Some of those claims are overblown. Some are not. But none of that matters, because the hiring bar shifted regardless of whether the AI hype was true.

I wrote about this more directly on LinkedIn if you want my full take.

Here is what I know for certain: AI has been one of the greatest amplifiers I have encountered in my career. I use it constantly, but in a controlled, deliberate way, because the moment you let it do your thinking for you, your brain starts to quietly atrophy. The engineers who will thrive in the next decade are not the ones who resist AI, and they are not the ones who outsource everything to it either. They are the ones who use it to accelerate their thinking while continuing to build depth underneath.

Which brings me to what this article is actually about.

What Actually Separates Mid-Level Engineers From Senior and Staff Engineers

This article is the result of my recent deep dives across multiple areas: Platform Engineering, Systems Engineering, low-level computing, and first-principles reasoning about how computers actually work.

I wrote it because I kept noticing the same gap. Engineers who are technically solid struggle to make the leap to Senior or Staff level, not because they cannot code, but because they have never been given a stable mental model for reasoning about systems at scale.

The gap shows up most visibly in one place: the System Design interview.

If there are four or five interview rounds for a Senior or Staff role, at least one of them will be a System Design round. Sometimes it is the round that determines your title, your placement in the org, and your opening compensation.

Most senior engineers who have passed through any FAANG or MAANG-level process know this. The rest find out the hard way.

Here is the thing though: the System Design round should be the easiest round to pass.

I know that sounds strange. Let me explain.

Why People Struggle With System Design (And Why They Don’t Have To)

Data Structures and Algorithms rounds are hard in a specific, trainable way. You grind LeetCode. You master the patterns. Some people go as far as memorising the solutions. It is not elegant, but it works because the problem space is bounded.

System Design is different. It is an open-ended round with no fixed pattern, no mnemonic, and no stable mental model that most people are taught. You are handed a blank canvas and asked to design YouTube, or Instagram, or a URL shortener like bit.ly, or sometimes something entirely novel that you have never seen before.

I was once asked in an interview to design a logging and monitoring tool for enterprise companies to detect when their employees were leaking confidential information to LLM providers like ChatGPT and Claude. There was no textbook answer for that.

The reason people find System Design so hard is not a lack of knowledge. It is a lack of a reasoning framework. They have studied distributed systems in isolation, memorised CAP theorem, read about message queues and caches, but they have no unifying mental model that ties it all together.

I have figured out what that mental model is.

The Core Insight: Every System Is Just Reads and Writes

In System Design interviews, it is all about reads and writes.

That sounds reductive. Stay with me.

Every system you build has two lenses you need to look through: load parameters and performance metrics.

Load parameters describe the current stress on the system: requests per second, read/write ratios, concurrent active users. Performance metrics describe how the system responds to that stress. Understanding both is what separates engineers who slap a cache on a problem and hope for the best from engineers who know exactly which dial to turn and why.

Batch Processing: It Is About Throughput

Take a batch processing system. The metric you care about here is throughput: how much data the system processes, and in how much time.

When throughput is suffering, you trace it back to two things. How fast is data coming in? That is your read. How fast is the system producing output? That is your write. Every optimisation you make in a batch processing system is, at its core, an improvement to one of those two rates.

Web Servers: It Is About Latency

A web server asks a different question. Here you care about latency, and specifically the P99: what is the slowest one percent of your users experiencing right now?
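
To make the P99 idea concrete, here is a minimal sketch that computes percentiles from a list of request latencies using the nearest-rank method. The latency numbers are made up for illustration, and `percentile` is my own helper, not a library function.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample that at least p% of samples do not exceed."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20] * 98 + [900, 1500]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"mean={mean} ms  p50={p50} ms  p99={p99} ms")
```

The mean and the median both look healthy here, while the P99 exposes the slow tail. That is why the tail percentile, not the average, is the number to watch.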

When those numbers look bad, you start on the read side: add a cache, reduce payload size, add an index on the right database column. When the server is drowning in requests, you move to the write side: introduce a batching layer, add an idempotency mechanism so retries are safe, then fan out via a message queue to spread the load.

Storage Systems: The Read/Write Ratio Decides Everything

Storage systems make the read/write framing most explicit of all.

If your workload has an 80:20 read-to-write ratio, a relational database is usually the right call. Most relational engines, PostgreSQL and MySQL’s InnoDB among them, keep their indexes in B-Trees, which are optimised for lookup performance: finding a row takes only a handful of page reads even in a very large table.

Flip that to a 20:80 write-heavy workload and the picture changes. You need a database backed by an LSM-Tree rather than a B-Tree. Databases like Apache Cassandra, LevelDB, and ScyllaDB are built around LSM-Trees precisely because they make writes cheap by design, appending to an in-memory structure first and flushing to disk in bulk later. The tradeoff is that reads, especially for older data, get slower. That is not a bug. It is a conscious engineering decision, and understanding it is what lets you pick the right tool.
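
The write-cheap, read-costlier tradeoff can be sketched with a toy LSM-style store. This is an illustration under heavy simplifying assumptions (no write-ahead log, no compaction, no bloom filters, everything in memory), and `TinyLSM` is a hypothetical name, not how Cassandra or LevelDB are implemented.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store: writes hit an in-memory table, reads may search many segments."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []  # flushed, sorted lists of (key, value); newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write is one dictionary update; the bulk "disk" write happens at flush time.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Return (value, segments_searched). Older data means more segments searched."""
        if key in self.memtable:
            return self.memtable[key], 0
        searched = 0
        for segment in reversed(self.segments):  # newest segment first
            searched += 1
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1], searched
        return None, searched


db = TinyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]:
    db.put(k, v)

newest = db.get("e")  # still in the memtable
middle = db.get("c")  # found in the newest flushed segment
oldest = db.get("a")  # found only after searching every segment
print(newest, middle, oldest)
```

Real engines bound the read cost with compaction and bloom filters, but the shape of the tradeoff is the same: the older the key, the more places a read has to look.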

Going Deeper: Reads and Writes at the Systems Level

Most engineers stop the analysis at the application layer. The engineers who consistently impress in interviews and in real systems go one level lower.

I recorded a deeper breakdown of these concepts on YouTube if you prefer to watch rather than read.

Memory Allocators: `malloc` and `free`

When building a memory allocator, you think about fragmentation and allocation speed.

Reads are lookups: scanning a free list or searching a segregated fit table to find a block of the right size. Writes are the actual allocations and deallocations: placing metadata headers, updating pointers, and coalescing adjacent free blocks.

A naive first-fit allocator is slow on reads because, in the worst case, it walks the whole free list linearly before it finds a block that fits. You optimise it by organising free blocks into size-class buckets, so a read for a 64-byte block goes directly to the 64-byte bin instead of scanning everything. Writes get expensive when you coalesce too eagerly, because merging free blocks on every free() call triggers multiple pointer updates. Deferred coalescing, where you batch-merge only when a size class runs dry, shifts the write cost away from the hot path.
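
Here is a small simulation of the read-side difference. It is a sketch, not a real allocator: addresses are plain integers, the size classes are ones I picked for illustration, and every free block is assumed to be exactly one size class wide. Both allocators return the address along with how many free-list entries they had to inspect.

```python
SIZE_CLASSES = (16, 32, 64, 128, 256)  # assumed classes, for illustration only


def round_up_to_class(size):
    for cls in SIZE_CLASSES:
        if size <= cls:
            return cls
    raise ValueError("request larger than the biggest size class")


class FirstFitAllocator:
    """One flat free list of (address, size); a lookup scans until something fits."""

    def __init__(self, free_blocks):
        self.free_list = list(free_blocks)

    def alloc(self, size):
        for index, (addr, block_size) in enumerate(self.free_list):
            if block_size >= size:
                del self.free_list[index]
                return addr, index + 1  # entries inspected
        return None, len(self.free_list)


class SizeClassAllocator:
    """One free list per size class; a lookup goes straight to the right bin."""

    def __init__(self, free_blocks):
        self.bins = {cls: [] for cls in SIZE_CLASSES}
        for addr, block_size in free_blocks:
            self.bins[block_size].append(addr)

    def alloc(self, size):
        bin_ = self.bins[round_up_to_class(size)]
        return (bin_.pop(), 1) if bin_ else (None, 1)


# 100 small free blocks, with the only 64-byte block at the very end of the list.
free_blocks = [(i * 16, 16) for i in range(100)] + [(1600, 64)]

first_fit_result = FirstFitAllocator(free_blocks).alloc(50)
size_class_result = SizeClassAllocator(free_blocks).alloc(50)
print(first_fit_result, size_class_result)
```

Both allocators hand back the same block. The difference is that the flat list inspects every entry ahead of it, while the binned version inspects one.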

CPU Cache: Where Performance Lives and Dies

When reasoning about cache-friendly data structures, you think in terms of cache hits and misses.

A cache hit is a fast read: the data is already in L1 or L2. A cache miss that falls all the way through to main memory is an expensive read, typically costing on the order of 100 to 300 cycles.

Writes have their own asymmetry. A write-through cache flushes every write immediately to memory, keeping consistency but amplifying write traffic. A write-back cache buffers dirty writes in the cache line and flushes only on eviction, which reduces write traffic dramatically but complicates coherency in multi-core systems.
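
The write-traffic difference is easy to see in a toy model. The sketch below counts how many writes reach memory under each policy. It assumes a write-only workload (so every cached line is dirty), 64-byte lines, LRU eviction, and a final flush at the end; the function names are mine.

```python
from collections import OrderedDict


def write_through_traffic(addresses):
    """Every store is forwarded to memory immediately."""
    return len(addresses)


def write_back_traffic(addresses, capacity_lines, line_size=64):
    """Stores dirty a cached line; memory is written only on eviction or final flush."""
    cache = OrderedDict()  # line number -> dirty flag, ordered from least to most recent
    memory_writes = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)
        else:
            if len(cache) >= capacity_lines:
                cache.popitem(last=False)  # evict the least recently used dirty line
                memory_writes += 1
            cache[line] = True
    return memory_writes + len(cache)  # flush whatever is still dirty


# 1,000 stores that keep hitting the same 64-byte line.
hot_writes = [(i % 8) * 8 for i in range(1000)]
# 1,000 stores that each touch a different line.
scattered_writes = [i * 64 for i in range(1000)]

print(write_through_traffic(hot_writes), write_back_traffic(hot_writes, 4))
print(write_through_traffic(scattered_writes), write_back_traffic(scattered_writes, 4))
```

Write-back wins dramatically when writes have locality and wins nothing when they do not, which is the same lesson as batching at the application layer.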

When you see a hot loop performing poorly, you look at the access pattern. Row-major versus column-major traversal of a matrix changes the read behaviour from sequential (cache-friendly) to strided (cache-hostile) without changing the logic at all.
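
A rough way to see this without a profiler is to count how often consecutive accesses land on a different cache line. The model below is deliberately crude: it assumes a row-major 64 x 64 matrix of 8-byte elements, 64-byte cache lines, and it only remembers the last line touched.

```python
def line_switches(order, n=64, elem_size=8, line_size=64):
    """Count how many times a traversal moves to a different cache line."""
    if order == "row":
        coords = ((i, j) for i in range(n) for j in range(n))
    else:  # "column": same elements, visited column by column
        coords = ((i, j) for j in range(n) for i in range(n))

    last_line = None
    switches = 0
    for i, j in coords:
        address = (i * n + j) * elem_size  # row-major layout in memory
        line = address // line_size
        if line != last_line:
            switches += 1
            last_line = line
    return switches


row_major = line_switches("row")
column_major = line_switches("column")
print(row_major, column_major)
```

Both loops touch the same 4,096 elements. The row-major walk changes line once every eight elements; the column-major walk changes line on every single access.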

Kernel Ring Buffers: Lock-Free by Design

When building a kernel ring buffer, the kind used in Linux perf, network drivers, and io_uring, you care about producer-consumer throughput with minimal contention.

Reads are consumer dequeues: checking the tail pointer, copying data out, and advancing the read index. Writes are producer enqueues: writing data to the head slot and advancing the write index.

The key insight is that with a single producer and single consumer, you can design the buffer lock-free by having each side own exactly one pointer. The producer owns head. The consumer owns tail. They never write to each other’s pointer. This turns what would be a mutex-guarded critical section into a pure memory ordering problem, solvable with acquire-release semantics.

When throughput is still not enough, you batch both sides. The producer writes 16 items before updating head once. The consumer drains 16 items before updating tail once. This reduces the number of cache-line bounces between cores significantly.

OS Process Scheduler: Reads Are Decisions, Writes Are Context Switches

When building or reasoning about a process scheduler, you think about CPU utilisation and scheduling latency.

Reads are the decisions the scheduler makes: scanning the run queue to find the next process to dispatch, checking priority levels, reading process state flags to determine whether a task is runnable or blocked. Writes are the context switches: saving the current process’s register state to its kernel stack, updating the process control block, loading the next process’s registers, and switching the memory map.

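The read side is where scheduling algorithms differ. O(1) schedulers keep one run queue per priority level plus a bitmap of the non-empty levels, so the “who runs next” lookup is a find-first-set-bit operation rather than a queue scan. Linux’s CFS keeps runnable tasks in a red-black tree ordered by virtual runtime. The leftmost node is cached, so picking the next task is a cheap read, while putting a task back into the tree costs O(log n). In exchange, CPU time is distributed far more fairly.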
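
The bitmap trick is simple enough to sketch. This is a toy run queue, not the Linux implementation: I am assuming eight priority levels and that a lower number means higher priority.

```python
from collections import deque


class BitmapRunQueue:
    """One FIFO queue per priority plus a bitmap marking which queues are non-empty."""

    def __init__(self, levels=8):
        self.queues = [deque() for _ in range(levels)]
        self.bitmap = 0  # bit p is set when queues[p] has at least one task

    def enqueue(self, task, priority):
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        if self.bitmap == 0:
            return None
        lowest_bit = self.bitmap & -self.bitmap  # isolate the lowest set bit
        priority = lowest_bit.bit_length() - 1
        queue = self.queues[priority]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << priority)  # level is empty again
        return task


rq = BitmapRunQueue()
rq.enqueue("batch-job", 5)
rq.enqueue("ui-thread", 1)
rq.enqueue("ui-helper", 1)
rq.enqueue("irq-worker", 0)

order = [rq.pick_next() for _ in range(5)]
print(order)
```

The lookup never scans the queues. It reads one integer, and the cost does not grow with the number of runnable tasks.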

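When the scheduler becomes the bottleneck on a high-core-count machine, the write side is usually the problem. A context switch to a different process can flush the TLB (unless the hardware tags entries with an address-space identifier) and leaves the incoming task with cold caches. The optimisation therefore shifts toward switching less often and keeping tasks close to their data: CPU affinity pinning, and load balancers that migrate a task to another core only when that core would otherwise sit idle.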

Virtual Memory and the TLB: The Hidden Cost of Every Memory Access

Every memory access your program makes is a read through an indirection layer. The CPU must translate a virtual address to a physical one by walking the page table, which is itself a multi-level data structure sitting in memory.

The TLB (Translation Lookaside Buffer) short-circuits this: a TLB hit turns a 4-level page table walk, which is up to four dependent memory reads, into a lookup that takes about a cycle. A TLB miss is expensive not just because of CPU cycles, but because it forces reads of page table entries from memory, potentially crossing cache lines.
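
To get a feel for the numbers, here is a toy TLB model that counts the extra memory reads spent on page table walks. The assumptions are mine and deliberately simple: 4 KiB pages, a 16-entry LRU TLB, four reads per miss, and no page-walk caches.

```python
from collections import OrderedDict

PAGE_SIZE = 4096
WALK_READS = 4  # assumed cost of one miss: one read per page table level


def translation_reads(addresses, tlb_entries=16):
    """Count memory reads spent on page table walks for a sequence of accesses."""
    tlb = OrderedDict()  # page number -> present, ordered from least to most recent
    reads = 0
    for addr in addresses:
        page = addr // PAGE_SIZE
        if page in tlb:
            tlb.move_to_end(page)  # hit: no page table reads
        else:
            reads += WALK_READS
            if len(tlb) >= tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[page] = True
    return reads


# 4,096 accesses that finish one page before moving on to the next.
sequential = list(range(0, 64 * PAGE_SIZE, 64))
# 4,096 accesses that hop across 64 pages in round-robin order.
scattered = [page * PAGE_SIZE for _ in range(64) for page in range(64)]

print(translation_reads(sequential), translation_reads(scattered))
```

Same number of accesses, same 64 pages, and a 64x difference in translation overhead purely from the order of the reads.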

Writes in this system are page table management. When a process maps new memory with mmap, faults in a new page, or gets a page swapped back in, the kernel writes new entries into the page table. When a mapping is removed or its permissions change, the kernel must also invalidate stale TLB entries, either locally with an INVLPG instruction or with a TLB shootdown sent to every other core that may have cached the translation. In multi-core systems, that shootdown interrupts those cores so they can drop translations they may have been actively using. This is why repeatedly mapping and unmapping large files in a tight loop is one of the more counterintuitive performance traps in systems programming.

Network Stack: Buffer Utilisation and the Cost of Copying Data

When optimising throughput at the socket layer, you think about buffer utilisation and copy cost.

Reads are the receiver side. Data arriving from the NIC lands in the kernel receive buffer, and the application calls recv() to read it out into userspace. The cost is not just the data copy. It is the syscall boundary crossing itself and the risk of the application reading too slowly, which causes the receive buffer to fill and forces TCP to shrink its advertised window, throttling the sender end-to-end.

Writes are the sender side. The application writes to the send buffer, the kernel segments it into TCP packets, and the NIC DMA-transfers those segments out. When write throughput is the bottleneck, you check whether the socket send buffer is sized correctly for the bandwidth-delay product of the link, and whether TCP_CORK or TCP_NODELAY is set appropriately. TCP_CORK batches small writes into full-sized segments, optimising write throughput. TCP_NODELAY disables Nagle’s algorithm to flush immediately, optimising write latency at the cost of more packets.
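
As a rough sketch of the sizing arithmetic, the helper below computes the bandwidth-delay product and applies it with standard socket options. The function names are mine, the link numbers are hypothetical, and the kernel is free to clamp or adjust whatever SO_SNDBUF value you request. TCP_CORK is left out because it is Linux-specific.

```python
import socket


def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return bandwidth_bits_per_s * rtt_ms // 8000  # bits to bytes, ms to seconds


def size_send_buffer(sock, bandwidth_bits_per_s, rtt_ms):
    """Ask for a send buffer big enough to cover one bandwidth-delay product."""
    wanted = bdp_bytes(bandwidth_bits_per_s, rtt_ms)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, wanted)
    return wanted


def prefer_low_latency(sock):
    """Disable Nagle's algorithm so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


# Hypothetical link: 1 Gbit/s with a 50 ms round trip.
needed = bdp_bytes(1_000_000_000, 50)
print(f"send buffer should hold about {needed} bytes")
```

A buffer much smaller than this number means the sender stalls waiting for acknowledgements long before the link is saturated.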

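The furthest optimisations are sendfile() and io_uring. sendfile() moves file data to a socket inside the kernel, so the payload never passes through your application’s address space. io_uring batches submissions and completions through rings shared with the kernel, which cuts the number of syscall boundary crossings per write.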

Every system has its own vocabulary for what a read and a write mean. Once you start thinking in those terms, the optimisation path for almost any system becomes a lot less mysterious.

How to Use This in the Interview Room

In a System Design interview, you are typically asked to design a known system: YouTube, Instagram, a URL shortener. But as I mentioned, sometimes you get something entirely novel. The framework holds either way.

What most candidates miss is what the interviewer is actually evaluating. It is rarely just the architecture diagram. They are looking for evidence of five things:

Scale: Did your design account for 500 users or 10 million? Did it account for concurrent usage patterns? Do you understand where to look when a system is underperforming at scale?
Reliability: Did you think about how the system could fail and what happens when it does? A system that goes down the moment users need it is worthless.
Tradeoffs: Do you understand what you are willing to sacrifice given the type of system and the users you are designing for? Every distributed system is imperfect. The question is whether you know what to trade.
Databases: Do you understand what types of data you are producing, storing, and retrieving? Do you know how often and at what volume? Do you know which databases are optimised for that shape of data?
Collaboration: Can you explain your design to a teammate, justify your choices, and walk someone else through your reasoning clearly?

Reading that list probably triggered some cognitive overload. You might be thinking: “How do I design a system I have never worked on before?” That concern is completely valid. But here is the reframe: you are not being asked to have built the system. You are being asked to reason about it. And reasoning is exactly what this framework gives you.

This video shows you a step-by-step pattern for structuring your answer in the interview room. You can also watch this live System Design interview with one of my favourite creators, Uma Abu, a Software Engineer at Netflix, to see how to approach the round in practice. These videos cover what to say, in what order, and how to frame your thinking.

How to Actually Impress Your Interviewer

Most candidates finish their first version of the design and stop. They have some boxes on the canvas, arrows pointing in various directions, and they wait for questions.

The engineers who impress do the opposite. They immediately start questioning their own design.

Before your interviewer asks anything, ask yourself:

What happens if 1,000,000 users initiate a write at the same time? Too many writes. How do you batch them?
How do I reduce the latency of this system to under 100ms? Read optimisation. What is the bottleneck?
Is there an opportunity for batch processing here? Optimising writes. Where does that make sense?
If I batch process, how do I prevent duplicate processing? Idempotency. This is both a write and a reliability concern.
If I cache at a certain layer, how do I prevent users from reading stale data? Optimising reads. Also a reliability concern.
Can I introduce a queue somewhere in this system? Async and concurrent processing. Optimising writes.

The more performance and scalability questions you ask yourself, the more clearly you see that it is all reads and writes, every single time.

If there are too many writes, you batch-process the write requests. If batching or retries risk applying the same write twice, you add an idempotency check.
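
Here is a minimal sketch of those two ideas working together. The class name, the request keys, and the in-memory list standing in for a database are all hypothetical; a real system would persist the seen keys and expire them.

```python
class IdempotentBatchWriter:
    """Buffers writes into batches and drops requests whose key was already accepted."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []
        self.seen_keys = set()
        self.store = []  # stand-in for the real database
        self.flushes = 0  # how many bulk writes reached the store

    def submit(self, idempotency_key, payload):
        if idempotency_key in self.seen_keys:
            return False  # a retry of something already accepted: safe to ignore
        self.seen_keys.add(idempotency_key)
        self.pending.append(payload)
        if len(self.pending) >= self.batch_size:
            self.flush()
        return True

    def flush(self):
        if self.pending:
            self.store.extend(self.pending)  # one bulk write instead of many small ones
            self.pending = []
            self.flushes += 1


writer = IdempotentBatchWriter(batch_size=3)
requests = [("req-1", "a"), ("req-2", "b"), ("req-2", "b"), ("req-3", "c"), ("req-4", "d")]
accepted = [writer.submit(key, payload) for key, payload in requests]
writer.flush()  # push out the partial final batch

print(accepted, writer.store, writer.flushes)
```

The retried request is dropped before it reaches the batch, so four logical writes land in the store through two bulk operations.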

If there are many reads and few writes, you cache frequently accessed data, reduce payload size to use less network bandwidth per request, partition data based on your read pattern, or introduce replication to increase read throughput.
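
Caching is the first of those levers, and it brings the stale-read question with it. A minimal sketch of a read-through cache with a time-to-live is below. The names are hypothetical, the "database" is a dictionary, and the clock is injected so the behaviour is easy to follow.

```python
class TtlCache:
    """Read-through cache: serve from memory until an entry is older than the TTL."""

    def __init__(self, loader, ttl_seconds, clock):
        self.loader = loader  # called on a miss, e.g. a database query
        self.ttl = ttl_seconds
        self.clock = clock  # function returning the current time in seconds
        self.entries = {}  # key -> (value, time loaded)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            self.hits += 1
            return entry[0]
        value = self.loader(key)
        self.entries[key] = (value, now)
        self.misses += 1
        return value


now = [0.0]  # fake clock we can advance by hand
database = {"user:1": "Ada"}
cache = TtlCache(loader=database.get, ttl_seconds=30, clock=lambda: now[0])

first = cache.get("user:1")  # miss: loaded from the database
database["user:1"] = "Grace"  # the underlying data changes
now[0] = 10.0
second = cache.get("user:1")  # hit: still inside the TTL, so the old value is served
now[0] = 45.0
third = cache.get("user:1")  # miss: entry expired, fresh value loaded

print(first, second, third, cache.hits, cache.misses)
```

The TTL is the dial: a longer one saves more reads and widens the window in which users can see stale data.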

If processing is too slow, you introduce concurrent or asynchronous processing to take the heavy computation off the request path entirely.

In the end, you are always doing the same thing: identifying where the bottleneck lives and optimising it. Too many writes. Too many reads. Too many network hops. Too many kernel boundary crossings. The specific vocabulary changes with the system. The pattern does not.

Most engineers find it difficult to see this because the tools, languages, and frameworks we work with every day are built on layers of abstraction that train us to focus on the surface. We end up memorising textbooks instead of recognising patterns.

One Last Thing

This article, and the talk I am building around it here, deliberately leaves out one dimension of real-world systems: MONEY.

In theory, you can see the tradeoffs clearly and design a solution to every bottleneck on paper. In practice, you are constrained by budget, by organisational politics, and by whether you are even the person with the authority to make the change. Every new tool you throw at a system has a cost attached to it, and that changes the calculus significantly. That is a subject for another article.

For now, I hope this framework has made the problem feel smaller and more approachable. You can walk into a System Design interview for any system in the world, including ones you have never worked on, and reason your way through it by asking one question: where are the reads and where are the writes?

If this article was useful, I would appreciate you sharing it with someone earlier in their career who needs to hear it.

You can find more of my writing at danielokoronkwo.com and follow along on LinkedIn, X, and Instagram. I am also building Corelens and Concile in the open if you want to see this kind of thinking applied to real systems. Check out varsilias.dev, where I document all the open source tools I build.
