| Phase | What To Do | Key Questions To Ask | Output You Should Produce |
|---|---|---|---|
| 1. Understand Problem | Restate the problem clearly | What exactly are we building? Who are users? What is the main goal? | Clear problem statement in 1–2 lines |
| 2. Functional Requirements | Define what system must do | What are core features? MVP scope? CRUD only or real-time? | Bullet list of system capabilities |
| 3. Non-Functional Requirements | Define system qualities | Expected QPS? Latency target? Availability (99.9%)? Strong or eventual consistency? Multi-region? | Performance + reliability constraints |
| 4. Capacity Estimation | Rough calculations | How many users? Requests/sec? Read/write ratio? Storage growth? | Approx QPS, data size, bandwidth |
| 5. Data Modeling | Identify entities & access patterns | What are core objects? How are they queried? Any hot keys? | Basic schema + indexing plan |
| 6. High-Level Design | Draw minimal architecture | Client → LB → App → DB enough? | Simple architecture diagram |
| 7. Identify Bottlenecks | Predict what breaks first | DB overload? CPU? Network? Cache miss storm? | Identified scaling pressure points |
| 8. Scaling Strategy | Add components only if needed | Need caching? Read replicas? Sharding? Queue? | Evolved architecture |
| 9. Concurrency Handling | Prevent race conditions | Will two users update same data? Need locking or optimistic control? | Isolation strategy |
| 10. Consistency Model | Define data agreement level | Strong or eventual? Read-after-write needed? | Chosen consistency model |
| 11. Failure Handling | Design for crashes & spikes | What if node fails? Network partition? Traffic spike? | Retry, replication, rate limiting |
| 12. Resilience Patterns | Prevent cascading failures | Circuit breaker? Backpressure? Load shedding? | Stability mechanisms |
| 13. Security | Protect system & data | Auth? Encryption? Data sensitivity? | Basic security model |
| 14. Observability | Ensure visibility | How will we monitor errors, latency, replica lag? | Logging + metrics strategy |
| 15. Trade-off Discussion | Show maturity | What do we gain? What do we sacrifice? | Explicit trade-offs |
| 16. Deep Dive | Go deeper into one area | Database scaling? Cache invalidation? Partitioning? | Detailed explanation |
Step 1. Outline Requirements
You begin any system design by describing what the system must do (functional requirements) and how well it must do it (non-functional requirements).
Functional Requirements (product requirements)
Think of this as the observable behavior of the system from the user or API perspective.
They describe functions, operations, and use-cases — not performance or scale.
Examples:
- A user can upload photos.
- A user can like or comment on a post.
- The system should notify followers when someone posts.
- The service exposes an API for retrieving user profiles.
These are features.
They define “what” the system does, not how well it does it.
Non-Functional Requirements: how well the system must do it (technical goals)
These define system qualities or constraints.
They describe measurable attributes like performance, scale, latency, reliability, etc.
Examples:
- The API should respond within 200 ms for 95% of requests.
- The system should handle 1 million DAU (daily active users).
- 99.99% uptime (≈52 minutes downtime per year).
- Data must be consistent across replicas within 1 second.
- The system should recover from failure in < 5 minutes.
These are not features — they’re engineering targets.
If functional requirements tell you what to build, non-functional ones tell you how strong the system must be to support real-world use.
From those numbers, derive what must be true.
Example (Twitter):
- Reads >> writes → system is read-heavy → caching becomes essential.
- Latency < 200 ms → need CDNs or precomputed timelines.
- 10 M users → sharding or partitioning required.
This is where your system design logic emerges from requirements, not from memorized architectures.
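The read-heavy conclusion above can be checked with a quick calculation (the traffic numbers here are illustrative assumptions, not real Twitter figures):

```python
# Illustrative assumptions: 10M DAU, 50 timeline reads and 2 posts per user/day.
dau = 10_000_000
reads_per_user = 50
writes_per_user = 2
seconds_per_day = 86_400

read_qps = dau * reads_per_user / seconds_per_day    # ≈ 5,787 reads/sec
write_qps = dau * writes_per_user / seconds_per_day  # ≈ 231 writes/sec

# A ~25:1 read/write ratio is what justifies caching and precomputed timelines.
print(f"read/write ratio ≈ {read_qps / write_qps:.0f}:1")
```

The exact numbers matter less than the order of magnitude: once reads dominate writes by 10× or more, a cache in front of the database is usually the first enhancement to propose.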
Step 2. Outline Core Entities
Once requirements are clear, identify the main things (objects) your system needs to store or process; these usually correspond to database tables or data models.
Example (for a ticket-booking system):
- User
- Event
- Ticket
- Payment
Each entity has relationships (e.g., one user can book many tickets, each ticket belongs to one event).
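Sketched as data models, the entities and their relationships might look like this (field choices are illustrative assumptions; the real schema depends on the requirements):

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


@dataclass
class Event:
    id: int
    name: str


@dataclass
class Ticket:
    id: int
    user_id: int   # one user can book many tickets
    event_id: int  # each ticket belongs to one event


@dataclass
class Payment:
    id: int
    ticket_id: int
    amount_cents: int  # store money as integer cents to avoid float rounding
```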
Step 3. Outline Basic APIs
Now that entities are known, you define how external clients (apps, users, services) will interact with your system.
Example:
- POST /users → Create a user
- GET /events → Fetch all events
- POST /tickets → Book a ticket
- GET /tickets/{id} → Get booking details
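The endpoints above can be sketched as plain in-memory handlers (a real service would sit behind an HTTP framework and a database; the function names and dict storage are stand-ins):

```python
# In-memory stores standing in for database tables.
users, events, tickets = {}, {}, {}

def create_user(user_id, name):                 # POST /users
    users[user_id] = {"id": user_id, "name": name}
    return users[user_id]

def list_events():                              # GET /events
    return list(events.values())

def book_ticket(ticket_id, user_id, event_id):  # POST /tickets
    tickets[ticket_id] = {"id": ticket_id, "user_id": user_id, "event_id": event_id}
    return tickets[ticket_id]

def get_ticket(ticket_id):                      # GET /tickets/{id}
    return tickets.get(ticket_id)
```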
Step 4. Simple High-Level Design
What this step means
Before optimizing, draw a simple block diagram showing how the components interact — focusing only on correctness, not performance.
Example:
- Client → API Gateway → Application Service → Database
- Optional: add cache, message queue, or load balancer only if necessary to meet functionality.
Step 5. Deep Dives / Enhancement
After establishing a correct design, now refine it to meet the non-functional requirements (scalability, latency, consistency, fault tolerance).
Example enhancements:
- Add caching (e.g., Redis) for low latency.
- Add database sharding for scale.
- Add replication for high availability.
- Use message queues (Kafka) for async processing.
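As an example of the first enhancement, a read-through cache with a TTL can be sketched as follows (`db_fetch` is a stand-in for a real database call; in production the cache would be Redis or similar rather than a local dict):

```python
import time

cache = {}
TTL_SECONDS = 60

def db_fetch(key):
    # Stand-in for a slow database query.
    return f"value-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry is not None and time.time() - entry[1] < TTL_SECONDS:
        return entry[0]                    # cache hit: skip the database
    value = db_fetch(key)                  # cache miss: read from the database
    cache[key] = (value, time.time())      # populate the cache for later reads
    return value
```

The same read-through pattern applies when the dict is replaced by Redis; the TTL bounds how stale a cached value can get, which connects directly to the consistency discussion in step 10 of the table above.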
Also cover security at this stage: authentication, authorization, and encryption of sensitive data.
Back-of-the-envelope calculations
Back-of-the-envelope calculations are quick, approximate estimates used in system design interviews (and in engineering generally) to check whether an idea is feasible or scalable without doing detailed math or coding.
When designing systems (say a chat app, a video streaming platform, or an AI API), you need to reason about scale — how many requests, how much data, how much bandwidth, etc.
But exact numbers are usually unknown or unnecessary.
So, you make order-of-magnitude estimates — simple enough to do mentally or on a whiteboard.
Example:
“If we have 10 million users, and each sends 10 messages a day, how much data do we store per day?”
That’s a back-of-the-envelope question.
Example:
- 10 million daily active users
- Each sends 20 messages/day → 200 million messages/day
- Average message = 100 bytes (text only) → 200 million × 100 bytes = 20 GB/day
- 20 GB × 365 ≈ 7.3 TB/year
- With 3× replication → ≈ 22 TB/year total storage
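The same estimate, written out as a runnable check (the 3× replication factor is a common durability assumption, not a fixed rule):

```python
dau = 10_000_000
msgs_per_user = 20
bytes_per_msg = 100
replication = 3  # assumption: 3 copies of each message for durability

bytes_per_day = dau * msgs_per_user * bytes_per_msg
gb_per_day = bytes_per_day / 10**9          # 20 GB/day
tb_per_year = gb_per_day * 365 / 1000       # ≈ 7.3 TB/year
tb_total = tb_per_year * replication        # ≈ 22 TB/year with replication
```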
| Category | Real meaning |
|---|---|
| Storage | How many bytes must we persist |
| Bandwidth | Bytes/second moving around the network |
| Memory | Bytes kept hot in RAM |
| Throughput | Requests/second system must support |
| Latency | Queuing + processing time |
Everything must convert to per second. Why? Because machines operate per second.
So all interview calculations follow the same flow:
users → actions/day → actions/sec → data/action → bytes/sec
So the ONLY formula we truly need is:
Requests per second (RPS) =
Total events per day
-----------------------
86,400
60 seconds × 60 minutes × 24 hours = 86,400 seconds/day
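The formula reduces to a one-line helper; here it is applied to the 200 million messages/day from the earlier example:

```python
SECONDS_PER_DAY = 86_400

def rps(events_per_day):
    """Convert a daily event count into requests (or events) per second."""
    return events_per_day / SECONDS_PER_DAY

# 200 million messages/day ≈ 2,315 messages/sec
print(round(rps(200_000_000)))
```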
1 KB = 10³ bytes
1 MB = 10⁶ bytes
1 GB = 10⁹ bytes
1 TB = 10¹² bytes
1000 KB = 1 MB
1000 MB = 1 GB
1000 GB = 1 TB
Typical data sizes to memorize (approximate, for interviews):
| Type | Size |
|---|---|
| 1 character | 1 byte |
| 1 image (low quality) | ~200 KB |
| 1 image (average) | ~500 KB – 1 MB |
| JSON object (typical) | 1 – 5 KB |
| Video (1 min low quality) | 5 – 10 MB |
| Video (HD 1 min) | 30 – 60 MB |
1) Daily Active Users (DAU)
2) Actions per user per day
3) Data per action (bytes)
4) Calculate:
Total bytes/day = DAU × Actions × Size
Total bytes/year = bytes/day × 365
Let DAU = 10M
Let actions = 20/day
Let data = 2 KB
Total/day = 10,000,000 × 20 × 2 KB
= 400,000,000 KB
= 400 GB per day
Per year ≈ 400 × 365
≈ 146,000 GB
≈ 146 TB/year
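The same arithmetic as a runnable check (using the decimal units defined above, where 1 GB = 10⁹ bytes):

```python
dau = 10_000_000
actions_per_day = 20
kb_per_action = 2

kb_per_day = dau * actions_per_day * kb_per_action  # 400,000,000 KB
gb_per_day = kb_per_day / 10**6                     # 400 GB/day
tb_per_year = gb_per_day * 365 / 1000               # 146 TB/year
```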
If 100 million events per day:
RPS = 100M / 86,400 ≈ 1157 req/sec
≈ 1200 RPS
API layers
| Dimension / Need | REST + JSON | gRPC + Protobuf | GraphQL + JSON |
|---|---|---|---|
| Primary clients | Browsers, 3rd‑party partners | Internal microservices; high-performance, low-latency service-to-service RPC | Front-end apps (web/mobile) needing flexible, client-driven queries that avoid over/under-fetching |
| Transport | HTTP/1.1 or HTTP/2 | Runs over HTTP/2 with built-in bi-directional streaming. | HTTP (usually POST) |
| Data format | JSON (text) | Protobuf (binary) | JSON (text) |
| Performance / latency | Good, but heavier than binary | Very high, 3–10× faster in many tests | Moderate; better than REST on over‑fetching |
| Caching | Native HTTP caching | Needs custom/HTTP‑level caching | Harder due to flexible queries |
| Streaming / realtime | Limited (SSE, websockets nearby) | First‑class streaming (client/server/bidi) | Possible but not the primary strength |
| Schema & typing | Optional (OpenAPI etc.) | Strong, enforced via .proto | Strong typed schema with introspection |
| Browser friendliness | Excellent | Needs proxy (gRPC‑Web) | Excellent |
| Typical best use | Public CRUD/web APIs | Internal RPC, high‑perf microservices | Aggregating backends, flexible client queries |
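The "binary vs text" size difference behind the performance row can be illustrated by packing the same record both ways. Note this uses fixed-width `struct` packing only as a rough stand-in; real Protobuf uses tag/varint encoding, so the exact sizes differ, but the direction of the comparison holds:

```python
import json
import struct

record = {"user_id": 42, "score": 3.14, "active": True}

# JSON carries field names and punctuation as text.
json_bytes = json.dumps(record).encode()

# Binary packing carries only the values: uint32 + float64 + bool = 13 bytes.
# '<' means little-endian with no padding.
binary_bytes = struct.pack("<Id?", 42, 3.14, True)

print(len(json_bytes), len(binary_bytes))
```

Schemas (a `.proto` file for Protobuf, an SDL for GraphQL) are what let the binary side drop field names from the wire: both peers already know the field layout.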
For more, refer to the Communication section.