Theoretical Concepts of Latency in FAANG System Design
Objectives of Latency Optimization in System Design
Latency measures how fast a system responds to a request. For FAANG-level systems, latency isn't just a number; it's a critical factor in both user experience (UX) and infrastructure design.
Key goals:
- Ensure fast, reliable, and consistent user responses.
- Satisfy SLAs/SLOs (e.g., 95% of requests < 100ms).
- Improve user retention and engagement.
- Support high throughput without degrading performance.
- Optimize cost-performance trade-offs with smart infra design.
Theoretical Foundations of Latency
1. Types of Latency
Common contributors along a request path:
- Network latency – time on the wire between client, edge, and services
- Processing latency – time the application spends computing the response
- Queueing latency – time spent waiting in queues, thread pools, or load balancers under load
- Disk I/O latency – time for storage reads and writes
- Database latency – query execution, locks, and contention
2. Latency vs Throughput
- Latency: Time taken for a single request (e.g., 200ms for a photo upload)
- Throughput: Requests handled per second (e.g., 10K uploads/sec)
- Trade-off: Reducing latency might reduce throughput and vice versa. FAANG systems aim to optimize both, often with resource scaling and asynchronous processing.
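A quick back-of-the-envelope way to see the relationship is Little's Law (requests in flight = throughput × latency). A minimal sketch with illustrative numbers, not taken from any real system:

```python
# Little's Law: concurrency (requests in flight) = throughput * latency
# Numbers below are illustrative only.

latency_s = 0.200        # 200 ms per photo upload
throughput_rps = 10_000  # 10K uploads/sec

concurrency = throughput_rps * latency_s
print(f"Requests in flight: {concurrency:.0f}")  # 2000

# Halving latency (e.g., by moving heavy work off the request path) either
# halves the concurrency needed at the same throughput, or lets the same
# worker pool serve roughly twice the throughput.
```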
3. Tail Latency: P95, P99, P999
Averages lie. FAANG engineers care about percentile latency:
- P50 – the median request (half of all requests are faster)
- P95 – 95% of requests are faster; the slowest 5% exceed this
- P99 – the slowest 1% of requests
- P999 – extreme tail: the slowest 1 in 1,000 requests
A few slow components (e.g., a hot DB shard or a cache miss) can kill UX. Design for the worst case, not the average.
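A minimal sketch of how percentile latency can be computed from raw request timings; the synthetic sample data and the use of NumPy are assumptions for illustration:

```python
import numpy as np

# Hypothetical request latencies (ms) collected over a monitoring window.
latencies_ms = np.random.lognormal(mean=3.0, sigma=0.6, size=100_000)

p50, p95, p99, p999 = np.percentile(latencies_ms, [50, 95, 99, 99.9])
print(f"p50={p50:.1f}ms  p95={p95:.1f}ms  p99={p99:.1f}ms  p99.9={p999:.1f}ms")

# The mean can look healthy while p99/p99.9 are far worse, which is
# exactly why averages are misleading for UX.
print(f"mean={latencies_ms.mean():.1f}ms")
```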
4. Latency Budget Breakdown
Start with a goal like end-to-end latency ≤ 200ms, then break it down per component (an illustrative split is sketched below).
Every component must respect its slice of the budget; this helps identify bottlenecks early.
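One possible split of a 200ms budget; the component names and allocations below are hypothetical, for illustration only:

```python
# Hypothetical 200 ms end-to-end latency budget, split per component.
budget_ms = {
    "client + TLS handshake":  20,
    "CDN / edge":              10,
    "load balancer":            5,
    "application service":     60,
    "cache lookup":             5,
    "database query":          70,
    "serialization + network": 30,
}

total = sum(budget_ms.values())
assert total <= 200, f"budget exceeded: {total}ms"
print(f"budget used: {total}ms of 200ms")
```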
5. Geo-Distribution & Latency
Latency increases with distance.
Fiber adds roughly 5ms of one-way latency per 1,000 km.
Global services need:
- CDNs (e.g., Cloudflare, Akamai)
- Edge caching
- Region-based failovers
- Read replicas close to users
✨ Example: Instagram loads images via CDN, minimizing latency worldwide.
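A rough distance-to-latency estimate built on the ~5ms-per-1,000km rule of thumb above; the example routes and distances are assumptions:

```python
MS_PER_1000_KM = 5  # one-way propagation in fiber (rule of thumb above)

def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone
    (ignores routing hops, queueing, and processing)."""
    return 2 * distance_km / 1000 * MS_PER_1000_KM

# Hypothetical routes:
print(f"Same region (~200 km):          {min_rtt_ms(200):.0f} ms RTT")
print(f"US East -> US West (~4,000 km): {min_rtt_ms(4_000):.0f} ms RTT")
print(f"US -> India (~13,000 km):       {min_rtt_ms(13_000):.0f} ms RTT")
# Serving from a CDN/edge node near the user removes most of this floor.
```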
6. Latency Amortization
Techniques to reduce per-request latency:
- Batching: Group requests (e.g., write logs in bulk)
- Pipelining: Send multiple requests without waiting
- Async processing: Offload heavy ops (e.g., send email after response)
Example: A user uploads a video → UI responds fast → transcoding runs async.
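A minimal sketch of that upload pattern: acknowledge the request immediately and push the heavy work to a background worker. The queue, worker, and handler names are assumptions, not any specific production API:

```python
import queue
import threading
import time

transcode_queue: "queue.Queue[str]" = queue.Queue()

def transcode_worker() -> None:
    # Drains the queue in the background; heavy work never blocks the request path.
    while True:
        video_id = transcode_queue.get()
        time.sleep(2)  # stand-in for expensive transcoding
        print(f"[worker] finished transcoding {video_id}")
        transcode_queue.task_done()

threading.Thread(target=transcode_worker, daemon=True).start()

def handle_upload(video_id: str) -> dict:
    # Persist metadata, enqueue heavy work, return immediately.
    transcode_queue.put(video_id)
    return {"status": "accepted", "video_id": video_id}

print(handle_upload("vid_123"))  # responds right away, not after transcoding
transcode_queue.join()           # demo only: wait so the worker output prints
```

In production the in-process queue would typically be a durable broker (e.g., Kafka or SQS), but the shape of the latency win is the same.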
7. Caching to Reduce Latency
Latency reduction via caching layers:
- Browser cache → client-side speed
- CDN cache → static assets fast
- Reverse proxy (e.g., NGINX) → dynamic caching
- In-memory cache (Redis/Memcached) → avoid DB hits
Strategy: Cache early, invalidate smartly.
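A cache-aside sketch using Redis via the redis-py client; the key format, TTL, and the load_user_from_db helper are assumptions for illustration:

```python
import json
import redis  # assumes the redis-py client and a local Redis instance

r = redis.Redis(host="localhost", port=6379)

def load_user_from_db(user_id: str) -> dict:
    # Stand-in for a slow database query.
    return {"id": user_id, "name": "example"}

def get_user(user_id: str, ttl_s: int = 300) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                 # cache hit: no DB round trip
        return json.loads(cached)
    user = load_user_from_db(user_id)      # cache miss: go to the DB
    r.setex(key, ttl_s, json.dumps(user))  # populate with a TTL ("invalidate smartly")
    return user
```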
8. Backpressure, Rate Limiting & Circuit Breaking
Prevent systems from collapsing under load:
- Rate limiting (e.g., 100 req/min)
- Queue size limits
- Circuit breakers (fail fast)
- Retry logic with exponential backoff
Prevents cascading failures & keeps latency consistent under pressure.
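A sketch of one of the techniques listed above, retries with exponential backoff and jitter; the call_service placeholder and timing constants are assumptions:

```python
import random
import time

def call_with_backoff(call_service, max_retries: int = 4,
                      base_delay_s: float = 0.1, max_delay_s: float = 2.0):
    """Retry a flaky call with exponentially growing, jittered delays so
    retries don't pile up and amplify load; fail fast after max_retries."""
    for attempt in range(max_retries + 1):
        try:
            return call_service()
        except Exception:
            if attempt == max_retries:
                raise  # give up; let the caller (or a circuit breaker) handle it
            delay = min(max_delay_s, base_delay_s * (2 ** attempt))
            time.sleep(delay * random.random())  # "full jitter"
```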
9. Latency-Aware Data Modeling
- Denormalization in NoSQL → reduce joins
- Precomputed data → fast lookup
- Search indexing (e.g., Elasticsearch)
- Materialized views in SQL
Optimize schema for read latency, even if writes get a bit heavier.
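A small illustration of the denormalization idea: the post document embeds the author fields it needs, so a feed read is a single lookup with no join. The field names are hypothetical:

```python
# Normalized: reading a feed item needs a second lookup ("join") for the author.
post_normalized = {"post_id": "p1", "author_id": "u42", "caption": "hello"}

# Denormalized: author fields are copied into the post at write time, so the
# read path is one key lookup. Writes get heavier (the embedded copy must be
# updated if the author renames), but read latency drops.
post_denormalized = {
    "post_id": "p1",
    "caption": "hello",
    "author": {"id": "u42", "username": "alice", "avatar_url": "https://cdn.example.com/u42.jpg"},
}
```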
10. Latency Observability and Monitoring
Track latency in real-time:
- APM tools: Datadog, New Relic
- Traces: Jaeger, Zipkin
- Metrics: Prometheus + Grafana
- Histograms: For percentile tracking
Monitor p95/p99 and correlate with spikes, cache misses, or backend failures.
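A sketch of percentile-friendly instrumentation with the Python prometheus_client library; the metric name and bucket boundaries are assumptions chosen around a ~100–200ms SLO:

```python
import random
import time
from prometheus_client import Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "http_request_latency_seconds",
    "Request latency in seconds",
    buckets=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
)

@REQUEST_LATENCY.time()  # records elapsed time into the histogram
def handle_request() -> None:
    time.sleep(random.uniform(0.01, 0.2))  # stand-in for real work

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    while True:
        handle_request()
```

p95/p99 are then derived on the Prometheus side (e.g., with histogram_quantile over the bucket series) and graphed in Grafana.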

