Concepts of Latency

By prakash singh

Theoretical Concepts of Latency in FAANG System Design




Objective of Latency in System Design

Latency measures how long a system takes to respond to a request. For FAANG-level systems, latency isn't just a number: it is a critical factor in user experience (UX) and infrastructure design.

Key goals:

  • Ensure fast, reliable, and consistent user responses.
  • Satisfy SLAs/SLOs (e.g., 95% of requests < 100ms).
  • Improve user retention and engagement.
  • Support high throughput without degrading performance.
  • Optimize cost-performance trade-offs with smart infra design.



Theoretical Foundations of Latency


1. Types of Latency

| Type | What It Means | Example |
| --- | --- | --- |
| Network latency | Time to transmit packets | RTT from user → CDN → server |
| Disk latency | Time to read/write to disk | Reading from HDD vs SSD |
| Queueing latency | Time spent waiting in queues | Requests piling up in front of the app server |
| Processing latency | Time to process a request | API or DB logic execution |
| End-to-end latency | Total time from user click to response | Sum of all layers above |


2.  Latency vs Throughput

  • Latency: Time taken for a single request (e.g., 200ms for a photo upload)
  • Throughput: Requests handled per second (e.g., 10K uploads/sec)
  • Trade-off: Reducing latency might reduce throughput and vice versa. FAANG systems aim to optimize both, often with resource scaling and asynchronous processing.


3.  Tail Latency: P95, P99, P999

  • Averages lie. FAANG engineers care about percentile (tail) latency:

    • P50 – the median; half of all requests are faster

    • P95 – the latency seen by the slowest 5% of requests

    • P99 – the slowest 1% of requests

    • P999 – the slowest 0.1%; the critical tail

A few slow components (e.g., a hot DB shard or a cache miss) can kill UX. Design for the worst case, not the average.
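A quick way to see how much the tail differs from the average is to compute these percentiles directly from recorded latencies. Below is a minimal sketch using only the Python standard library; the latency samples are simulated purely for illustration.

```python
import random
import statistics

# Simulated request latencies in ms: most requests are fast, a few are very slow.
latencies = [random.gauss(80, 15) for _ in range(990)] + \
            [random.uniform(400, 900) for _ in range(10)]

def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

print(f"mean : {statistics.mean(latencies):6.1f} ms")   # looks fine...
print(f"P50  : {percentile(latencies, 50):6.1f} ms")
print(f"P95  : {percentile(latencies, 95):6.1f} ms")
print(f"P99  : {percentile(latencies, 99):6.1f} ms")
print(f"P99.9: {percentile(latencies, 99.9):6.1f} ms")  # ...but the tail tells another story
```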


4. Latency Budget Breakdown

Start with a goal like End-to-End Latency ≤ 200ms. Break it down:

| Component | Budget (ms) |
| --- | --- |
| Frontend + CDN | 30 |
| Load Balancer | 10 |
| App Server Logic | 40 |
| DB Query | 50 |
| Cache Lookups | 20 |
| External API Call | 30 |
| Network Transfer | 20 |
| Total | 200 |

Every component must respect its slice of the budget; this helps identify bottlenecks early.
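Encoding the budget in code makes it easy to flag whichever component blows its slice. This is a minimal sketch; the component names and numbers simply mirror the illustrative table above.

```python
# Per-component latency budget in milliseconds (mirrors the table above).
BUDGET_MS = {
    "frontend_cdn": 30,
    "load_balancer": 10,
    "app_server": 40,
    "db_query": 50,
    "cache_lookup": 20,
    "external_api": 30,
    "network_transfer": 20,
}
END_TO_END_TARGET_MS = 200
assert sum(BUDGET_MS.values()) <= END_TO_END_TARGET_MS  # the slices must fit the target

def over_budget(measured_ms):
    """Return components whose measured latency exceeds their budget slice."""
    return {name: ms for name, ms in measured_ms.items() if ms > BUDGET_MS.get(name, 0)}

# Example: a slow DB query immediately shows up as the bottleneck.
print(over_budget({"db_query": 72, "app_server": 35, "cache_lookup": 8}))  # {'db_query': 72}
```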


5.  Geo-Distribution & Latency

  • Latency increases with distance.

  • Fiber adds roughly 5ms of one-way latency per 1,000 km (light travels at about 200,000 km/s in fiber).

  • Global services need:

    • CDNs (e.g., Cloudflare, Akamai)

    • Edge caching

    • Region-based failovers

    • Read replicas close to users

✨ Example: Instagram loads images via CDN, minimizing latency worldwide.
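The ~5ms per 1,000 km figure makes geographic latency easy to estimate with back-of-the-envelope math. A small sketch, with distances chosen only for illustration:

```python
# Best-case one-way latency over fiber: ~5 ms per 1,000 km.
# Real paths add routing, queuing, and TLS handshake overhead on top of this.
MS_PER_1000_KM = 5

def min_rtt_ms(distance_km):
    """Lower bound on round-trip time over fiber for a given distance."""
    return 2 * distance_km / 1000 * MS_PER_1000_KM

print(f"User -> server 5,600 km away : {min_rtt_ms(5600):.0f} ms RTT minimum")
print(f"User -> server 16,000 km away: {min_rtt_ms(16000):.0f} ms RTT minimum")
print(f"User -> edge PoP 50 km away  : {min_rtt_ms(50):.1f} ms RTT minimum")
```

This is why serving from a nearby edge node or read replica is often worth more than any code-level optimization.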


6.  Latency Amortization

Techniques to reduce per-request latency:

  • Batching: Group requests (e.g., write logs in bulk)

  • Pipelining: Send multiple requests without waiting

  • Async processing: Offload heavy ops (e.g., send email after response)

Example: A user uploads a video → UI responds fast → transcoding runs async.
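The "respond fast, transcode later" flow can be sketched with a simple in-process work queue. This is a minimal illustration, not a production job system; transcode_video is a hypothetical stand-in for the real heavy operation.

```python
import queue
import threading
import time

jobs = queue.Queue()

def transcode_video(video_id):
    # Hypothetical heavy operation; stands in for a real transcoding pipeline.
    time.sleep(2)
    print(f"[worker] finished transcoding {video_id}")

def worker():
    while True:
        video_id = jobs.get()
        transcode_video(video_id)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_upload(video_id):
    """Request handler: enqueue the heavy work and return immediately."""
    jobs.put(video_id)                              # offload to the background worker
    return {"status": "accepted", "id": video_id}   # the user gets a fast response

print(handle_upload("video-123"))
jobs.join()  # only needed in this demo script, to let the worker finish
```

At FAANG scale the in-process queue would be a durable broker (e.g., Kafka or SQS), but the latency win is the same: the user-facing response never waits for the slow work.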


7.  Caching to Reduce Latency

Latency reduction via caching layers:

  • Browser cache → client-side speed

  • CDN cache → static assets fast

  • Reverse proxy (e.g., NGINX) → dynamic caching

  • In-memory cache (Redis/Memcached) → avoid DB hits

Strategy: Cache early, invalidate smartly.
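The "cache early" part is usually the cache-aside pattern in front of the database. A minimal sketch, assuming the redis-py client and a local Redis instance; fetch_user_from_db is a hypothetical placeholder for the slow DB path, and the key names and TTL are illustrative.

```python
import json
import redis  # assumes the redis-py package and a Redis instance on localhost

r = redis.Redis(host="localhost", port=6379, db=0)
TTL_SECONDS = 300  # illustrative TTL; tune to how stale the data may be

def fetch_user_from_db(user_id):
    # Hypothetical slow path; stands in for a real database query.
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                            # cache hit: skip the DB entirely
        return json.loads(cached)
    user = fetch_user_from_db(user_id)                # cache miss: pay the DB latency once
    r.set(key, json.dumps(user), ex=TTL_SECONDS)      # populate the cache for later requests
    return user

def update_user(user_id, fields):
    # ... write to the database here ...
    r.delete(f"user:{user_id}")   # "invalidate smartly": drop the stale entry on write
```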


8.  Backpressure, Rate Limiting & Circuit Breaking

Prevent systems from collapsing under load:

  • Rate limiting (e.g., 100 req/min)

  • Queue size limits

  • Circuit breakers (fail fast)

  • Retry logic with exponential backoff

Together these prevent cascading failures and keep latency consistent under pressure.
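Of these, retry with exponential backoff is the easiest to show in a few lines; call_downstream below is a hypothetical placeholder for any flaky dependency, and the delay values are illustrative.

```python
import random
import time

def call_downstream():
    # Hypothetical flaky dependency; raises on a transient failure.
    raise ConnectionError("temporarily unavailable")

def call_with_backoff(max_attempts=5, base_delay=0.1, max_delay=2.0):
    """Retry with exponential backoff and full jitter; give up after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return call_downstream()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                                  # fail fast; a circuit breaker can take over
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))       # jitter avoids synchronized retry storms
```

Without the cap and jitter, retries from thousands of clients arrive in lockstep and make the overload worse, which is exactly the cascading failure this section is trying to avoid.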


9.  Latency-Aware Data Modeling

  • Denormalization in NoSQL → reduce joins

  • Precomputed data → fast lookup

  • Search indexing (e.g., Elasticsearch)

  • Materialized views in SQL

Optimize schema for read latency, even if writes get a bit heavier.
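Denormalization is easiest to see side by side: the normalized shape needs a second lookup per read, while the denormalized shape answers a read from a single record. The documents below are plain Python dicts used purely for illustration.

```python
# Normalized: rendering a post needs a second lookup (a "join") for the author.
users = {"u1": {"name": "Asha", "avatar": "asha.png"}}
posts_normalized = [{"id": "p1", "author_id": "u1", "text": "hello"}]

# Denormalized: author fields are copied into the post at write time,
# so the read path touches exactly one record.
posts_denormalized = [{
    "id": "p1",
    "text": "hello",
    "author": {"id": "u1", "name": "Asha", "avatar": "asha.png"},  # precomputed at write time
}]

def render_feed_item(post):
    return f'{post["author"]["name"]}: {post["text"]}'  # one read, no join

print(render_feed_item(posts_denormalized[0]))
# The cost: every copy must be updated (or rebuilt) when the author's profile changes.
```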


10.  Latency Observability and Monitoring

Track latency in real-time:

  • APM tools: Datadog, New Relic

  • Traces: Jaeger, Zipkin

  • Metrics: Prometheus + Grafana

  • Histograms: For percentile tracking

Monitor p95/p99 and correlate with spikes, cache misses, or backend failures.
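Percentile tracking usually starts with a latency histogram exported to the metrics system. A minimal sketch assuming the prometheus_client Python package; the metric name and bucket boundaries are illustrative.

```python
import random
import time
from prometheus_client import Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "http_request_latency_seconds",
    "Request latency in seconds",
    buckets=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),  # illustrative buckets
)

@REQUEST_LATENCY.time()          # records each call's duration into the histogram
def handle_request():
    time.sleep(random.uniform(0.01, 0.2))  # stand-in for real request handling

if __name__ == "__main__":
    start_http_server(8000)      # exposes /metrics for Prometheus to scrape
    while True:
        handle_request()
```

Grafana can then plot p95/p99 from these buckets with PromQL's histogram_quantile() and alert when the tail drifts.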


 Summary

| Concept | Why It Matters |
| --- | --- |
| Latency types | Understand full-stack delay |
| Tail latency | Optimize for worst cases |
| Budgeting | Design systems within response limits |
| Geo latency | Serve users fast globally |
| Caching | Reduce repeat processing |
| Data modeling | Avoid expensive queries |
| Monitoring | Catch regressions & tail spikes |

