Frameworks Quiz

Question 1

List the 6 stages of the Request Path typical flow in order:

Request Path — Typical Flow:

User — Request originates from the client.
DNS / GSLB — DNS resolves the hostname; GSLB routes traffic to the nearest healthy region.
CDN / LB — CDN serves cached content at the edge; load balancer distributes traffic across backend instances.
API gateway / ingress — TLS is typically terminated here; handles auth, rate limiting, and routing to services.
Service — Application logic processes the request.
Cache / DB / dependencies — Service reads from cache (hit) or falls through to the database or downstream dependencies (miss).

Think: request enters at the edge (DNS → CDN) → passes through the gateway → hits the service → resolves data from cache or DB.
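The last stage of the path (cache hit, or fall through to the database on a miss) can be sketched as a read-through cache. This is a minimal illustration, not a production design: `ReadThroughCache`, `load_from_db`, and the `fake_db` dictionary are hypothetical names chosen for the example, and eviction and TTLs are left out.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Hypothetical in-memory cache sitting in front of a slower data source."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._load_from_db = load_from_db  # assumed DB lookup function
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        # Cache hit: answer without touching the database.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: fall through to the database and remember the result.
        self.misses += 1
        value = self._load_from_db(key)
        if value is not None:
            self._store[key] = value
        return value


fake_db = {"user:1": "Ada"}            # stand-in for a real database
cache = ReadThroughCache(fake_db.get)
first = cache.get("user:1")            # miss, loaded from fake_db
second = cache.get("user:1")           # hit, served from the cache
```

After the two calls, `cache.misses` and `cache.hits` are both 1, which is the hit/miss split a hit-ratio metric would report.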

Question 2

What deploy safety controls are covered in the Operational Model step of the SRE System Design Framework?

Control	Purpose
Canary rollout	1% → 10% → 100% traffic shift with error rate checks at each step
Automated rollback	Roll back automatically on error rate spike during canary
Feature flags	Decouple deploy from release; disable at runtime without redeploy
Config validation	Validate config schema at deploy time, not at runtime
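The canary and automated-rollback rows can be sketched together as a loop that widens traffic step by step and stops at the first unhealthy step. This is only a sketch: `run_canary`, `observe_error_rate`, and the 1% error threshold are assumptions for illustration, and a real rollout would also wait for a soak period and compare against a baseline.

```python
from typing import Callable, Sequence


def run_canary(
    observe_error_rate: Callable[[int], float],
    steps: Sequence[int] = (1, 10, 100),
    max_error_rate: float = 0.01,
) -> str:
    """Shift traffic through each step; roll back on an error-rate spike.

    observe_error_rate is a hypothetical hook that returns the error rate
    (0.0 to 1.0) measured while `percent` of traffic hits the new version.
    """
    for percent in steps:
        error_rate = observe_error_rate(percent)
        if error_rate > max_error_rate:
            # Automated rollback: stop widening and report where it failed.
            return f"rolled back at {percent}%"
    return "promoted to 100%"


# A release that only misbehaves once it receives 10% of traffic.
result = run_canary(lambda percent: 0.05 if percent >= 10 else 0.001)
```

Here `result` is `"rolled back at 10%"`: the 1% step passed its check, and the spike at 10% stopped the rollout before full exposure.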

Question 3

Which of the following are common user-facing SLIs? (Select all that apply)

Availability — % of requests returning a successful response
Latency — p95 or p99 response time
CPU utilization — average CPU % across all pods
Freshness — % of responses containing data within an acceptable age
Error rate — % of requests resulting in an error
Disk I/O — read/write throughput per node
Durability — data loss rate (for stateful systems)

Common user-facing SLIs:

Availability — % of requests returning a successful response
Latency — p95 or p99 response time
Freshness — % of responses containing data within an acceptable age
Error rate — % of requests resulting in an error
Durability — data loss rate (stateful systems only)

CPU utilization and Disk I/O are infrastructure metrics — they are useful for debugging and capacity planning but should not be primary SLIs because they don’t directly reflect what users experience.

SLIs should reflect what users actually feel — not what is easy to measure at the infrastructure layer.
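As a rough illustration of how two of these SLIs could be computed from request records, the sketch below derives availability and nearest-rank latency percentiles. The `Request` record and the choice to count any status below 500 as successful are assumptions for the example; a real service would define "successful" to match its own contract.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # end-to-end response time


def availability(requests: Sequence[Request]) -> float:
    """Fraction of requests that succeeded (assumption: status < 500)."""
    ok = sum(1 for r in requests if r.status < 500)
    return ok / len(requests)


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile, e.g. pct=99 for p99."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 100 synthetic requests: latencies 1..100 ms, the two slowest fail.
requests: List[Request] = [Request(200, float(ms)) for ms in range(1, 99)]
requests += [Request(500, 99.0), Request(503, 100.0)]

latencies = [r.latency_ms for r in requests]
sli_availability = availability(requests)   # 98 of 100 succeeded
sli_p95 = percentile(latencies, 95)
sli_p99 = percentile(latencies, 99)
```

Both numbers come from what the caller saw (status and response time), which is the distinction drawn above between user-facing SLIs and infrastructure metrics.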

Question 4

List the common failure modes and their mitigations covered in the SRE System Design Framework.

Failure Mode	Mitigation
Regional outage	GSLB reroutes; serves from remaining regions
Dependency latency spike	Timeout triggers fallback; stale cache served
Cache miss storm	DB absorbs load; autoscaling kicks in
Retry storm	Circuit breaker opens; upstream protected
Network partition	Partition-tolerant path serves cached data
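The "dependency latency spike" row (timeout triggers fallback; stale cache served) might look like the sketch below. All names here are hypothetical: `call_dependency` is assumed to honour the deadline it is given and raise `TimeoutError` when it is exceeded, and `stale_cache` is a plain dictionary standing in for a real cache.

```python
from typing import Callable, Dict, Tuple


def fetch_with_stale_fallback(
    key: str,
    call_dependency: Callable[[str, float], str],
    stale_cache: Dict[str, str],
    timeout_s: float = 0.2,
) -> Tuple[str, str]:
    """Return (value, source), where source is "fresh" or "stale"."""
    try:
        value = call_dependency(key, timeout_s)
    except TimeoutError:
        # Dependency is too slow: serve the last known value if we have one.
        if key in stale_cache:
            return stale_cache[key], "stale"
        raise  # nothing cached, so surface the failure
    stale_cache[key] = value  # refresh the fallback copy on success
    return value, "fresh"


def slow_dependency(key: str, timeout_s: float) -> str:
    # Simulates a dependency that always misses its deadline.
    raise TimeoutError(f"{key} exceeded {timeout_s}s")


cache = {"price:42": "9.99"}
value, source = fetch_with_stale_fallback("price:42", slow_dependency, cache)
```

In this run `source` is `"stale"`: the caller still gets an answer, and the returned source label lets the service mark the response as degraded.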

Question 5

List the 9 steps of the Incident Response Framework in order:

Incident Response Framework:

Confirm — Verify the issue is real, current, and user-impacting.
Scope — Determine blast radius: who, what, where, since when.
Correlate — Identify recent changes that could explain the issue.
Stabilize — Pause risky changes and stop the incident from expanding.
Locate — Narrow the fault domain using golden signals and dependency tracing.
Mitigate — Take the fastest safe action to reduce user impact.
Root Cause — Identify both the trigger and the missing safeguard.
Recover — Confirm the system is truly healthy, not just quieter.
Prevent — Add fixes, guardrails, and learnings to avoid recurrence.

Think: Confirm the problem → understand its Scope → Correlate with changes → Stabilize → Locate the fault → Mitigate → find Root Cause → Recover → Prevent.

Question 6

What are the traffic and data scaling strategies covered in the Scaling Strategy step of the SRE System Design Framework?

Traffic scaling:

Strategy	Purpose
HPA / KEDA	Scale service replicas on RPS, latency, or queue depth
CA (ASG / Karpenter)	Add nodes as pod demand grows
CDN + cache	Absorb read traffic before it hits the service layer

Data scaling:

Strategy	Purpose
Read replicas	Spread read load across multiple DB instances
Sharding / partitioning	Horizontal data split for write-heavy workloads
Cache tiering	Local in-process cache → regional cache → DB
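For the HPA row, the replica calculation can be sketched with the formula given in the Kubernetes Horizontal Pod Autoscaler documentation: desired = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The function name and parameters below are illustrative; the real controller also applies min/max replica bounds and stabilization windows, which are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA-style calculation (bounds and stabilization omitted)."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


scale_up = desired_replicas(4, current_metric=200, target_metric=100)
no_change = desired_replicas(4, current_metric=105, target_metric=100)
scale_down = desired_replicas(4, current_metric=50, target_metric=100)
```

With 4 replicas and a target of 100 RPS per pod, 200 RPS per pod doubles the replica count to 8, 105 RPS stays at 4 because it is inside the tolerance band, and 50 RPS halves it to 2.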

Question 7

List the 4 Golden Signals in order (LETS):

4 Golden Signals (LETS):

Latency — How long requests take (split p50 / p95 / p99)
Errors — Rate of failed requests
Traffic — Request volume (RPS or events/sec)
Saturation — Resource pressure (CPU, memory, queue depth)

Break each signal down by dimensions — region, endpoint, dependency, customer segment — so that when an SLO fires, you can isolate where the problem is, not just that something is wrong.
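Slicing a signal by dimension could be sketched as below, using the error signal as the example. The record layout (dictionaries with `region`, `endpoint`, and `status` keys) and the rule that a status of 500 or above counts as an error are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def error_rate_by(
    records: Iterable[Mapping[str, object]], dimension: str
) -> Dict[str, float]:
    """Error rate per value of one dimension (e.g. "region")."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for rec in records:
        key = str(rec[dimension])
        totals[key] += 1
        if int(rec["status"]) >= 500:  # assumption: 5xx means failed
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


records = [
    {"region": "us-east", "endpoint": "/cart", "status": 200},
    {"region": "us-east", "endpoint": "/pay", "status": 200},
    {"region": "eu-west", "endpoint": "/cart", "status": 200},
    {"region": "eu-west", "endpoint": "/pay", "status": 503},
]

by_region = error_rate_by(records, "region")
by_endpoint = error_rate_by(records, "endpoint")
```

The overall error rate here is 25%, but the per-dimension view shows it is concentrated in `eu-west` and on `/pay`, which is the "where, not just that" distinction described above.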

Acronym: L-E-T-S

Question 8

List the 10 steps of the SRE System Design Framework in order:

SRE System Design Framework:

User Experience — What matters most to users?
SLIs / SLOs — How do we measure success?
Request Path — How does traffic flow end to end?
Core Components — What does each layer do and why?
HA Design — How do we survive instance, AZ, and region failures?
Failure Modes — What breaks, what is the blast radius, how do we contain it?
Cascading Failures — Timeouts, retries, circuit breakers, backpressure.
Scaling — How does the system grow safely?
Observability — Metrics, logs, traces, alerting.
Operations — Deploy, rollback, runbooks, game days.

Think: user-first (UX → SLIs/SLOs) → map the system (Request Path → Components → HA) → failure analysis (Failure Modes → Cascading) → growth (Scaling → Observability → Operations).

Question 9

List the failure layers to address in the HA Design step of the SRE System Design Framework.

Failure Layer	HA Controls
Instance / pod failure	Health checks, restarts, redundant replicas
Node failure	Multi-node scheduling, PodDisruptionBudgets
AZ failure	Multi-AZ replicas, cross-AZ load balancing
Region failure	Multi-region active-active or active-passive failover
Dependency failure	Degraded mode, stale serving, fallback paths

Question 10

List the 5 failure modes covered in the Observability & Alerting Design Framework, and the signal that reveals each.

Failure	Signal
DB slowdown	Latency SLI degrades; trace shows slow DB span
Cache miss spike	Latency increases; hit ratio metric drops
Region down	Availability SLI drops; sliced by region dimension
Dependency timeout	Error rate rises; trace shows timeout on external call
Traffic surge	Saturation metric rises; queue depth or CPU climbs

Question 11

List the 8 steps of the Observability & Alerting Design Framework in order:

Observability & Alerting Design Framework:

User Experience — Anchor everything to what users actually feel.
SLIs — Pick 2–4 measurable indicators of user experience.
SLOs — Set reliability targets and define error budgets.
Signals — Golden Signals + dimensions (region, endpoint, path).
Instrumentation — Metrics (aggregate), logs (debug), traces (latency).
Alerting — SLO-based burn-rate alerts; page only on user impact.
Failure Modes — Identify likely failure patterns and verify detectability.
Scaling & Cost — Control cardinality, sample traces, tier storage.

Think: user experience first → define SLIs → attach SLOs → collect Signals → instrument → Alert on burn-rate → validate Failure Modes → control Scaling & Cost.
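The SLO, error budget, and burn-rate steps can be tied together with a small worked sketch. Burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The function names are illustrative, and the 14.4 threshold checked over a long and a short window is a commonly cited starting point from the Google SRE Workbook (roughly 2% of a 30-day budget spent in one hour), not a value taken from this quiz.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed threshold; tune per service
) -> bool:
    """Page only if both windows burn fast.

    The long window shows the burn is sustained; the short window shows
    it is still happening now.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


# 99.9% SLO: 2% errors over the last hour, 3% over the last five minutes.
page_now = should_page(0.02, 0.03, slo_target=0.999)
# Same hour, but the last five minutes have recovered.
already_recovering = should_page(0.02, 0.0005, slo_target=0.999)
```

`page_now` is `True` and `already_recovering` is `False`: requiring both windows keeps the alert tied to ongoing user impact rather than an incident that has already passed.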

Question 12

List the key reliability controls covered in the Cascading-Failure Controls step of the SRE System Design Framework.

Control	Purpose
Timeouts	Every outbound call has a deadline; fail fast, do not hang
Retries	Bounded retries with exponential backoff and jitter
Circuit breakers	Open after N failures; stop hammering a degraded dependency
Rate limiting	Protect the service and its dependencies from overload
Backpressure	Signal upstream when the service is at capacity
Stale serving	Return cached or degraded responses rather than errors
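Two of these controls, bounded retries with exponential backoff and jitter and a circuit breaker that opens after N failures, can be sketched as follows. This is a simplified single-threaded illustration: `CircuitBreaker`, `CircuitOpenError`, `retry_with_backoff`, and all default values are hypothetical names and numbers, not taken from a specific library.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Open: fail fast instead of hammering the dependency.
                raise CircuitOpenError("dependency circuit is open")
            # Cool-down elapsed: allow one trial call (half-open).
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # a success closes the circuit fully
        return result


def retry_with_backoff(fn: Callable[[], T], max_attempts: int = 3,
                       base_delay_s: float = 0.1, max_delay_s: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # never retry against an open circuit
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries are bounded
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads out retries
    raise RuntimeError("max_attempts must be at least 1")
```

A caller might combine them as `retry_with_backoff(lambda: breaker.call(fetch))`, where `fetch` is whatever outbound call is being protected. Re-raising `CircuitOpenError` immediately is a deliberate choice in this sketch: once the breaker is open, retrying would only add load, which is the retry storm the table is guarding against.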

Last updated on May 2, 2026
