Istio
What is Istio?
Istio is an open-source service mesh platform that provides a uniform way to secure, connect, and observe microservices.
Why Use Istio?
In a microservices architecture, applications are decomposed into many small services that communicate over the network. This introduces challenges:
- Service-to-service communication: Managing secure, reliable communication between hundreds of services
- Observability: Understanding traffic flow, latency, and failures across services
- Security: Enforcing authentication, authorization, and encryption between services
- Traffic management: Implementing advanced routing, load balancing, and resilience patterns
- Policy enforcement: Applying consistent policies across all services
Istio solves these challenges by providing a transparent infrastructure layer that sits between your services and the network, handling cross-cutting concerns without requiring changes to application code.
Key Benefits
- Traffic control: Fine-grained control over traffic routing and behavior
- Security: Automatic mTLS, authentication, and authorization
- Observability: Metrics, logs, and distributed tracing out-of-the-box
- Resilience: Circuit breakers, retries, timeouts, and fault injection
- Policy enforcement: Centralized policy management and rate limiting
Service Mesh Fundamentals
What is a Service Mesh?
A service mesh is a dedicated infrastructure layer for managing service-to-service communication.
```
┌────────────────────────────────────────────────────────────┐
│                        Service Mesh                        │
│                                                            │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐  │
│  │Service A│    │Service B│    │Service C│    │Service D│  │
│  │  ┌───┐  │    │  ┌───┐  │    │  ┌───┐  │    │  ┌───┐  │  │
│  │  │App│  │    │  │App│  │    │  │App│  │    │  │App│  │  │
│  │  └───┘  │    │  └───┘  │    │  └───┘  │    │  └───┘  │  │
│  │  ┌───┐  │    │  ┌───┐  │    │  ┌───┐  │    │  ┌───┐  │  │
│  │  │ P │  │◄──►│  │ P │  │◄──►│  │ P │  │◄──►│  │ P │  │  │
│  │  └───┘  │    │  └───┘  │    │  └───┘  │    │  └───┘  │  │
│  └────┬────┘    └────┬────┘    └────┬────┘    └────┬────┘  │
│       └──────────────┴──────────────┴──────────────┘       │
│                 Proxy Network (Data Plane)                 │
│                             ↕                              │
│                    ┌──────────────────┐                    │
│                    │  Control Plane   │                    │
│                    │     (istiod)     │                    │
│                    └──────────────────┘                    │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
P = Proxy (Envoy)
The service mesh consists of:
- Data Plane: Network of proxies that handle all inter-service communication
- Control Plane: Manages and configures the proxies
Istio Architecture
Istio’s architecture is divided into two main components:
High-Level Architecture
```
┌────────────────────────────────────────────────────────────────┐
│                           Istio Mesh                           │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                      Control Plane                       │  │
│  │                                                          │  │
│  │    ┌───────────────────────┐                             │  │
│  │    │        istiod         │                             │  │
│  │    │  ┌─────────────────┐  │                             │  │
│  │    │  │ Pilot           │  │  Configuration &            │  │
│  │    │  │ - Service Disc. │  │  Service Discovery          │  │
│  │    │  │ - Traffic Mgmt  │  │                             │  │
│  │    │  └─────────────────┘  │                             │  │
│  │    │  ┌─────────────────┐  │                             │  │
│  │    │  │ Citadel (CA)    │  │  Certificate                │  │
│  │    │  │ - mTLS/PKI      │  │  Management                 │  │
│  │    │  └─────────────────┘  │                             │  │
│  │    │  ┌─────────────────┐  │                             │  │
│  │    │  │ Galley          │  │  Config                     │  │
│  │    │  │ - Validation    │  │  Validation                 │  │
│  │    │  └─────────────────┘  │                             │  │
│  │    └───────────┬───────────┘                             │  │
│  │                │                                         │  │
│  └────────────────┼─────────────────────────────────────────┘  │
│                   ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                        Data Plane                        │  │
│  │                                                          │  │
│  │       ┌──────────┐    ┌──────────┐    ┌──────────┐       │  │
│  │       │  Pod A   │    │  Pod B   │    │  Pod C   │       │  │
│  │       │ ┌──────┐ │    │ ┌──────┐ │    │ ┌──────┐ │       │  │
│  │       │ │ App  │ │    │ │ App  │ │    │ │ App  │ │       │  │
│  │       │ └──────┘ │    │ └──────┘ │    │ └──────┘ │       │  │
│  │       │ ┌──────┐ │    │ ┌──────┐ │    │ ┌──────┐ │       │  │
│  │       │ │Envoy │ │◄──►│ │Envoy │ │◄──►│ │Envoy │ │       │  │
│  │       │ │Proxy │ │    │ │Proxy │ │    │ │Proxy │ │       │  │
│  │       │ └──────┘ │    │ └──────┘ │    │ └──────┘ │       │  │
│  │       └──────────┘    └──────────┘    └──────────┘       │  │
│  │                                                          │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
Control Plane (istiod)
In modern Istio (1.5+), the control plane is consolidated into a single binary called istiod, which includes:
Pilot
- Service Discovery: Maintains a registry of all services and their endpoints
- Traffic Management: Converts high-level routing rules into Envoy configurations
- Configuration Distribution: Pushes configurations to all Envoy proxies
- Supports: A/B testing, canary deployments, traffic splitting, circuit breakers, retries, timeouts
Citadel (Certificate Authority)
- Certificate Management: Issues and rotates X.509 certificates for workloads
- Identity: Provides strong identity to each service
- mTLS: Enables automatic mutual TLS encryption between services
- SPIFFE: Implements SPIFFE standard for service identity
Galley
- Configuration Validation: Validates user-authored Istio configuration
- Configuration Ingestion: Processes and distributes configuration to istiod
- Abstraction: Isolates istiod from underlying platform (Kubernetes, VMs)
Mixer (Deprecated)
- Note: Mixer was deprecated in Istio 1.5 and removed in Istio 1.8
- Previously handled:
  - Access control and policy checks
  - Telemetry data collection
- These functions are now handled directly in the Envoy proxies (in-proxy telemetry extensions for metrics; AuthorizationPolicy and Envoy rate limiting for policy)
Data Plane
The data plane consists of Envoy proxies deployed alongside each service:
- Envoy Proxy: High-performance C++ proxy originally built by Lyft
- Sidecar Pattern: In sidecar mode, each pod gets an Envoy container
- Traffic Interception: All inbound/outbound traffic goes through the proxy
Capabilities:
- Dynamic service discovery
- Load balancing
- TLS termination
- HTTP/2 and gRPC proxying
- Circuit breakers
- Health checks
- Staged rollouts with percentage-based traffic splits
- Fault injection
- Rich metrics
Deployment Modes
Istio supports multiple deployment modes to fit different use cases and requirements.
1. Sidecar Mode (Traditional)
In sidecar mode, Istio deploys an Envoy proxy container alongside each application pod:
```
┌────────────────────────────────────────┐
│ Kubernetes Pod                         │
│                                        │
│  ┌──────────────┐    ┌──────────────┐  │
│  │              │    │              │  │
│  │ Application  │◄───┤  Envoy Proxy │  │
│  │  Container   │    │ (istio-proxy)│  │
│  │              │    │              │  │
│  └──────────────┘    └──────┬───────┘  │
│                             │          │
└─────────────────────────────┼──────────┘
                              │
                      All traffic flows
                        through proxy
```
Characteristics:
- Each pod gets its own Envoy proxy sidecar container
- Proxy intercepts all inbound and outbound traffic using iptables rules
- Full Layer 7 (HTTP/gRPC) capabilities per pod
- Pros: Complete feature set, mature, well-tested
- Cons: Higher resource overhead (one proxy per pod)
Use cases:
- Production environments requiring full L7 features
- Applications needing advanced traffic management
- When resource overhead is acceptable
2. Ambient Mode (Sidecarless)
Ambient mode is a newer deployment model that reduces resource overhead by eliminating per-pod sidecars:
```
┌────────────────────────────────────────────────────────────────┐
│ Kubernetes Node                                                │
│                                                                │
│  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐                        │
│  │ Pod  │  │ Pod  │  │ Pod  │  │ Pod  │  (No sidecars!)        │
│  └──┬───┘  └──┬───┘  └──┬───┘  └──┬───┘                        │
│     └─────────┴────┬────┴─────────┘                            │
│                    │                                           │
│            ┌───────▼────────┐                                  │
│            │    ztunnel     │  Layer 4 (per-node)              │
│            │   (L4 Proxy)   │  - mTLS                          │
│            └───────┬────────┘  - Basic routing                 │
│                    │                                           │
└────────────────────┼───────────────────────────────────────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │    Waypoint Proxy     │  Layer 7 (per-namespace)
         │      (Optional)       │  - Advanced routing
         │  - Full L7 features   │  - Traffic policies
         └───────────────────────┘
```
Ambient mode has two components:
ztunnel (Zero Trust Tunnel)
- Runs as a DaemonSet (one per node)
- Handles Layer 4 traffic (TCP)
- Provides:
  - Mutual TLS (mTLS) encryption
  - Basic authentication and authorization
  - Telemetry at L4
- Lightweight and efficient
Waypoint Proxies
- Optional Envoy proxies, typically deployed per namespace (a waypoint can also serve individual services or workloads)
- Provide Layer 7 (HTTP/gRPC) features
- Deploy only when you need advanced capabilities:
  - Complex routing rules
  - Request-level policies
  - HTTP header manipulation
  - Fault injection
  - Advanced observability
Characteristics:
- No sidecar containers in application pods
- Significantly reduced resource consumption
- Gradual adoption of L7 features (opt-in per namespace)
- Pros: Lower resource overhead, simpler upgrades
- Cons: Newer (less mature), limited L7 features without waypoint proxies
Use cases:
- Large-scale deployments where resource efficiency is critical
- Environments with many simple services
- Gradual migration from no mesh to full mesh
Comparison: Sidecar vs Ambient
| Aspect | Sidecar Mode | Ambient Mode |
|---|---|---|
| Resource overhead | High (proxy per pod) | Low (shared proxies) |
| L7 features | Always available | Opt-in via waypoint |
| Maturity | Stable, production-ready | Newer (generally available since Istio 1.24) |
| Upgrade complexity | Rolling pod restarts | Simpler (node-level) |
| Best for | Feature-rich environments | Large-scale, cost-sensitive |
Core Components
Istio uses several Custom Resource Definitions (CRDs) to configure service mesh behavior. Understanding these resources is essential for effective traffic management.
Configuration Resources Overview
```
 External Traffic
         ↓
┌─────────────────┐
│     Gateway     │  ← Defines ports/hosts for mesh entry
└────────┬────────┘
         ↓
┌─────────────────┐
│ VirtualService  │  ← Routing rules (where to send traffic)
└────────┬────────┘
         ↓
┌─────────────────┐
│ DestinationRule │  ← Policies (how to handle traffic)
└────────┬────────┘
         ↓
┌─────────────────┐
│     Service     │  ← Kubernetes Service
└────────┬────────┘
         ↓
┌─────────────────┐
│       Pod       │  ← Application workload
└─────────────────┘
```
1. VirtualService (Traffic Routing)
VirtualService defines routing rules that specify how requests are routed to services within the mesh.
Key capabilities:
- Route traffic based on HTTP headers, URI paths, source labels
- Split traffic across multiple service versions (for canary deployments)
- Add timeouts, retries, and fault injection
- Redirect and rewrite URLs
Example use cases:
- Route 90% of traffic to v1 and 10% to v2 (canary testing)
- Route requests with the header user-type: premium to a special backend
- Add automatic retries on connection failures
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
spec:
  hosts:
  - reviews.default.svc.cluster.local
  http:
  - match:
    - headers:
        user-type:
          exact: premium
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
    retries:
      attempts: 3
      retryOn: "5xx,reset,connect-failure"
    timeout: 5s
```
2. DestinationRule (Traffic Policies)
DestinationRule defines policies that apply to traffic after routing has occurred, that is, to the destination service (or subset) that the VirtualService selected.
Key capabilities:
- Define service subsets (versions) based on labels
- Configure load balancing algorithms
- Set up connection pool settings
- Enable/configure mutual TLS
- Configure circuit breakers and outlier detection
Example use cases:
- Define subsets for different versions (v1, v2, v3)
- Use least-connection load balancing
- Enable circuit breaker to prevent cascading failures
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-destination
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN  # deprecated alias; LEAST_REQUEST is preferred in current releases
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
    outlierDetection:
      consecutive5xxErrors: 5  # replaces the deprecated consecutiveErrors field
      interval: 30s
      baseEjectionTime: 30s
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
```
Load Balancing Algorithms
Istio supports multiple load balancing strategies:
| Algorithm | Description | Use case |
|---|---|---|
| ROUND_ROBIN | Distributes requests evenly in rotation | Homogeneous backends (not the default in recent Istio releases, which use LEAST_REQUEST) |
| LEAST_CONN | Sends to the backend with the fewest active connections (deprecated in favor of LEAST_REQUEST) | Backends with varying load capacity |
| LEAST_REQUEST | Sends to the backend with the fewest active requests | HTTP/2 and gRPC workloads |
| RANDOM | Randomly selects a backend | Simple, low-overhead distribution |
| PASSTHROUGH | Forwards to the original destination address without load balancing | Direct connection scenarios |
| CONSISTENT_HASH | Hash-based distribution (sticky sessions), configured via consistentHash rather than simple | Session affinity requirements |
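To make the table concrete, here is a minimal Python sketch of three of the strategies, assuming a hypothetical in-memory list of endpoint addresses. It is not Envoy's implementation: the least-request picker scans every endpoint, and the hash uses plain modulo. Both simplifications are called out in the comments.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List

# Hypothetical endpoints; in Istio these come from the service registry via istiod.
ENDPOINTS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]


def round_robin(endpoints: List[str]) -> Iterator[str]:
    """ROUND_ROBIN: hand out endpoints in a fixed rotation."""
    return itertools.cycle(endpoints)


def least_request(active: Dict[str, int]) -> str:
    """LEAST_REQUEST, simplified: pick the endpoint with the fewest in-flight
    requests (ties broken by name). Envoy samples two random endpoints and
    keeps the less loaded one instead of scanning every endpoint."""
    return min(active, key=lambda ep: (active[ep], ep))


def consistent_hash(endpoints: List[str], key: str) -> str:
    """CONSISTENT_HASH, simplified: the same key (for example a user ID header)
    always maps to the same endpoint. Plain modulo hashing is used here; Envoy
    uses a ring hash or Maglev table so fewer keys move when endpoints change."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return endpoints[digest % len(endpoints)]


if __name__ == "__main__":
    rr = round_robin(ENDPOINTS)
    print([next(rr) for _ in range(4)])  # wraps back to the first endpoint
    print(least_request({"10.0.0.1:8080": 7, "10.0.0.2:8080": 2, "10.0.0.3:8080": 5}))
    print(consistent_hash(ENDPOINTS, "user-42"))
```

The modulo shortcut is the main reason real meshes use ring-based hashing: with modulo, adding one endpoint reshuffles most keys and breaks session affinity.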
Load Balancer Settings
LoadBalancerSettings options:
- simple: Standard algorithms (ROUND_ROBIN, LEAST_CONN, etc.)
- consistentHash: Hash-based routing for session affinity
- localityLbSetting: Locality-aware load balancing (prefer local endpoints)
- warmupDurationSecs: Gradually increase traffic to new endpoints instead of sending full load immediately
3. Gateway (Mesh Entry/Exit Points)
Gateway configures a load balancer operating at the edge of the mesh for receiving incoming or outgoing HTTP/TCP connections.
Key capabilities:
- Define external entry points (ingress) or exit points (egress)
- Configure ports, protocols, and TLS settings
- Attach to specific gateway deployments using selectors
- Support for mutual TLS (mTLS) authentication
Example use cases:
- Expose services to external clients via HTTPS
- Configure mTLS for client certificate authentication
- Set up egress gateway for controlled external API access
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway # Selects the ingress gateway pods
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - "myapp.example.com"
    tls:
      mode: SIMPLE
      credentialName: myapp-tls-cert
```
4. ServiceEntry (External Services)
ServiceEntry enables adding external services (outside the mesh) into Istio’s internal service registry.
Key capabilities:
- Add external APIs or databases to the mesh
- Apply mesh policies to external services
- Control and monitor traffic to external endpoints
Example use cases:
- Integrate external payment APIs with mesh policies
- Apply retries and timeouts to external database connections
- Monitor traffic to third-party services
```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-payment-api
spec:
  hosts:
  - api.payment-provider.com
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  location: MESH_EXTERNAL
  resolution: DNS
```
5. Sidecar (Proxy Configuration)
The Sidecar resource fine-tunes the configuration of sidecar proxies attached to workloads.
Key capabilities:
- Limit the set of services a sidecar can reach
- Optimize resource usage by reducing configuration size
- Control inbound and outbound traffic behavior
Example use cases:
- Reduce memory footprint in large meshes
- Restrict which services a workload can communicate with
- Improve proxy startup time
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: my-app
spec:
  egress:
  - hosts:
    - "./*"             # Proxies only receive config for services in this namespace...
    - "istio-system/*"  # ...and in istio-system
```
Traffic Management
Traffic management is one of Istio’s core features, enabling sophisticated control over service-to-service communication.
Request Routing
Control where traffic goes based on various criteria:
Path-based routing:
```yaml
http:
- match:
  - uri:
      prefix: /api/v1
  route:
  - destination:
      host: service-v1
- match:
  - uri:
      prefix: /api/v2
  route:
  - destination:
      host: service-v2
```
Header-based routing:
```yaml
http:
- match:
  - headers:
      x-api-version:
        exact: "2.0"
  route:
  - destination:
      host: service-v2
```
Traffic Splitting (Canary Deployments)
Gradually shift traffic from old version to new version:
```yaml
http:
- route:
  - destination:
      host: reviews
      subset: v1
    weight: 80
  - destination:
      host: reviews
      subset: v2
    weight: 20
```
Deployment strategy:
- Deploy v2 alongside v1
- Route 10% → v2, 90% → v1
- Monitor metrics and errors
- Gradually increase v2 traffic: 25%, 50%, 75%, 100%
- Decommission v1
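The routing rules in this document (first matching route wins, then a weighted split) can be modeled in a few lines. This is a hedged sketch, not Istio code. The Route class and pick_subset function are hypothetical, and only exact header matches and URI prefixes are modeled. Sampling the catch-all route many times sends roughly 10% of requests to v2, which is what the first canary step relies on.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Route:
    """One http route, loosely modeled on a VirtualService entry (hypothetical class)."""
    headers_exact: Dict[str, str] = field(default_factory=dict)
    uri_prefix: Optional[str] = None
    destinations: List[Tuple[str, int]] = field(default_factory=list)  # (subset, weight)

    def matches(self, path: str, headers: Dict[str, str]) -> bool:
        # A route with no match conditions matches every request.
        if self.uri_prefix is not None and not path.startswith(self.uri_prefix):
            return False
        return all(headers.get(k) == v for k, v in self.headers_exact.items())


def pick_subset(routes: List[Route], path: str, headers: Dict[str, str],
                rng: random.Random) -> Optional[str]:
    """Evaluate routes in order; the first match wins, then pick one of its
    destinations at random in proportion to the weights."""
    for route in routes:
        if route.matches(path, headers):
            subsets = [subset for subset, _ in route.destinations]
            weights = [weight for _, weight in route.destinations]
            return rng.choices(subsets, weights=weights, k=1)[0]
    return None  # no route matched the request


# Mirrors the reviews-route example: premium users go to v2, others split 90/10.
ROUTES = [
    Route(headers_exact={"user-type": "premium"}, destinations=[("v2", 100)]),
    Route(destinations=[("v1", 90), ("v2", 10)]),
]

if __name__ == "__main__":
    rng = random.Random(7)
    picks = [pick_subset(ROUTES, "/", {}, rng) for _ in range(10_000)]
    print("share of v2:", picks.count("v2") / len(picks))  # close to 0.10
```

Moving to the next canary step is then just a change of weights, for example (75, 25). The route order and match logic stay the same.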
Timeouts and Retries
Timeouts prevent requests from hanging indefinitely:
```yaml
http:
- route:
  - destination:
      host: my-service
  timeout: 5s
```
Retries automatically retry failed requests:
```yaml
http:
- route:
  - destination:
      host: my-service
  retries:
    attempts: 3
    perTryTimeout: 2s
    retryOn: "5xx,reset,connect-failure,refused-stream"
```
Circuit Breaking
Prevent cascading failures by capping connections and pending requests to a service and ejecting endpoints that keep failing:
```yaml
trafficPolicy:
  connectionPool:
    tcp:
      maxConnections: 100
    http:
      http1MaxPendingRequests: 50
      maxRequestsPerConnection: 5
  outlierDetection:
    consecutive5xxErrors: 5  # replaces the deprecated consecutiveErrors field
    interval: 30s
    baseEjectionTime: 30s
    maxEjectionPercent: 50
```
How it works:
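- An endpoint starts returning errors (5xx responses)
- After 5 consecutive 5xx errors, Envoy ejects that endpoint from the load-balancing pool for baseEjectionTime (30s); each later ejection of the same endpoint lasts longer (baseEjectionTime multiplied by the number of times it has been ejected)
- maxEjectionPercent: 50 caps ejections so that at most half of the endpoints are out of the pool at once
- Envoy re-runs its ejection analysis every interval (30s); once the ejection time has expired, the endpoint returns to the pool, and if it keeps failing it is ejected again
- Independently, the connectionPool limits make excess requests fail fast with a 503 instead of queuing without bound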
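Here is a minimal sketch of the ejection logic described above. It assumes a hypothetical OutlierDetector class that receives one response status at a time. It models consecutive5xxErrors, baseEjectionTime and maxEjectionPercent. The periodic interval sweep and Envoy's cap on total ejection time are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class HostState:
    consecutive_5xx: int = 0
    times_ejected: int = 0
    ejected_until: float = 0.0  # time (seconds) at which the host returns to the pool


class OutlierDetector:
    """Simplified model of consecutive-5xx outlier detection (hypothetical helper).

    Assumptions: ejection time grows as base_ejection_time * times_ejected,
    the `interval` sweep is not modeled, and one host may always be ejected
    even if that alone exceeds max_ejection_percent.
    """

    def __init__(self, hosts: Iterable[str], consecutive_5xx: int = 5,
                 base_ejection_time: float = 30.0, max_ejection_percent: int = 50):
        self.hosts: Dict[str, HostState] = {h: HostState() for h in hosts}
        self.threshold = consecutive_5xx
        self.base = base_ejection_time
        self.max_pct = max_ejection_percent

    def is_ejected(self, host: str, now: float) -> bool:
        return now < self.hosts[host].ejected_until

    def record(self, host: str, status: int, now: float) -> None:
        """Feed one response status for `host` observed at time `now`."""
        state = self.hosts[host]
        if status < 500:
            state.consecutive_5xx = 0  # any non-5xx response resets the streak
            return
        state.consecutive_5xx += 1
        if state.consecutive_5xx < self.threshold or self.is_ejected(host, now):
            return
        ejected = sum(self.is_ejected(h, now) for h in self.hosts)
        if ejected > 0 and (ejected + 1) * 100 > self.max_pct * len(self.hosts):
            return  # ejecting this host would exceed maxEjectionPercent
        state.times_ejected += 1
        state.ejected_until = now + self.base * state.times_ejected
        state.consecutive_5xx = 0
```

With two hosts and maxEjectionPercent: 50, only one host can be out at a time. This is the safety valve that keeps outlier detection from emptying the pool during a service-wide incident.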
Fault Injection (Chaos Engineering)
Test application resilience by injecting faults:
Delay injection (simulate slow networks):
```yaml
http:
- fault:
    delay:
      percentage:
        value: 10
      fixedDelay: 5s
  route:
  - destination:
      host: my-service
```
Abort injection (simulate service failures):
```yaml
http:
- fault:
    abort:
      percentage:
        value: 20
      httpStatus: 503
  route:
  - destination:
      host: my-service
```
Traffic Mirroring (Shadowing)
Send a copy of live traffic to a test service without affecting production (responses from the mirror are discarded):
```yaml
http:
- route:
  - destination:
      host: service-v1
  mirror:
    host: service-v2
  mirrorPercentage:
    value: 50
```
Use cases:
- Test new version with real traffic without risk
- Compare performance between versions
- Validate refactored services
Security
Istio provides multiple layers of security for microservices.
Mutual TLS (mTLS)
Automatic mTLS encrypts all service-to-service communication and provides strong identity.
```
┌──────────┐                                      ┌──────────┐
│Service A │                                      │Service B │
│  ┌────┐  │                                      │  ┌────┐  │
│  │App │  │                                      │  │App │  │
│  └─┬──┘  │                                      │  └─┬──┘  │
│    │     │                                      │    │     │
│  ┌─▼───┐ │  1. Establish mTLS connection        │ ┌──▼──┐  │
│  │Envoy│─┼──────────────────────────────────────┼►│Envoy│  │
│  │     │ │  2. Verify certificates (both ways)  │ │     │  │
│  │     │◄┼──────────────────────────────────────┼─│     │  │
│  │     │ │  3. Encrypted communication          │ │     │  │
│  │     │◄┼──────────────────────────────────────┼►│     │  │
│  └─────┘ │                                      │ └─────┘  │
└────┬─────┘                                      └────┬─────┘
     │                                                 │
     └───────────────── Citadel (CA) ──────────────────┘
               (Issues & rotates certificates)
```
Configuration modes:
```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-namespace
spec:
  mtls:
    mode: STRICT # Options: STRICT, PERMISSIVE, DISABLE
```
- STRICT: Only accept mTLS connections
- PERMISSIVE: Accept both mTLS and plaintext (for migration)
- DISABLE: Disable mTLS
Authorization Policies
Control who can access what services:
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-policy
  namespace: default
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/api-gateway"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
```
Because this is an ALLOW policy, once it selects the frontend workload, any request that matches none of its rules is denied.
Common patterns:
- Allow only specific services to call an API
- Restrict HTTP methods (e.g., only GET and POST)
- Deny access to admin endpoints except from specific namespaces
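The evaluation order behind these patterns can be sketched as follows. The sketch assumes hypothetical Rule and Policy classes and covers only principals, methods and paths. It omits CUSTOM actions and negated fields such as notPrincipals and notPaths. The goal is to show why an ALLOW policy turns every unmatched request into a denial.

```python
from dataclasses import dataclass, field
from typing import List


def match_value(pattern: str, value: str) -> bool:
    """Istio-style string match: exact, prefix ("/api/*"), suffix ("*.js") or "*"."""
    if pattern == "*":
        return True
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])
    return value == pattern


@dataclass
class Rule:
    # An empty list means "any value" for that field.
    principals: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    paths: List[str] = field(default_factory=list)

    def matches(self, principal: str, method: str, path: str) -> bool:
        def ok(patterns: List[str], value: str) -> bool:
            return not patterns or any(match_value(p, value) for p in patterns)
        return ok(self.principals, principal) and ok(self.methods, method) and ok(self.paths, path)


@dataclass
class Policy:
    action: str  # "ALLOW" or "DENY" (CUSTOM is not modeled)
    rules: List[Rule]

    def hit(self, principal: str, method: str, path: str) -> bool:
        return any(r.matches(principal, method, path) for r in self.rules)


def is_allowed(policies: List[Policy], principal: str, method: str, path: str) -> bool:
    """DENY policies are checked first; if no ALLOW policy selects the workload the
    request is allowed, otherwise at least one ALLOW policy must match."""
    if any(p.action == "DENY" and p.hit(principal, method, path) for p in policies):
        return False
    allows = [p for p in policies if p.action == "ALLOW"]
    return not allows or any(p.hit(principal, method, path) for p in allows)


# The frontend-policy example expressed with these hypothetical classes.
FRONTEND_POLICY = Policy("ALLOW", [Rule(
    principals=["cluster.local/ns/default/sa/api-gateway"],
    methods=["GET", "POST"],
    paths=["/api/*"],
)])
```

With FRONTEND_POLICY in place, a DELETE from the api-gateway service account, or a GET to /admin, is rejected even though no DENY rule mentions it.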
Request Authentication (JWT)
Validate JWT tokens from external identity providers:
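```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
spec:
  selector:
    matchLabels:
      app: api-service
  jwtRules:
  - issuer: "https://auth.example.com"
    jwksUri: "https://auth.example.com/.well-known/jwks.json"
```
RequestAuthentication only rejects requests that carry an invalid token. Requests with no token at all are still accepted unless an AuthorizationPolicy requires a request principal (for example, requestPrincipals: ["*"]).
Use cases: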
- Validate OAuth2/OIDC tokens
- Enforce authentication for external API calls
- Extract user identity from JWT claims
Observability
Istio automatically generates telemetry for all traffic in the mesh without requiring application changes.
Three Pillars of Observability
```
┌────────────────────────────────────────────────────────────┐
│                    Observability Stack                     │
│                                                            │
│    ┌─────────────┐   ┌─────────────┐   ┌─────────────┐     │
│    │   Metrics   │   │    Logs     │   │   Traces    │     │
│    │             │   │             │   │             │     │
│    │ Prometheus  │   │   Fluentd   │   │   Jaeger    │     │
│    │   Grafana   │   │     ELK     │   │   Zipkin    │     │
│    └──────┬──────┘   └──────┬──────┘   └──────┬──────┘     │
│           └─────────────────┼─────────────────┘            │
│                             │                              │
│                 ┌───────────┴───────────┐                  │
│                 │     Envoy Proxies     │                  │
│                 │ (Generate telemetry)  │                  │
│                 └───────────────────────┘                  │
└────────────────────────────────────────────────────────────┘
```
1. Metrics
Istio automatically collects:
- Request rate: Requests per second
- Request latency: P50, P90, P95, P99 percentiles
- Error rate: 4xx and 5xx responses
- Request size: Bytes sent/received
Key metrics:
- istio_requests_total: Total request count
- istio_request_duration_milliseconds: Request latency
- istio_request_bytes: Request size
- istio_response_bytes: Response size
RED method (covers three of the four golden signals: traffic, errors, and latency):
- Rate: Requests per second
- Errors: Percentage of failed requests
- Duration: Request latency distribution
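The RED numbers are easy to derive from raw request records. The sketch below assumes a hypothetical Request record and treats only 5xx responses as errors. That is an assumption: the metrics list above counts 4xx responses as well. It uses a nearest-rank percentile for simplicity.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    # Hypothetical per-request record (roughly what one access-log line carries).
    status: int
    duration_ms: float


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile. Prometheus instead estimates percentiles from the
    histogram buckets of istio_request_duration_milliseconds."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def red(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Rate, Errors (treated here as 5xx responses) and Duration over one window."""
    if not requests:
        return {"rate_rps": 0.0, "error_pct": 0.0, "p50_ms": 0.0, "p99_ms": 0.0}
    durations = [r.duration_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_rps": len(requests) / window_s,
        "error_pct": 100.0 * errors / len(requests),
        "p50_ms": percentile(durations, 50),
        "p99_ms": percentile(durations, 99),
    }
```

In a real mesh the same numbers come from Prometheus queries over istio_requests_total and istio_request_duration_milliseconds, not from application code like this.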
2. Distributed Tracing
Track requests as they flow through multiple services:
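```
User Request → API Gateway → Auth Service → Product Service → DB
                  20ms           15ms            100ms       50ms
│───────────────────────────────────────────────────────────────│
Total Latency: 185ms
```
In this example the total is the sum of the per-hop latencies (20 + 15 + 100 + 50 = 185 ms) because the hops form a single sequential chain.
Trace components: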
- Trace: End-to-end request journey
- Span: Single operation (e.g., one service call)
- Tags: Metadata (HTTP method, status code, etc.)
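Integration: Jaeger, Zipkin. Envoy generates the spans, but each application must copy the trace context headers (for example traceparent or the x-b3-* headers) from its incoming request onto its outgoing calls; otherwise every hop shows up as a separate trace. With trace context propagated, you can: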
- Visualize request flow
- Identify bottlenecks
- Debug latency issues
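Here is a minimal sketch of forwarding trace context from an inbound request to an outbound call. It assumes the incoming headers arrive as a plain mapping. The header list is a subset of the headers named in Istio's tracing documentation; check your tracer's documentation for the full set it relies on.

```python
from typing import Dict, Mapping

# A subset of the trace-context headers named in Istio's tracing documentation
# (B3 and W3C Trace Context); your tracer may use others.
TRACE_HEADERS = frozenset({
    "x-request-id",
    "traceparent",
    "tracestate",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",
})


def propagate_trace_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Pick the trace-context headers from an inbound request so they can be sent
    on the outbound call; HTTP header names are case-insensitive."""
    return {name.lower(): value for name, value in incoming.items()
            if name.lower() in TRACE_HEADERS}


if __name__ == "__main__":
    inbound = {"Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
               "Authorization": "Bearer hypothetical-token"}
    print(propagate_trace_headers(inbound))  # only traceparent is forwarded
```

In a handler, merge the returned dict into the headers of each downstream request. Credentials such as Authorization are deliberately not copied.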
3. Access Logs
Envoy generates detailed access logs:
```json
{
  "start_time": "2024-01-15T10:30:00.000Z",
  "method": "GET",
  "path": "/api/products",
  "response_code": 200,
  "duration": 45,
  "upstream_service": "products.default.svc.cluster.local",
  "user_agent": "Mozilla/5.0...",
  "request_id": "abc-123-def-456"
}
```
Service Discovery and Endpoints
Istio’s Pilot component provides automatic service discovery:
How it works:
- Kubernetes creates Endpoints for each Service
- Pilot watches Kubernetes API for changes
- Pilot pushes updated endpoint information to all Envoy proxies
- Proxies use this info for load balancing
Real-world example:
```shell
# Service definition
$ kubectl -n myapp get svc myapp-test
NAME         TYPE        CLUSTER-IP       PORT(S)
myapp-test   ClusterIP   10.100.229.132   8443/TCP
# Endpoints (actual pod IPs)
$ kubectl -n myapp get endpoints myapp-test
NAME         ENDPOINTS
myapp-test   240.48.67.221:8080,240.48.69.154:8080
# Pods backing the service
$ kubectl -n myapp get pods -o wide
NAME                          READY   IP
myapp-test-78ddbd8c64-9bkzb   3/3     240.48.69.154
myapp-test-78ddbd8c64-tsnfw   3/3     240.48.67.221
```
Dynamic updates:
- Pod scales up → New endpoint added → Pilot updates all proxies
- Pod becomes unhealthy → Endpoint removed → Traffic stops routing to it
- Zero-downtime deployments
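The relationship between pods, the Service selector and the Endpoints list can be sketched as a small function. This is a hypothetical model of what the Kubernetes endpoints controller computes: ready pods whose labels match the selector, paired with targetPort. It uses the pod IPs from the example above and the Service selector labels shown in Example 4 below. Not-ready pods are simply dropped here, whereas Kubernetes lists them separately as not-ready addresses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    ip: str
    labels: Dict[str, str]
    ready: bool = True


def endpoints_for(selector: Dict[str, str], target_port: int, pods: List[Pod]) -> List[str]:
    """Ready pods whose labels contain every selector key/value, as "ip:targetPort"
    (sorted for stable output). istiod watches this list and pushes it to Envoy."""
    return sorted(
        f"{pod.ip}:{target_port}"
        for pod in pods
        if pod.ready and all(pod.labels.get(k) == v for k, v in selector.items())
    )


SELECTOR = {"app.kubernetes.io/instance": "myapp-myapp", "app.kubernetes.io/name": "myapp-test"}
PODS = [
    Pod("myapp-test-78ddbd8c64-9bkzb", "240.48.69.154", dict(SELECTOR)),
    Pod("myapp-test-78ddbd8c64-tsnfw", "240.48.67.221", dict(SELECTOR)),
    Pod("unrelated-pod", "240.48.70.1", {"app": "other"}),  # hypothetical, not selected
]

if __name__ == "__main__":
    # Prints 240.48.67.221:8080,240.48.69.154:8080, the ENDPOINTS column above.
    print(",".join(endpoints_for(SELECTOR, 8080, PODS)))
```

Marking a pod not ready, for example after a failed readiness probe, removes it from the list. That is the "endpoint removed" step in the bullets above.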
Advanced Concepts
Graceful Termination and Connection Draining
When pods are terminated, ensure graceful shutdown:
Termination Flow:
App Container → istio-proxy → ingress-gateway → Load Balancer
Key settings:
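- terminationGracePeriodSeconds (Pod level): Total time Kubernetes allows for shutdown, including preStop hooks, before it sends SIGKILL
- terminationDrainDuration (Istio proxy config, default 5s): How long the sidecar keeps draining existing connections after it receives SIGTERM (drainDuration, by contrast, applies to Envoy hot restarts)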
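```yaml
apiVersion: networking.istio.io/v1alpha3  # EnvoyFilter is only served as v1alpha3
kind: EnvoyFilter
metadata:
  name: drain-duration
spec:
  configPatches:
  - applyTo: CLUSTER
    patch:
      operation: MERGE
      value:
        # Envoy v3 name of the former drain_connections_on_host_removal field:
        # drop hosts from the pool as soon as service discovery removes them
        ignore_health_on_host_removal: true
```
Best practices: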
- Set terminationGracePeriodSeconds: 30 (or higher)
- Configure terminationDrainDuration so in-flight connections can complete
- Use preStop hooks to delay SIGTERM until the pod has been removed from load balancer targets
- Implement health check (readiness) endpoints
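A quick way to sanity-check these settings is to add up each container's shutdown timeline. The sketch below uses hypothetical timings in seconds. It reflects the rule that terminationGracePeriodSeconds covers both the preStop hook and the time after SIGTERM, after which Kubernetes sends SIGKILL.

```python
from typing import Dict, Tuple


def overruns(grace_period_s: float,
             containers: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return {container: seconds over budget} for containers whose preStop time
    plus post-SIGTERM shutdown time exceeds terminationGracePeriodSeconds.

    Kubernetes starts a single grace-period clock for the pod, runs each
    container's preStop hook, then sends that container SIGTERM; whatever is
    still running when the clock expires is killed with SIGKILL.
    """
    return {
        name: pre_stop + shutdown - grace_period_s
        for name, (pre_stop, shutdown) in containers.items()
        if pre_stop + shutdown > grace_period_s
    }


if __name__ == "__main__":
    # Hypothetical timings: (preStop seconds, shutdown/drain seconds after SIGTERM)
    timings = {"app": (10, 25), "istio-proxy": (0, 5)}
    print(overruns(30, timings))
```

With a 30 s grace period, a 10 s preStop sleep plus 25 s of application shutdown overruns by 5 s. The sidecar's 5 s drain fits comfortably.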
Multi-Cluster Mesh
Connect services across multiple Kubernetes clusters:
```
┌──────────────┐          ┌──────────────┐
│  Cluster A   │          │  Cluster B   │
│              │          │              │
│ ┌──────────┐ │          │ ┌──────────┐ │
│ │Service A │ │◄────────►│ │Service B │ │
│ └──────────┘ │          │ └──────────┘ │
│              │          │              │
│   istiod-A   │          │   istiod-B   │
└──────┬───────┘          └──────┬───────┘
       │                         │
       └─────────────────────────┘
          Shared control plane
             (or federated)
```
Deployment models:
- Primary-remote (single control plane): One istiod in the primary cluster manages workloads in both the primary and the remote clusters
- Multi-primary: Each cluster runs its own istiod, and each control plane watches the API servers of all clusters
Locality-Aware Load Balancing
Route traffic to nearby services first:
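```yaml
# Option 1: weighted distribution between zones
trafficPolicy:
  loadBalancer:
    localityLbSetting:
      enabled: true
      distribute:
      - from: "us-west/us-west-1/*"
        to:
          "us-west/us-west-1/*": 80
          "us-west/us-west-2/*": 20
---
# Option 2: regional failover. Only one of distribute, failover, or
# failoverPriority may be set, and failover takes effect only when
# outlierDetection is configured. from/to name regions, not zones.
trafficPolicy:
  loadBalancer:
    localityLbSetting:
      enabled: true
      failover:
      - from: us-west
        to: us-east
  outlierDetection:
    consecutive5xxErrors: 5
    interval: 30s
    baseEjectionTime: 30s
```
Benefits: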
- Reduced latency (same region/zone)
- Lower data transfer costs
- High availability (automatic failover)
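Here is a simplified sketch of how locality preference and failover choose endpoints. It assumes a hypothetical Endpoint record and a failover map like the one above (region to region). It also assumes that endpoints in all other regions are a last resort; that is an assumption of this sketch, not documented Istio behavior. Envoy itself shifts traffic gradually between priority levels rather than all at once.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Endpoint:
    address: str
    region: str
    zone: str
    healthy: bool = True


def tier(client_region: str, client_zone: str, ep: Endpoint,
         failover: Dict[str, str]) -> int:
    """Lower is preferred: 0 = same zone, 1 = same region,
    2 = the configured failover region, 3 = anywhere else (assumed)."""
    if ep.region == client_region:
        return 0 if ep.zone == client_zone else 1
    if failover.get(client_region) == ep.region:
        return 2
    return 3


def eligible_endpoints(client_region: str, client_zone: str,
                       endpoints: List[Endpoint], failover: Dict[str, str]) -> List[str]:
    """Healthy endpoints in the most-preferred tier that has any. Envoy actually
    spills traffic gradually as a tier loses healthy hosts (overprovisioning
    factor 1.4); this sketch only moves on when a tier is completely empty."""
    tiers: Dict[int, List[str]] = {}
    for ep in endpoints:
        if ep.healthy:
            tiers.setdefault(tier(client_region, client_zone, ep, failover), []).append(ep.address)
    return tiers[min(tiers)] if tiers else []


if __name__ == "__main__":
    endpoints = [
        Endpoint("10.1.0.5:8080", "us-west", "us-west-1"),
        Endpoint("10.2.0.7:8080", "us-west", "us-west-2"),
        Endpoint("10.9.0.3:8080", "us-east", "us-east-1"),
    ]
    failover = {"us-west": "us-east"}
    print(eligible_endpoints("us-west", "us-west-1", endpoints, failover))
    endpoints[0].healthy = endpoints[1].healthy = False  # whole us-west region down
    print(eligible_endpoints("us-west", "us-west-1", endpoints, failover))
```

Health here is whatever outlier detection reports. This is why failover has no effect until outlierDetection is configured.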
WebAssembly (WASM) Extensions
Extend Envoy proxy with custom logic:
```yaml
apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: custom-auth
spec:
  selector:
    matchLabels:
      app: api-service
  url: oci://my-registry/custom-auth-plugin:v1.0
  phase: AUTHN
```
Use cases:
- Custom authentication/authorization
- Request/response transformation
- Rate limiting
- Custom telemetry
Real-World Examples
This section walks through practical examples using actual Kubernetes resources, demonstrating how traffic flows through an Istio service mesh.
Complete Traffic Flow
Understanding the full request path through Istio:
```
                External Client
                       ↓
          NLB (Network Load Balancer)
                       ↓
┌────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                             │
│                                                                │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Istio Ingress Gateway (app-ingress-gateway)                │ │
│ │ (Envoy proxy deployment)                                   │ │
│ │ • Listens on configured ports (e.g., 8443)                 │ │
│ │ • Receives configuration from istiod                       │ │
│ └────────────────────┬───────────────────────────────────────┘ │
│                      ↓                                         │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Gateway Resource                                           │ │
│ │ • Defines ports, protocols, TLS settings                   │ │
│ │ • Selects ingress gateway pods via label selector          │ │
│ └────────────────────┬───────────────────────────────────────┘ │
│                      ↓                                         │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ VirtualService                                             │ │
│ │ • Matches incoming requests (host, path, headers)          │ │
│ │ • Defines routing rules and destinations                   │ │
│ │ • Configures retries, timeouts                             │ │
│ └────────────────────┬───────────────────────────────────────┘ │
│                      ↓                                         │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ DestinationRule (Optional)                                 │ │
│ │ • Defines subsets (versions)                               │ │
│ │ • Load balancing policies                                  │ │
│ │ • Connection pool settings                                 │ │
│ └────────────────────┬───────────────────────────────────────┘ │
│                      ↓                                         │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Kubernetes Service                                         │ │
│ │ • ClusterIP with stable DNS name                           │ │
│ │ • Selects pods via label selectors                         │ │
│ │ • Maps service port to container targetPort                │ │
│ └────────────────────┬───────────────────────────────────────┘ │
│                      ↓                                         │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Endpoints                                                  │ │
│ │ • Dynamic list of pod IPs and ports                        │ │
│ │ • Automatically updated as pods scale/fail                 │ │
│ └────────────────────┬───────────────────────────────────────┘ │
│                      ↓                                         │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Application Pod                                            │ │
│ │   ┌──────────────┐     ┌──────────────┐                    │ │
│ │   │ istio-proxy  │     │ Application  │                    │ │
│ │   │   (Envoy)    │────►│  Container   │                    │ │
│ │   └──────────────┘     └──────────────┘                    │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
```
Key points:
- Istio Ingress Gateway is a pod running Envoy proxy (not a hardcoded config)
- Gateway, VirtualService, DestinationRule are configuration objects that tell the proxies how to route
- istiod (control plane) pushes all configurations to Envoy proxies at runtime
- Configurations are dynamic and can be updated without restarting pods
Example 1: Istio Control Plane Components
Viewing the Istio system components:
```shell
$ kubectl -n istio-system get pods -o wide
NAME                                   READY   STATUS    IP              NODE
app-ingress-gateway-68945bdbd7-5dxxr   1/1     Running   240.48.71.138   ip-240-48-71-119
app-ingress-gateway-68945bdbd7-jkbzv   1/1     Running   240.48.68.10    ip-240-48-69-0
app-ingress-gateway-68945bdbd7-lj5fl   1/1     Running   240.48.67.64    ip-240-48-67-148
istiod-fd589774b-2cl2l                 1/1     Running   240.48.71.251   ip-240-48-71-119
istiod-fd589774b-dd2xw                 1/1     Running   240.48.69.34    ip-240-48-69-0
istiod-fd589774b-k5dgc                 1/1     Running   240.48.67.115   ip-240-48-67-148
```
Observations:
- app-ingress-gateway: Multiple replicas (3) for high availability
  - Each pod is an Envoy proxy acting as the entry point
  - Distributed across different nodes for fault tolerance
  - The pod IPs are registered as targets with the external load balancer
- istiod: Control plane component (3 replicas for HA)
  - Manages configuration for all proxies
  - Provides service discovery and certificate management
Example 2: Gateway Configuration
Defining an ingress gateway with mutual TLS:
```shell
$ kubectl -n istio-system get gateway myapp-mgt-qa1-usw2 -o yaml
```
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: myapp-mgt-qa1-usw2
  namespace: istio-system
spec:
  selector:
    istio: app-ingress-gateway # Selects ingress gateway pods with this label
  servers:
  - hosts:
    - myapp-qa1-usw2.mgt-nonprod.myorg.com # Hostname to handle
    port:
      name: https-mutual
      number: 8443
      protocol: HTTPS
    tls:
      mode: MUTUAL # Requires client certificate authentication
      minProtocolVersion: TLSV1_2
      maxProtocolVersion: TLSV1_3
      serverCertificate: /etc/istio/tls/tls.crt
      privateKey: /etc/istio/tls/tls.key
      caCertificates: /etc/myorg/ca/myorg_corp_auth_ca1.pem
```
Key observations:
- selector: Uses label selector to identify which ingress gateway pods handle this config
- hosts: Defines the hostname this gateway will accept traffic for
- port: Listens on port 8443 for HTTPS traffic
- tls.mode = MUTUAL: Requires both server and client certificates (strong authentication)
- Certificates: Mounted from Kubernetes secrets/config maps into the gateway pods
- Wildcard support: If hosts is ["*"], the gateway accepts any hostname
Example 3: VirtualService Routing
VirtualService defines routing rules to forward requests to backend services:
```shell
$ kubectl -n myapp get vs myapp-test-vs -o yaml
```
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-test-vs
  namespace: myapp
spec:
  gateways:
  - istio-system/myapp-mgt-qa1-usw2 # References the Gateway (cross-namespace)
  hosts:
  - myapp-test.myapp.svc.cluster.local # Internal DNS name
  - myapp-qa1-usw2.mgt-nonprod.myorg.com # External hostname (matches Gateway)
  http:
  - retries:
      attempts: 1
      retryOn: connect-failure,refused-stream
    route:
    - destination:
        host: myapp-test # Routes to Service named "myapp-test"
```
Key observations:
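- gateways: Links to the Gateway resource (it can live in a different namespace and is referenced as namespace/name)
- hosts: Lists both an internal and an external hostname:
  - myapp-test.myapp.svc.cluster.local: Internal service name
  - myapp-qa1-usw2.mgt-nonprod.myorg.com: External traffic (must match a Gateway host)
  - Because gateways names only the ingress Gateway and not the reserved value mesh, these rules apply only to traffic entering through that Gateway; sidecar-to-sidecar traffic inside the mesh is not affected unless mesh is added to gateways
- retries: Automatically retry failed requests (resilience)
- destination.host: Routes to the Kubernetes Service name (not to a pod directly)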
How they work together:
- An external request arrives at the Gateway with hostname myapp-qa1-usw2...
- The Gateway accepts it (the hostname matches its configuration)
- The VirtualService matches the hostname and applies its routing rules
- Traffic is routed to the myapp-test Service
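The host handshake between the Gateway and the VirtualService can be sketched as two matching steps. This is a hedged model, not Istio's implementation. host_matches covers only exact names, "*" and "*.domain" wildcards. route_request is a hypothetical helper that returns the first VirtualService destination whose hosts match.

```python
from typing import List, Optional, Tuple


def host_matches(pattern: str, host: str) -> bool:
    """Simplified host match: "*" matches anything, "*.example.com" matches any
    subdomain of example.com (but not example.com itself), otherwise exact."""
    pattern, host = pattern.lower(), host.lower()
    if pattern == "*":
        return True
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])
    return host == pattern


def route_request(host: str, gateway_hosts: List[str],
                  virtual_services: List[Tuple[List[str], str]]) -> Optional[str]:
    """Return the destination Service for `host`, or None if it is not routed.
    virtual_services holds (hosts, destination) pairs bound to this Gateway."""
    if not any(host_matches(p, host) for p in gateway_hosts):
        return None  # the Gateway does not accept this hostname
    for vs_hosts, destination in virtual_services:
        if any(host_matches(p, host) for p in vs_hosts):
            return destination
    return None  # accepted by the Gateway, but no VirtualService routes it


GATEWAY_HOSTS = ["myapp-qa1-usw2.mgt-nonprod.myorg.com"]
VIRTUAL_SERVICES = [(["myapp-test.myapp.svc.cluster.local",
                      "myapp-qa1-usw2.mgt-nonprod.myorg.com"], "myapp-test")]
```

The two None cases map to different symptoms in practice. A host the Gateway rejects never gets a matching listener route, while a host the Gateway accepts but no VirtualService claims typically comes back as a 404 from the gateway.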
Example 4: Kubernetes Service and Endpoints
The Service provides stable networking and service discovery:
```shell
$ kubectl -n myapp get svc myapp-test
NAME         TYPE        CLUSTER-IP       PORT(S)
myapp-test   ClusterIP   10.100.229.132   8443/TCP
$ kubectl -n myapp get svc myapp-test -o yaml
```
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-test
  namespace: myapp
spec:
  type: ClusterIP
  clusterIP: 10.100.229.132
  ports:
  - name: http
    port: 8443        # Port exposed by the Service (client-facing)
    protocol: TCP
    targetPort: 8080  # Port on the Pod container
  selector:
    app.kubernetes.io/instance: myapp-myapp # Selects pods with these labels
    app.kubernetes.io/name: myapp-test
```
Key observations:
- ClusterIP: Virtual IP accessible only within the cluster
- port vs targetPort:
  - port: 8443: Clients connect to the Service on this port
  - targetPort: 8080: The Service forwards to pod containers on this port
  - This decouples the port clients use from the port the application listens on
- ports[].name: http: Istio uses the port name prefix to treat this port as HTTP
- selector: Labels that identify which pods receive traffic
Endpoints (automatically managed by Kubernetes):
```shell
$ kubectl -n myapp get endpoints myapp-test
NAME         ENDPOINTS
myapp-test   240.48.67.221:8080,240.48.69.154:8080
```
Key observations:
- Endpoints list shows actual pod IPs and ports
- Dynamically updated as pods scale, restart, or fail
- Istio’s Pilot watches these endpoints and configures Envoy proxies
Example 5: Application Pods
The actual workload running the application:
$ kubectl -n myapp get pods -o wide
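NAME                          READY   STATUS    IP              NODE
myapp-test-78ddbd8c64-9bkzb   3/3     Running   240.48.69.154   ip-240-48-69-0
myapp-test-78ddbd8c64-tsnfw   3/3     Running   240.48.67.221   ip-240-48-67-148
Notice: READY 3/3 means three containers in each pod are running and ready. The Istio-related pieces are:
- Application container: your app code
- istio-proxy: the Envoy sidecar injected by Istio
- istio-init: init container that sets up the iptables rules; it runs to completion before the other containers start and is not counted in READY
So besides the application container and istio-proxy, each pod runs one more container that is not covered in this example (list them with .spec.containers).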
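Injection happens when a pod is created, so the settings that produce these containers belong in the Deployment's pod template rather than on a running pod. The sketch below assumes a Deployment named myapp-test whose labels match the Service selector from Example 4; the container name, image, and replica count are hypothetical. Recent Istio documentation recommends setting sidecar.istio.io/inject as a pod label; the annotation form seen in the pod output below is the older, deprecated form.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-test
  namespace: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/instance: myapp-myapp
      app.kubernetes.io/name: myapp-test
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: myapp-myapp  # must match the Service selector
        app.kubernetes.io/name: myapp-test
        sidecar.istio.io/inject: "true"          # opt this pod in to sidecar injection
      annotations:
        # ProxyConfig overrides applied to this pod's istio-proxy
        proxy.istio.io/config: |
          holdApplicationUntilProxyStarts: true
    spec:
      containers:
        - name: app                              # hypothetical container name
          image: registry.example.com/myapp:1.0  # hypothetical image
          ports:
            - containerPort: 8080                # matches the Service targetPort
```

The injector adds istio-init and istio-proxy at admission time; they never appear in the Deployment manifest itself.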
Inspecting a pod:
$ kubectl -n myapp get pod myapp-test-78ddbd8c64-9bkzb -o yaml
Key Istio-related fields in the pod metadata (annotations and labels):
annotations:
  # Istio sidecar injection
  sidecar.istio.io/inject: "true"   # Enables automatic sidecar injection
  istio.io/rev: default             # Istio control plane revision
  # Proxy configuration
  proxy.istio.io/config: |
    holdApplicationUntilProxyStarts: true  # App waits for proxy to be ready
  # Security and identity
  security.istio.io/tlsMode: istio  # Uses Istio mTLS
  service.istio.io/canonical-name: myapp-test
  service.istio.io/canonical-revision: latest
  # Prometheus metrics
  prometheus.io/scrape: "true"
  prometheus.io/port: "15020"       # Merged metrics endpoint served by the Istio agent (Envoy + app metrics)
  prometheus.io/path: /stats/prometheus
labels:
  app.kubernetes.io/instance: myapp-myapp  # Matches Service selector
  app.kubernetes.io/name: myapp-test       # Matches Service selector
  security.istio.io/tlsMode: istio
How pod IPs match endpoints:
# Pod IPs
240.48.69.154
240.48.67.221
# Endpoints
240.48.67.221:8080,240.48.69.154:8080
# The Service routes to these endpoints
# Envoy proxies receive these endpoint IPs from istiod (Pilot)
# Load balancing happens across these pod IPs
Example 6: Complete Request Flow
Scenario: An external client makes an HTTPS request to https://myapp-qa1-usw2.mgt-nonprod.myorg.com/api/data
Step-by-step flow:
External Load Balancer (NLB):
- Client DNS resolves the hostname to the NLB's IP
- The NLB forwards the connection to one of the ingress gateway pod IPs (e.g., 240.48.71.138:8443)
Istio Ingress Gateway Pod:
- The gateway's Envoy proxy receives the request
- It checks the Gateway resource: the hostname matches myapp-qa1-usw2...
- It performs mTLS termination using the configured certificates
- It validates the client certificate (mutual TLS)
VirtualService Matching:
- Envoy evaluates the routes generated from the VirtualService resources
- It finds a match for hostname myapp-qa1-usw2... → routes to myapp-test
- It applies the retry policy: retry on connect-failure,refused-stream
Service Resolution:
- The destination myapp-test is the Service with ClusterIP 10.100.229.132, but the gateway's Envoy does not send traffic to the ClusterIP
- Instead, it uses the endpoint list that istiod (Pilot) has pushed to it: [240.48.67.221:8080, 240.48.69.154:8080]
Load Balancing:
- No DestinationRule is defined, so Istio's default load-balancing policy applies (LEAST_REQUEST in recent Istio releases; older releases defaulted to ROUND_ROBIN)
- Envoy selects one pod IP (e.g., 240.48.69.154:8080)
- The gateway's Envoy opens an mTLS connection to that pod using its own workload certificate
Pod Sidecar (istio-proxy):
- The connection to 240.48.69.154:8080 is redirected by iptables to the sidecar's inbound listener on port 15006
- The sidecar terminates mTLS, verifies the caller's identity, and applies inbound policy
- It forwards the plaintext request to the application container on port 8080
Application Container:
- Receives the request on port 8080
- Processes the request and returns a response
Response Path (reverse of request path):
- App → Sidecar → Ingress Gateway → NLB → Client
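The Load Balancing step relies on Istio's default policy because no DestinationRule exists for myapp-test. To make the algorithm explicit instead of relying on the default, a DestinationRule can pin it. A minimal sketch, assuming the myapp-test Service from Example 4; the resource name myapp-test-lb is our own choice:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-test-lb
  namespace: myapp
spec:
  host: myapp-test.myapp.svc.cluster.local  # the Service, not individual pods
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN  # other options include LEAST_REQUEST and RANDOM
```

Once applied, istiod pushes the updated cluster configuration to the gateway and sidecars; no pod restarts are needed.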
Key insights:
- Every hop involves an Envoy proxy (except NLB)
- Configuration is dynamic (no restarts needed for changes)
- mTLS is automatic and transparent to the application
- Observability data collected at each proxy
Summary and Best Practices
When to Use Istio
Good fit:
- Large-scale microservices (50+ services)
- Need for advanced traffic management (canary, A/B testing)
- Security requirements (zero-trust, mTLS)
- Polyglot environments (multiple languages/frameworks)
- Complex observability needs
Not a good fit:
- Simple applications (few services)
- Performance-critical workloads with tight latency budgets (every proxy hop adds latency)
- Small teams without operational expertise
- Limited infrastructure resources
Best Practices
Start incrementally:
- Begin with sidecar injection for observability
- Gradually add traffic management features
- Consider ambient mode for resource efficiency
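Opting workloads in is usually done per namespace. A minimal sketch for the myapp namespace used in the examples above; the two labels are alternatives, so set only one at a time. With revision-based installs, an istio.io/rev: <revision> label takes the place of istio-injection.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  labels:
    # Sidecar mode: pods created after this label is set get istio-proxy injected.
    # Pods that already exist must be restarted to pick up the sidecar.
    istio-injection: enabled
    # Ambient mode alternative (no sidecars); use instead of the label above:
    # istio.io/dataplane-mode: ambient
```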
Security:
- Enable STRICT mTLS in production
- Use AuthorizationPolicies for fine-grained access control
- Regularly rotate certificates (workload certificates are rotated automatically by istiod, which absorbed the former Citadel component)
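A sketch of the first two security practices. It assumes Istio's root namespace is the default istio-system, the trust domain is cluster.local, and the ingress gateway runs under the default istio-ingressgateway-service-account service account; adjust all three to your installation. The method and path restrictions are illustrative only.

```yaml
# Mesh-wide STRICT mTLS: a PeerAuthentication named "default" in the root namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Only the ingress gateway's identity may call myapp-test, and only GET /api/*
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: myapp-test-allow-ingress
  namespace: myapp
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp-test
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account
      to:
        - operation:
            methods: ["GET"]
            paths: ["/api/*"]
```

Because an ALLOW policy now selects myapp-test, any request that matches none of its rules is denied, including calls from other services inside the mesh.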
Traffic management:
- Always define retries and timeouts
- Use circuit breakers to prevent cascading failures
- Test canary deployments with small traffic percentages first
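A sketch combining timeouts, retries, and a small canary split for myapp-test. It assumes pods carry a hypothetical version label (v1 for stable, v2 for canary). With no gateways field it applies to in-mesh callers only, and it is meant to replace the routing in the existing myapp-test VirtualService rather than sit beside it, since Istio merges VirtualServices for the same host only in limited cases. All values are placeholders.

```yaml
# Subsets map names to pod labels; the VirtualService routes by subset name
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-test-subsets
  namespace: myapp
spec:
  host: myapp-test.myapp.svc.cluster.local
  subsets:
    - name: stable
      labels:
        version: v1   # hypothetical pod label
    - name: canary
      labels:
        version: v2   # hypothetical pod label
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-test-canary
  namespace: myapp
spec:
  hosts:
    - myapp-test.myapp.svc.cluster.local
  http:
    - timeout: 10s            # upper bound for the whole request, retries included
      retries:
        attempts: 3           # retries after the initial try
        perTryTimeout: 2s     # 1 try + 3 retries x 2s = 8s, within the 10s timeout
        retryOn: connect-failure,refused-stream
      route:
        - destination:
            host: myapp-test.myapp.svc.cluster.local
            subset: stable
          weight: 95
        - destination:
            host: myapp-test.myapp.svc.cluster.local
            subset: canary
          weight: 5           # start small, then shift traffic gradually
```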
Performance:
- Use Sidecar resources to limit proxy configuration size
- Monitor resource usage (Envoy memory/CPU)
- Consider ambient mode for large-scale deployments
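A minimal Sidecar sketch for the myapp namespace, assuming its workloads only call services in their own namespace and in istio-system. Any other namespace they depend on must be added to the list; otherwise the proxies will not receive routing configuration for those services.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: myapp
spec:
  # No workloadSelector: applies to every workload in the myapp namespace
  egress:
    - hosts:
        - "./*"             # services in this namespace (myapp)
        - "istio-system/*"  # Istio control plane and gateways
```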
Observability:
- Integrate with Prometheus and Grafana for metrics
- Set up distributed tracing (Jaeger/Zipkin)
- Configure appropriate access log formats
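Envoy access logging is disabled in Istio's default installation profile. A minimal sketch that enables JSON-formatted access logs at install time through meshConfig, assuming an IstioOperator-based install; with Helm, the same meshConfig keys go under the istiod chart values. The resource name is hypothetical.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: access-logging   # hypothetical name
  namespace: istio-system
spec:
  meshConfig:
    accessLogFile: /dev/stdout   # write Envoy access logs to the istio-proxy container's stdout
    accessLogEncoding: JSON      # structured logs are easier to parse than TEXT
```

This would be applied with istioctl install -f <file>, after which kubectl logs <pod> -c istio-proxy shows one JSON entry per request.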
Operations:
- Version control all Istio configurations
- Test configuration changes in non-production first
- Implement graceful termination (terminationDrainDuration in the proxy configuration, terminationGracePeriodSeconds in the pod spec)
- Use revision-based upgrades for control plane
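A sketch of a strategic-merge patch for graceful termination, assuming the myapp-test Deployment and a hypothetical file name. In Istio's ProxyConfig, terminationDrainDuration controls how long the sidecar keeps draining connections on shutdown; the pod's terminationGracePeriodSeconds must be longer, or Kubernetes kills the containers before draining finishes. The values are placeholders.

```yaml
# graceful-shutdown-patch.yaml (hypothetical file name)
spec:
  template:
    metadata:
      annotations:
        # This replaces the whole annotation value, so keep any existing ProxyConfig keys
        proxy.istio.io/config: |
          holdApplicationUntilProxyStarts: true
          terminationDrainDuration: 20s
    spec:
      terminationGracePeriodSeconds: 30   # must exceed terminationDrainDuration
```

Applying it with kubectl -n myapp patch deployment myapp-test --patch-file graceful-shutdown-patch.yaml changes the pod template, which triggers a rolling restart.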
Common Troubleshooting
Service not reachable:
- Check sidecar injection: kubectl get pod <name> -o jsonpath='{.spec.containers[*].name}'
- Verify VirtualService hosts match Gateway hosts
- Ensure Service selector matches pod labels
mTLS errors:
- Check PeerAuthentication mode (STRICT vs PERMISSIVE)
- Verify certificate expiration
- Ensure both sides have Istio proxies
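When some callers of a namespace do not have a proxy yet, a namespace-scoped PeerAuthentication in PERMISSIVE mode lets its workloads accept both mTLS and plaintext during the migration; it overrides a mesh-wide policy for that namespace only. A minimal sketch for the myapp namespace:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: myapp
spec:
  mtls:
    mode: PERMISSIVE   # accept mTLS and plaintext; return to STRICT once all clients have proxies
```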
High latency:
- Check for unnecessary retries
- Review timeout configurations
- Monitor Envoy resource usage
- Consider connection pool tuning
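Connection pool limits and outlier detection (Istio's circuit-breaking mechanism) both live in a DestinationRule's trafficPolicy. A minimal sketch for myapp-test; every number is a placeholder to derive from load testing. Istio expects a single DestinationRule per host, so in practice these settings would be merged into any existing DestinationRule for myapp-test rather than applied as a second one.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-test-resilience
  namespace: myapp
spec:
  host: myapp-test.myapp.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100          # cap on TCP connections to the service
      http:
        http1MaxPendingRequests: 50  # requests queued beyond this fail fast with 503
        http2MaxRequests: 200        # cap on concurrent active requests
    outlierDetection:                # temporarily eject pods that keep failing
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
```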
Configuration not applying:
- Validate with istioctl analyze
- Check istiod logs for errors
- Verify the proxy can reach istiod (network policies)
Additional Resources
- Official Documentation: https://istio.io/latest/docs/
- Istio GitHub: https://github.com/istio/istio