Istio

What is Istio?

Istio is an open-source service mesh platform that provides a uniform way to secure, connect, and observe microservices.

Why Use Istio?

In a microservices architecture, applications are decomposed into many small services that communicate over the network. This introduces challenges:

  • Service-to-service communication: Managing secure, reliable communication between hundreds of services
  • Observability: Understanding traffic flow, latency, and failures across services
  • Security: Enforcing authentication, authorization, and encryption between services
  • Traffic management: Implementing advanced routing, load balancing, and resilience patterns
  • Policy enforcement: Applying consistent policies across all services

Istio solves these challenges by providing a transparent infrastructure layer that sits between your services and the network, handling cross-cutting concerns without requiring changes to application code.

Key Benefits

  • Traffic control: Fine-grained control over traffic routing and behavior
  • Security: Automatic mTLS, authentication, and authorization
  • Observability: Metrics, logs, and distributed tracing out-of-the-box
  • Resilience: Circuit breakers, retries, timeouts, and fault injection
  • Policy enforcement: Centralized policy management and rate limiting

Service Mesh Fundamentals

What is a Service Mesh?

A service mesh is a dedicated infrastructure layer for managing service-to-service communication.

┌───────────────────────────────────────────────────────────────┐  
│                        Service Mesh                           │  
│                                                               │  
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐     │  
│  │Service A│    │Service B│    │Service C│    │Service D│     │  
│  │  ┌───┐  │    │  ┌───┐  │    │  ┌───┐  │    │  ┌───┐  │     │  
│  │  │App│  │    │  │App│  │    │  │App│  │    │  │App│  │     │  
│  │  └───┘  │    │  └───┘  │    │  └───┘  │    │  └───┘  │     │  
│  │  ┌───┐  │    │  ┌───┐  │    │  ┌───┐  │    │  ┌───┐  │     │  
│  │  │ P │<─┼────┼─>│ P │<─┼────┼─>│ P │<─┼────┼─>│ P │  │     │  
│  │  └───┘  │    │  └───┘  │    │  └───┘  │    │  └───┘  │     │  
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘     │  
│       ↑              ↑              ↑              ↑          │  
│       └──────────────┴──────────────┴──────────────┘          │  
│                  Proxy Network (Data Plane)                   │  
│                           ↕                                   │  
│                  ┌──────────────────┐                         │  
│                  │  Control Plane   │                         │  
│                  │   (istiod)       │                         │  
│                  └──────────────────┘                         │  
└───────────────────────────────────────────────────────────────┘  
  
P = Proxy (Envoy)

The service mesh consists of:

  • Data Plane: Network of proxies that handle all inter-service communication
  • Control Plane: Manages and configures the proxies

Istio Architecture

Istio’s architecture is divided into two main components: the control plane and the data plane.

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐  
│                          Istio Mesh                             │  
│                                                                 │  
│  ┌───────────────────────────────────────────────────────────┐  │  
│  │                    Control Plane                          │  │  
│  │                                                           │  │  
│  │              ┌───────────────────────┐                    │  │  
│  │              │      istiod           │                    │  │  
│  │              │  ┌─────────────────┐  │                    │  │  
│  │              │  │ Pilot           │  │  Configuration     │  │  
│  │              │  │ - Service Disc. │  │  & Service         │  │  
│  │              │  │ - Traffic Mgmt  │  │  Discovery         │  │  
│  │              │  └─────────────────┘  │                    │  │  
│  │              │  ┌─────────────────┐  │                    │  │  
│  │              │  │ Citadel (CA)    │  │  Certificate       │  │  
│  │              │  │ - mTLS/PKI      │  │  Management        │  │  
│  │              │  └─────────────────┘  │                    │  │  
│  │              │  ┌─────────────────┐  │                    │  │  
│  │              │  │ Galley          │  │  Config            │  │  
│  │              │  │ - Validation    │  │  Validation        │  │  
│  │              │  └─────────────────┘  │                    │  │  
│  │              └───────────────────────┘                    │  │  
│  │                         ↓                                 │  │  
│  └─────────────────────────┼─────────────────────────────────┘  │  
│                            ↓                                    │  
│  ┌───────────────────────────────────────────────────────────┐  │  
│  │                    Data Plane                             │  │  
│  │                                                           │  │  
│  │  ┌──────────┐      ┌──────────┐      ┌──────────┐         │  │  
│  │  │  Pod A   │      │  Pod B   │      │  Pod C   │         │  │  
│  │  │ ┌──────┐ │      │ ┌──────┐ │      │ ┌──────┐ │         │  │  
│  │  │ │ App  │ │      │ │ App  │ │      │ │ App  │ │         │  │  
│  │  │ └──────┘ │      │ └──────┘ │      │ └──────┘ │         │  │  
│  │  │ ┌──────┐ │      │ ┌──────┐ │      │ ┌──────┐ │         │  │  
│  │  │ │Envoy │<┼──────┼>│Envoy │<┼──────┼>│Envoy │ │         │  │  
│  │  │ │Proxy │ │      │ │Proxy │ │      │ │Proxy │ │         │  │  
│  │  │ └──────┘ │      │ └──────┘ │      │ └──────┘ │         │  │  
│  │  └──────────┘      └──────────┘      └──────────┘         │  │  
│  └───────────────────────────────────────────────────────────┘  │  
└─────────────────────────────────────────────────────────────────┘

Control Plane (istiod)

In modern Istio (1.5+), the control plane is consolidated into a single binary called istiod, which includes:

Pilot

  • Service Discovery: Maintains a registry of all services and their endpoints
  • Traffic Management: Converts high-level routing rules into Envoy configurations
  • Configuration Distribution: Pushes configurations to all Envoy proxies
  • Supports: A/B testing, canary deployments, traffic splitting, circuit breakers, retries, timeouts

Citadel (Certificate Authority)

  • Certificate Management: Issues and rotates X.509 certificates for workloads
  • Identity: Provides strong identity to each service
  • mTLS: Enables automatic mutual TLS encryption between services
  • SPIFFE: Implements SPIFFE standard for service identity

Galley

  • Configuration Validation: Validates user-authored Istio configuration
  • Configuration Ingestion: Processes and distributes configuration to istiod
  • Abstraction: Isolates istiod from underlying platform (Kubernetes, VMs)

Mixer (Deprecated)

  • Note: Mixer was deprecated in Istio 1.5 and removed in Istio 1.8
  • Previously handled:
    • Access control and policy checks
    • Telemetry data collection
  • These functions are now handled directly by the Envoy proxies (through in-proxy extensions, including WASM)

Data Plane

The data plane consists of Envoy proxies deployed alongside each service:

  • Envoy Proxy: High-performance C++ proxy originally built by Lyft

  • Sidecar Pattern: In sidecar mode, each pod gets an Envoy container

  • Traffic Interception: All inbound/outbound traffic goes through the proxy

  • Capabilities:

    • Dynamic service discovery
    • Load balancing
    • TLS termination
    • HTTP/2 and gRPC proxying
    • Circuit breakers
    • Health checks
    • Staged rollouts with percentage-based traffic splits
    • Fault injection
    • Rich metrics

Deployment Modes

Istio supports multiple deployment modes to fit different use cases and requirements.

1. Sidecar Mode (Traditional)

In sidecar mode, Istio deploys an Envoy proxy container alongside each application pod:

┌────────────────────────────────────────┐  
│            Kubernetes Pod              │  
│                                        │  
│  ┌──────────────┐    ┌──────────────┐  │  
│  │              │    │              │  │  
│  │ Application  │◄───┤ Envoy Proxy  │  │  
│  │  Container   │    │ (istio-proxy)│  │
│  │              │    │              │  │  
│  └──────────────┘    └───────┬──────┘  │  
│                              │         │  
└──────────────────────────────┼─────────┘  
                               │  
                        All traffic flows  
                        through proxy

Characteristics:

  • Each pod gets its own Envoy proxy sidecar container
  • Proxy intercepts all inbound and outbound traffic using iptables rules
  • Full Layer 7 (HTTP/gRPC) capabilities per pod
  • Pros: Complete feature set, mature, well-tested
  • Cons: Higher resource overhead (one proxy per pod)

Use cases:

  • Production environments requiring full L7 features
  • Applications needing advanced traffic management
  • When resource overhead is acceptable

2. Ambient Mode (Sidecarless)

Ambient mode is a newer deployment model that reduces resource overhead by eliminating per-pod sidecars:

┌────────────────────────────────────────────────────────────────┐  
│                         Kubernetes Node                        │  
│                                                                │  
│  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐                        │  
│  │ Pod  │  │ Pod  │  │ Pod  │  │ Pod  │  (No sidecars!)        │  
│  │      │  │      │  │      │  │      │                        │  
│  └───┬──┘  └───┬──┘  └───┬──┘  └───┬──┘                        │  
│      │         │         │         │                           │  
│      └─────────┴─────────┴─────────┘                           │  
│                      │                                         │  
│              ┌───────▼────────┐                                │  
│              │    ztunnel     │  Layer 4 (per-node)            │  
│              │  (L4 Proxy)    │  - mTLS                        │  
│              └───────┬────────┘  - Basic routing               │  
│                      │                                         │  
└──────────────────────┼─────────────────────────────────────────┘  
                       │  
                       ▼  
           ┌───────────────────────┐  
           │  Waypoint Proxy       │  Layer 7 (per-namespace)  
           │  (Optional)           │  - Advanced routing  
           │  - Full L7 features   │  - Traffic policies  
           └───────────────────────┘

Ambient mode has two components:

ztunnel (Zero Trust Tunnel)

  • Runs as a DaemonSet (one per node)

  • Handles Layer 4 traffic (TCP)

  • Provides:

    • Mutual TLS (mTLS) encryption
    • Basic authentication and authorization
    • Telemetry at L4

  • Lightweight and efficient

Waypoint Proxies

  • Optional per-namespace Envoy proxies

  • Provide Layer 7 (HTTP/gRPC) features

  • Deploy only when you need advanced capabilities:

    • Complex routing rules
    • Request-level policies
    • HTTP header manipulation
    • Fault injection
    • Advanced observability

Characteristics:

  • No sidecar containers in application pods
  • Significantly reduced resource consumption
  • Gradual adoption of L7 features (opt-in per namespace)
  • Pros: Lower resource overhead, simpler upgrades
  • Cons: Newer (less mature), limited L7 features without waypoint proxies

Use cases:

  • Large-scale deployments where resource efficiency is critical
  • Environments with many simple services
  • Gradual migration from no mesh to full mesh

Comparison: Sidecar vs Ambient

Aspect              Sidecar Mode               Ambient Mode
Resource overhead   High (proxy per pod)       Low (shared proxies)
L7 features         Always available           Opt-in via waypoint
Maturity            Stable, production-ready   Newer (alpha in Istio 1.18, GA in 1.24)
Upgrade complexity  Rolling pod restarts       Simpler (node-level)
Best for            Feature-rich environments  Large-scale, cost-sensitive

Core Components

Istio uses several Custom Resource Definitions (CRDs) to configure service mesh behavior. Understanding these resources is essential for effective traffic management.

Configuration Resources Overview

External Traffic
         ↓
┌─────────────────┐
│     Gateway     │  ← Defines ports/hosts for mesh entry
└────────┬────────┘
         ↓
┌─────────────────┐
│ VirtualService  │  ← Routing rules (where to send traffic)
└────────┬────────┘
         ↓
┌─────────────────┐
│ DestinationRule │  ← Policies (how to handle traffic)
└────────┬────────┘
         ↓
┌─────────────────┐
│     Service     │  ← Kubernetes Service
└────────┬────────┘
         ↓
┌─────────────────┐
│       Pod       │  ← Application workload
└─────────────────┘

1. VirtualService (Traffic Routing)

VirtualService defines routing rules that specify how requests are routed to services within the mesh.

Key capabilities:

  • Route traffic based on HTTP headers, URI paths, source labels
  • Split traffic across multiple service versions (for canary deployments)
  • Add timeouts, retries, and fault injection
  • Redirect and rewrite URLs

Example use cases:

  • Route 90% of traffic to v1 and 10% to v2 (canary testing)
  • Route requests with the header user-type: premium to a special backend
  • Add automatic retries on connection failures

apiVersion: networking.istio.io/v1beta1
kind: VirtualService  
metadata:  
  name: reviews-route  
spec:  
  hosts:  
  - reviews.default.svc.cluster.local  
  http:  
  - match:  
    - headers:  
        user-type:  
          exact: premium  
    route:  
    - destination:  
        host: reviews  
        subset: v2  
  - route:  
    - destination:  
        host: reviews  
        subset: v1  
      weight: 90  
    - destination:  
        host: reviews  
        subset: v2  
      weight: 10  
    retries:  
      attempts: 3  
      retryOn: "5xx,reset,connect-failure"  
    timeout: 5s

2. DestinationRule (Traffic Policies)

DestinationRule defines policies that apply to traffic after routing has occurred. They govern how the proxy talks to the destination that routing actually selected, such as a specific subset of a service.

Key capabilities:

  • Define service subsets (versions) based on labels
  • Configure load balancing algorithms
  • Set up connection pool settings
  • Enable/configure mutual TLS
  • Configure circuit breakers and outlier detection

Example use cases:

  • Define subsets for different versions (v1, v2, v3)
  • Use least-request load balancing
  • Enable circuit breaker to prevent cascading failures

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-destination
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST  # LEAST_CONN is deprecated in favor of LEAST_REQUEST
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
    outlierDetection:
      consecutive5xxErrors: 5  # replaces the deprecated consecutiveErrors field
      interval: 30s  
      baseEjectionTime: 30s  
  subsets:  
  - name: v1  
    labels:  
      version: v1  
  - name: v2  
    labels:  
      version: v2  
    trafficPolicy:  
      loadBalancer:  
        simple: ROUND_ROBIN

Load Balancing Algorithms

Istio supports multiple load balancing strategies:

Algorithm        Description                                                                  Use Case
ROUND_ROBIN      Distributes requests evenly in rotation                                      Works well for homogeneous backends
LEAST_CONN       Sends to backend with fewest active connections (deprecated; use LEAST_REQUEST)  Backends with varying load capacity
LEAST_REQUEST    Sends to backend with fewest active requests                                 HTTP/2 and gRPC workloads
RANDOM           Randomly selects a backend                                                   Simple, low-overhead distribution
PASSTHROUGH      Forwards without load balancing                                              Direct connection scenarios
CONSISTENT_HASH  Hash-based distribution (sticky sessions); set via consistentHash, not simple  Session affinity requirements

Load Balancer Settings

LoadBalancerSettings options:

  • simple: Standard algorithms (ROUND_ROBIN, LEAST_CONN, etc.)
  • consistentHash: Hash-based routing for session affinity
  • localityLbSetting: Locality-aware load balancing (prefer local endpoints)
  • warmupDurationSecs: Gradually increase traffic to new endpoints instead of sending full load immediately
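
To make the consistentHash option more concrete, here is a minimal sketch of a hash ring in Python. It is not Envoy's implementation (Envoy ships its own ring-hash and Maglev balancers); the class name HashRing, the replicas parameter, and the backend addresses are all hypothetical and exist only for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable 64-bit integer (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with `replicas` virtual nodes per backend."""

    def __init__(self, backends, replicas=100):
        # Each backend is placed on the ring many times to smooth the distribution.
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, key: str) -> str:
        """Return the backend owning the first ring point at or after hash(key)."""
        index = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])  # hypothetical backends
backend = ring.pick("user-42")  # the same key always maps to the same backend
```

The property that matters for session affinity is that removing one backend only remaps the keys that backend owned; keys owned by the remaining backends stay where they were.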

3. Gateway (Mesh Entry/Exit Points)

Gateway configures a load balancer operating at the edge of the mesh for receiving incoming or outgoing HTTP/TCP connections.

Key capabilities:

  • Define external entry points (ingress) or exit points (egress)
  • Configure ports, protocols, and TLS settings
  • Attach to specific gateway deployments using selectors
  • Support for mutual TLS (mTLS) authentication

Example use cases:

  • Expose services to external clients via HTTPS
  • Configure mTLS for client certificate authentication
  • Set up egress gateway for controlled external API access

apiVersion: networking.istio.io/v1beta1
kind: Gateway  
metadata:  
  name: my-gateway  
  namespace: istio-system  
spec:  
  selector:  
    istio: ingressgateway  # Selects the ingress gateway pods  
  servers:  
  - port:  
      number: 443  
      name: https  
      protocol: HTTPS  
    hosts:  
    - "myapp.example.com"  
    tls:  
      mode: SIMPLE  
      credentialName: myapp-tls-cert

4. ServiceEntry (External Services)

ServiceEntry enables adding external services (outside the mesh) into Istio’s internal service registry.

Key capabilities:

  • Add external APIs or databases to the mesh
  • Apply mesh policies to external services
  • Control and monitor traffic to external endpoints

Example use cases:

  • Integrate external payment APIs with mesh policies
  • Apply retries and timeouts to external database connections
  • Monitor traffic to third-party services

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry  
metadata:  
  name: external-payment-api  
spec:  
  hosts:  
  - api.payment-provider.com  
  ports:  
  - number: 443  
    name: https  
    protocol: HTTPS  
  location: MESH_EXTERNAL  
  resolution: DNS

5. Sidecar (Proxy Configuration)

Sidecar resource fine-tunes the configuration of sidecar proxies attached to workloads.

Key capabilities:

  • Limit the set of services a sidecar can reach
  • Optimize resource usage by reducing configuration size
  • Control inbound and outbound traffic behavior

Example use cases:

  • Reduce memory footprint in large meshes
  • Restrict which services a workload can communicate with
  • Improve proxy startup time

apiVersion: networking.istio.io/v1beta1
kind: Sidecar  
metadata:  
  name: default  
  namespace: my-app  
spec:  
  egress:
  - hosts:
    - "./*"             # Services in the same namespace
    - "istio-system/*"  # Plus services in istio-system

Traffic Management

Traffic management is one of Istio’s core features, enabling sophisticated control over service-to-service communication.

Request Routing

Control where traffic goes based on various criteria:

Path-based routing:

http:  
- match:  
  - uri:  
      prefix: /api/v1  
  route:  
  - destination:  
      host: service-v1  
- match:  
  - uri:  
      prefix: /api/v2  
  route:  
  - destination:  
      host: service-v2

Header-based routing:

http:  
- match:  
  - headers:  
      x-api-version:  
        exact: "2.0"  
  route:  
  - destination:  
      host: service-v2

Traffic Splitting (Canary Deployments)

Gradually shift traffic from the old version to the new version:

http:  
- route:  
  - destination:  
      host: reviews  
      subset: v1  
    weight: 80  
  - destination:  
      host: reviews  
      subset: v2  
    weight: 20

Deployment strategy:

  • Deploy v2 alongside v1
  • Route 10% → v2, 90% → v1
  • Monitor metrics and errors
  • Gradually increase v2 traffic: 25%, 50%, 75%, 100%
  • Decommission v1
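
The effect of route weights can be approximated with a small simulation. The sketch below is only an illustration of weighted selection, not how Envoy implements it; the function names pick_subset and simulate, the request count, and the seed are assumptions made for this example.

```python
import random


def pick_subset(weights, rng):
    """Pick a subset name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, requests=10_000, seed=0):
    """Count how many simulated requests each subset receives."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    counts = {name: 0 for name in weights}
    for _ in range(requests):
        counts[pick_subset(weights, rng)] += 1
    return counts


# Same split as the 80/20 example above.
counts = simulate({"v1": 80, "v2": 20})
print(counts)
```

With enough requests, the observed share of v2 traffic settles close to its configured weight, which is why error rates during a canary should be judged over a reasonably large sample.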

Timeouts and Retries

Timeouts prevent requests from hanging indefinitely:

http:  
- route:  
  - destination:  
      host: my-service  
  timeout: 5s

Retries automatically retry failed requests:

http:  
- route:  
  - destination:  
      host: my-service  
  retries:  
    attempts: 3  
    perTryTimeout: 2s  
    retryOn: "5xx,reset,connect-failure,refused-stream"
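
As a rough mental model, attempts counts retries on top of the initial request, so attempts: 3 allows up to four tries in total. The Python sketch below illustrates only that counting; it does not model backoff, perTryTimeout, or the overall timeout, and the names call_with_retries, send, and retriable are hypothetical.

```python
def call_with_retries(send, attempts=3, retriable=lambda status: status >= 500):
    """Call `send()` once, then retry up to `attempts` more times.

    `send` returns an HTTP status code. Returns (final_status, total_calls).
    """
    calls = 0
    status = None
    for _ in range(1 + attempts):
        calls += 1
        status = send()
        if not retriable(status):
            break  # success or a non-retriable error: stop immediately
    return status, calls


# A backend that fails twice and then recovers.
responses = iter([503, 503, 200])
status, calls = call_with_retries(lambda: next(responses), attempts=3)
```

Because every retry multiplies load on a struggling backend, retry counts are usually kept small and combined with the circuit-breaking settings described next.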

Circuit Breaking

Prevent cascading failures by limiting connections to unhealthy services:

trafficPolicy:  
  connectionPool:  
    tcp:  
      maxConnections: 100  
    http:  
      http1MaxPendingRequests: 50  
      maxRequestsPerConnection: 5  
  outlierDetection:  
    consecutive5xxErrors: 5
    interval: 30s  
    baseEjectionTime: 30s  
    maxEjectionPercent: 50

How it works:

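  • Service starts experiencing errors
  • After 5 consecutive 5xx errors, Istio ejects the endpoint from the load-balancing pool for 30s
  • When the ejection period ends, the endpoint returns to the pool and receives traffic again
  • If it keeps failing, it is ejected again, for longer each time; maxEjectionPercent caps how much of the pool can be ejected at once (50% here)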
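
The ejection logic can be sketched for a single endpoint as follows. This is a simplified model under stated assumptions: it ejects on a run of consecutive 5xx responses and lengthens the ejection each time, but it ignores the periodic interval sweep and maxEjectionPercent. The class OutlierDetector and its method names are hypothetical.

```python
class OutlierDetector:
    """Tracks one endpoint and ejects it after a run of consecutive 5xx responses."""

    def __init__(self, consecutive_errors=5, base_ejection_time=30.0):
        self.consecutive_errors = consecutive_errors
        self.base_ejection_time = base_ejection_time
        self._streak = 0           # current run of consecutive 5xx responses
        self._ejections = 0        # how many times the endpoint has been ejected
        self._ejected_until = 0.0  # time (seconds) at which the ejection ends

    def is_ejected(self, now):
        return now < self._ejected_until

    def record(self, status, now):
        """Record one response observed at time `now` (seconds)."""
        if status >= 500:
            self._streak += 1
        else:
            self._streak = 0  # any non-5xx response resets the run
        if self._streak >= self.consecutive_errors:
            self._ejections += 1
            # Assumption: ejection time grows linearly with the ejection count.
            self._ejected_until = now + self.base_ejection_time * self._ejections
            self._streak = 0


detector = OutlierDetector()
for second in range(5):
    detector.record(503, now=second)  # fifth error arrives at t=4
```

After the loop the endpoint is ejected until t=34; a second run of five errors would eject it for 60 seconds under this model.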

Fault Injection (Chaos Engineering)

Test application resilience by injecting faults:

Delay injection (simulate slow networks):

http:  
- fault:  
    delay:  
      percentage:  
        value: 10  
      fixedDelay: 5s  
  route:  
  - destination:  
      host: my-service

Abort injection (simulate service failures):

http:  
- fault:  
    abort:  
      percentage:  
        value: 20  
      httpStatus: 503  
  route:  
  - destination:  
      host: my-service

Traffic Mirroring (Shadowing)

Send a copy of live traffic to a test service without affecting production. Responses from the mirrored service are discarded:

http:  
- route:  
  - destination:  
      host: service-v1  
  mirror:  
    host: service-v2  
  mirrorPercentage:  
    value: 50

Use cases:

  • Test new version with real traffic without risk
  • Compare performance between versions
  • Validate refactored services

Security

Istio provides multiple layers of security for microservices.

Mutual TLS (mTLS)

Automatic mTLS encrypts all service-to-service communication and provides strong identity.

┌──────────┐                                    ┌──────────┐  
│ Service A│                                    │ Service B│  
│  ┌────┐  │                                    │  ┌────┐  │  
│  │App │  │                                    │  │App │  │  
│  └──┬─┘  │                                    │  └─┬──┘  │  
│     │    │                                    │    │     │  
│  ┌──▼──┐ │  1. Establish mTLS connection      │  ┌─▼───┐ │  
│  │Envoy│─┼────────────────────────────────────┼─►│Envoy│ │  
│  │     │ │  2. Verify certificates (both ways)│  │     │ │  
│  │     │◄┼────────────────────────────────────┼──│     │ │  
│  │     │ │  3. Encrypted communication        │  │     │ │  
│  └─────┘ │◄────────────────────────────────►  │  └─────┘ │  
└──────────┘                                    └──────────┘  
     │                                                 │  
     └────────────── Citadel (CA) ─────────────────────┘  
              (Issues & rotates certificates)

Configuration modes:

apiVersion: security.istio.io/v1beta1  
kind: PeerAuthentication  
metadata:  
  name: default  
  namespace: my-namespace  
spec:  
  mtls:  
    mode: STRICT  # Options: STRICT, PERMISSIVE, DISABLE

  • STRICT: Only accept mTLS connections
  • PERMISSIVE: Accept both mTLS and plaintext (for migration)
  • DISABLE: Disable mTLS

Authorization Policies

Control who can access what services:

apiVersion: security.istio.io/v1beta1  
kind: AuthorizationPolicy  
metadata:  
  name: frontend-policy  
  namespace: default  
spec:  
  selector:  
    matchLabels:  
      app: frontend  
  action: ALLOW  
  rules:  
  - from:  
    - source:  
        principals: ["cluster.local/ns/default/sa/api-gateway"]  
    to:  
    - operation:  
        methods: ["GET", "POST"]  
        paths: ["/api/*"]

Common patterns:

  • Allow only specific services to call an API
  • Restrict HTTP methods (e.g., only GET and POST)
  • Deny access to admin endpoints except from specific namespaces
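
The way an ALLOW policy is evaluated can be sketched in a few lines of Python. This is a simplified model that covers only the fields used in the example above (principals, methods, paths) and assumes exact, prefix (`/api/*`), suffix (`*/x`), and presence (`*`) path matching; the function names and the RULES structure are hypothetical.

```python
def path_matches(pattern, path):
    """Exact match, or prefix/suffix match when the pattern ends/starts with '*'."""
    if pattern == "*":
        return True
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return path.endswith(pattern[1:])
    return pattern == path


def rule_allows(rule, principal, method, path):
    """All conditions inside one rule must hold (logical AND)."""
    return (
        principal in rule["principals"]
        and method in rule["methods"]
        and any(path_matches(p, path) for p in rule["paths"])
    )


def is_allowed(rules, principal, method, path):
    """With action ALLOW, a request passes only if at least one rule matches."""
    return any(rule_allows(r, principal, method, path) for r in rules)


# Mirrors the frontend-policy example above.
RULES = [{
    "principals": ["cluster.local/ns/default/sa/api-gateway"],
    "methods": ["GET", "POST"],
    "paths": ["/api/*"],
}]
```

Under this model a GET to /api/users from the api-gateway service account is allowed, while a DELETE to the same path, or any request from another identity, is rejected because no rule matches.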

Request Authentication (JWT)

Validate JWT tokens from external identity providers:

apiVersion: security.istio.io/v1beta1  
kind: RequestAuthentication  
metadata:  
  name: jwt-auth  
spec:  
  selector:  
    matchLabels:  
      app: api-service  
  jwtRules:  
  - issuer: "https://auth.example.com"  
    jwksUri: "https://auth.example.com/.well-known/jwks.json"

Use cases:

  • Validate OAuth2/OIDC tokens
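  • Enforce authentication for external API calls (RequestAuthentication rejects invalid tokens but still admits requests that carry no token, so pair it with an AuthorizationPolicy that requires a request principal)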
  • Extract user identity from JWT claims

Observability

Istio automatically generates telemetry for all traffic in the mesh without requiring application changes.

Three Pillars of Observability

┌───────────────────────────────────────────────────────────┐  
│                    Observability Stack                    │  
│                                                           │  
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │  
│  │   Metrics   │  │    Logs     │  │   Traces    │        │  
│  │             │  │             │  │             │        │  
│  │ Prometheus  │  │    Fluentd  │  │    Jaeger   │        │  
│  │   Grafana   │  │     ELK     │  │    Zipkin   │        │  
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘        │  
│         │                │                │               │  
│         └────────────────┼────────────────┘               │  
│                          │                                │  
│              ┌───────────▼───────────┐                    │  
│              │   Envoy Proxies       │                    │  
│              │  (Generate telemetry) │                    │  
│              └───────────────────────┘                    │  
└───────────────────────────────────────────────────────────┘

1. Metrics

Istio automatically collects:

  • Request rate: Requests per second
  • Request latency: P50, P90, P95, P99 percentiles
  • Error rate: 4xx and 5xx responses
  • Request size: Bytes sent/received

Key metrics:

  • istio_requests_total: Total request count
  • istio_request_duration_milliseconds: Request latency
  • istio_request_bytes: Request size
  • istio_response_bytes: Response size

RED method (the golden signals minus saturation):

  • Rate: Requests per second
  • Errors: Percentage of failed requests
  • Duration: Request latency distribution
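
The three RED values can be derived from raw request samples as in the sketch below. In practice these numbers come from Prometheus queries over Istio's metrics rather than from application code; the function red_summary, the sample data, and the choice to count only 5xx responses as errors are assumptions made for illustration. The function expects a non-empty sample list.

```python
import math


def red_summary(requests, window_seconds):
    """Summarise (status_code, duration_ms) samples observed over a time window."""
    durations = sorted(duration for _, duration in requests)
    errors = sum(1 for status, _ in requests if status >= 500)

    def percentile(p):
        # Nearest-rank percentile: smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * len(durations)))
        return durations[rank - 1]

    return {
        "rate_per_second": len(requests) / window_seconds,
        "error_percent": 100 * errors / len(requests),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
    }


# Five hypothetical requests seen during a 5-second window.
samples = [(200, 10), (200, 20), (200, 30), (503, 40), (200, 50)]
summary = red_summary(samples, window_seconds=5)
```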

2. Distributed Tracing

Track requests as they flow through multiple services:

User Request → API Gateway → Auth Service → Product Service → DB  
                  20ms          15ms            100ms         50ms  
   │──────────────────────────────────────────────────────────│  
                     Total Latency: 185ms

Trace components:

  • Trace: End-to-end request journey
  • Span: Single operation (e.g., one service call)
  • Tags: Metadata (HTTP method, status code, etc.)
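
The arithmetic behind the trace above can be reproduced directly. The sketch assumes the spans run one after another without overlapping, as the diagram suggests; real traces often contain nested or parallel spans, so a plain sum would not apply there. The function name summarise_trace is hypothetical.

```python
def summarise_trace(spans):
    """Return total latency and the slowest span for sequential (name, ms) spans."""
    total_ms = sum(duration for _, duration in spans)
    slowest = max(spans, key=lambda span: span[1])
    return total_ms, slowest


# Span durations taken from the example trace above.
spans = [
    ("API Gateway", 20),
    ("Auth Service", 15),
    ("Product Service", 100),
    ("DB", 50),
]
total_ms, (slowest_name, slowest_ms) = summarise_trace(spans)
```

Here the Product Service span accounts for more than half of the total latency, which makes it the first place to look for a bottleneck.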

Integration: Jaeger, Zipkin

  • Visualize request flow
  • Identify bottlenecks
  • Debug latency issues

3. Access Logs

Envoy generates detailed access logs:

{  
  "start_time": "2024-01-15T10:30:00.000Z",  
  "method": "GET",  
  "path": "/api/products",  
  "response_code": 200,  
  "duration": 45,  
  "upstream_service": "products.default.svc.cluster.local",  
  "user_agent": "Mozilla/5.0...",  
  "request_id": "abc-123-def-456"  
}

Service Discovery and Endpoints

Istio’s Pilot component provides automatic service discovery:

How it works:

  • Kubernetes creates Endpoints for each Service
  • Pilot watches Kubernetes API for changes
  • Pilot pushes updated endpoint information to all Envoy proxies
  • Proxies use this info for load balancing
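
The watch-and-push flow above resembles a publish/subscribe pattern, sketched below in Python. This is a conceptual model only: the real exchange uses Envoy's xDS protocol, and the classes ControlPlane and Proxy and their methods are hypothetical stand-ins.

```python
class Proxy:
    """Stand-in for an Envoy proxy that keeps a local copy of endpoints."""

    def __init__(self, name):
        self.name = name
        self.endpoints = {}

    def on_update(self, service, endpoints):
        self.endpoints[service] = list(endpoints)


class ControlPlane:
    """Stand-in for the discovery role of the control plane."""

    def __init__(self):
        self._proxies = []
        self._registry = {}

    def connect(self, proxy):
        # A newly connected proxy first receives the current state.
        self._proxies.append(proxy)
        for service, endpoints in self._registry.items():
            proxy.on_update(service, endpoints)

    def endpoints_changed(self, service, endpoints):
        # Called when the watched platform API reports a change.
        self._registry[service] = list(endpoints)
        for proxy in self._proxies:
            proxy.on_update(service, endpoints)


control_plane = ControlPlane()
proxy_a, proxy_b = Proxy("a"), Proxy("b")
control_plane.connect(proxy_a)
control_plane.endpoints_changed("myapp-test", ["240.48.67.221:8080", "240.48.69.154:8080"])
control_plane.connect(proxy_b)  # late joiner still gets the current endpoints
```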

Real-world example:

# Service definition  
$ kubectl -n myapp get svc myapp-test  
NAME         TYPE        CLUSTER-IP       PORT(S)
myapp-test   ClusterIP   10.100.229.132   8443/TCP

# Endpoints (actual pod IPs)
$ kubectl -n myapp get endpoints myapp-test
NAME         ENDPOINTS
myapp-test   240.48.67.221:8080,240.48.69.154:8080

# Pods backing the service
$ kubectl -n myapp get pods -o wide
NAME                          READY   IP
myapp-test-78ddbd8c64-9bkzb   3/3     240.48.69.154
myapp-test-78ddbd8c64-tsnfw   3/3     240.48.67.221

Dynamic updates:

  • Pod scales up → New endpoint added → Pilot updates all proxies
  • Pod becomes unhealthy → Endpoint removed → Traffic stops routing to it
  • Zero-downtime deployments

Advanced Concepts

Graceful Termination and Connection Draining

When pods are terminated, ensure graceful shutdown:

Termination Flow:  
  App Container → istio-proxy → ingress-gateway → Load Balancer

Key settings:

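  • terminationGracePeriodSeconds (Pod level): Time Kubernetes waits before killing pod
  • drainDuration (Istio): Time Envoy waits before closing connections

drainDuration is a proxy setting rather than an EnvoyFilter field. The EnvoyFilter below addresses a related concern: it tells Envoy to drain connections to an upstream host as soon as that host is removed from service discovery.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: drain-on-host-removal
spec:
  configPatches:
  - applyTo: CLUSTER
    patch:
      operation: MERGE
      value:
        # Envoy v3 name for the older drain_connections_on_host_removal field;
        # verify it against the Envoy version your Istio release ships
        ignore_health_on_host_removal: true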

Best practices:

  • Set terminationGracePeriodSeconds: 30 (or higher)
  • Configure drainDuration to allow connections to complete
  • Use preStop hooks to delay SIGTERM
  • Implement health check endpoints

Multi-Cluster Mesh

Connect services across multiple Kubernetes clusters:

┌──────────────┐         ┌──────────────┐  
│  Cluster A   │         │  Cluster B   │  
│              │         │              │  
│ ┌──────────┐ │         │ ┌──────────┐ │  
│ │ Service A│ │◄───────►│ │ Service B│ │  
│ └──────────┘ │         │ └──────────┘ │  
│              │         │              │  
│   istiod-A   │         │   istiod-B   │  
└──────────────┘         └──────────────┘  
       │                        │  
       └────────────────────────┘  
            Shared control plane  
            (or federated)

Deployment models:

  • Single control plane: One istiod manages multiple clusters
  • Multi-primary: Each cluster has its own control plane
  • Primary-remote: One primary, others are remote

Locality-Aware Load Balancing

Route traffic to nearby services first:

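trafficPolicy:
  loadBalancer:
    localityLbSetting:
      enabled: true
      # Set either distribute or failover, not both
      distribute:
      - from: "us-west/us-west-1/*"
        to:
          "us-west/us-west-1/*": 80
          "us-west/us-west-2/*": 20
      # Alternative: region-level failover
      # (also needs outlierDetection on the DestinationRule)
      # failover:
      # - from: us-west
      #   to: us-east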

Benefits:

  • Reduced latency (same region/zone)
  • Lower data transfer costs
  • High availability (automatic failover)

WebAssembly (WASM) Extensions

Extend Envoy proxy with custom logic:

apiVersion: extensions.istio.io/v1alpha1  
kind: WasmPlugin  
metadata:  
  name: custom-auth  
spec:  
  selector:  
    matchLabels:  
      app: api-service  
  url: oci://my-registry/custom-auth-plugin:v1.0  
  phase: AUTHN

Use cases:

  • Custom authentication/authorization
  • Request/response transformation
  • Rate limiting
  • Custom telemetry

Real-World Examples

This section walks through practical examples using actual Kubernetes resources, demonstrating how traffic flows through an Istio service mesh.

Complete Traffic Flow

Understanding the full request path through Istio:

External Client  
      ↓  
   NLB (Network Load Balancer)  
      ↓  
┌─────────────────────────────────────────────────────────────────┐  
│                    Kubernetes Cluster                           │  
│                                                                 │  
│  ┌────────────────────────────────────────────────────────────┐ │  
│  │ Istio Ingress Gateway (app-ingress-gateway)                │ │  
│  │        (Envoy proxy deployment)                            │ │  
│  │    • Listens on configured ports (e.g., 8443)              │ │  
│  │    • Receives configuration from istiod                    │ │  
│  └────────────────────┬───────────────────────────────────────┘ │  
│                       ↓                                         │  
│  ┌────────────────────────────────────────────────────────────┐ │  
│  │ Gateway Resource                                           │ │  
│  │   • Defines ports, protocols, TLS settings                 │ │  
│  │   • Selects ingress gateway pods via label selector        │ │  
│  └────────────────────┬───────────────────────────────────────┘ │  
│                       ↓                                         │  
│  ┌────────────────────────────────────────────────────────────┐ │  
│  │ VirtualService                                             │ │  
│  │   • Matches incoming requests (host, path, headers)        │ │  
│  │   • Defines routing rules and destinations                 │ │  
│  │   • Configures retries, timeouts                           │ │  
│  └────────────────────┬───────────────────────────────────────┘ │  
│                       ↓                                         │  
│  ┌────────────────────────────────────────────────────────────┐ │  
│  │ DestinationRule (Optional)                                 │ │  
│  │   • Defines subsets (versions)                             │ │  
│  │   • Load balancing policies                                │ │  
│  │   • Connection pool settings                               │ │  
│  └────────────────────┬───────────────────────────────────────┘ │  
│                       ↓                                         │  
│  ┌────────────────────────────────────────────────────────────┐ │  
│  │ Kubernetes Service                                         │ │  
│  │   • ClusterIP with stable DNS name                         │ │  
│  │   • Selects pods via label selectors                       │ │  
│  │   • Maps service port to container targetPort              │ │  
│  └────────────────────┬───────────────────────────────────────┘ │  
│                       ↓                                         │  
│  ┌────────────────────────────────────────────────────────────┐ │  
│  │ Endpoints                                                  │ │  
│  │   • Dynamic list of pod IPs and ports                      │ │  
│  │   • Automatically updated as pods scale/fail               │ │  
│  └────────────────────┬───────────────────────────────────────┘ │  
│                       ↓                                         │  
│  ┌────────────────────────────────────────────────────────────┐ │  
│  │ Application Pod                                            │ │  
│  │  ┌──────────────┐     ┌──────────────┐                     │ │  
│  │  │  istio-proxy │     │ Application  │                     │ │  
│  │  │   (Envoy)    │────►│  Container   │                     │ │  
│  │  └──────────────┘     └──────────────┘                     │ │  
│  └────────────────────────────────────────────────────────────┘ │  
└─────────────────────────────────────────────────────────────────┘

Key points:

  • The Istio ingress gateway is a Deployment of Envoy proxy pods, not a static configuration file
  • Gateway, VirtualService, DestinationRule are configuration objects that tell the proxies how to route
  • istiod (control plane) pushes all configurations to Envoy proxies at runtime
  • Configurations are dynamic and can be updated without restarting pods

Example 1: Istio Control Plane Components

Viewing the Istio system components:

$ kubectl -n istio-system get pods -o wide  
NAME                                   READY   STATUS    IP              NODE  
app-ingress-gateway-68945bdbd7-5dxxr   1/1     Running   240.48.71.138   ip-240-48-71-119  
app-ingress-gateway-68945bdbd7-jkbzv   1/1     Running   240.48.68.10    ip-240-48-69-0  
app-ingress-gateway-68945bdbd7-lj5fl   1/1     Running   240.48.67.64    ip-240-48-67-148  
istiod-fd589774b-2cl2l                 1/1     Running   240.48.71.251   ip-240-48-71-119  
istiod-fd589774b-dd2xw                 1/1     Running   240.48.69.34    ip-240-48-69-0  
istiod-fd589774b-k5dgc                 1/1     Running   240.48.67.115   ip-240-48-67-148

Observations:

  • app-ingress-gateway: Multiple replicas (3) for high availability

    • Each pod is an Envoy proxy acting as the entry point
    • Distributed across different nodes for fault tolerance
    • Target IPs are registered with the external load balancer
  • istiod: Control plane component (3 replicas for HA)

    • Manages configuration for all proxies
    • Provides service discovery and certificate management

Example 2: Gateway Configuration

Defining an ingress gateway with mutual TLS:

$ kubectl -n istio-system get gateway myapp-mgt-qa1-usw2 -o yaml
apiVersion: networking.istio.io/v1beta1  
kind: Gateway  
metadata:  
  name: myapp-mgt-qa1-usw2  
  namespace: istio-system  
spec:  
  selector:  
    istio: app-ingress-gateway    # Selects ingress gateway pods with this label  
  servers:  
  - hosts:  
    - myapp-qa1-usw2.mgt-nonprod.myorg.com  # Hostname to handle  
    port:  
      name: https-mutual  
      number: 8443  
      protocol: HTTPS  
    tls:  
      mode: MUTUAL                # Requires client certificate authentication  
      minProtocolVersion: TLSV1_2  
      maxProtocolVersion: TLSV1_3  
      serverCertificate: /etc/istio/tls/tls.crt  
      privateKey: /etc/istio/tls/tls.key  
      caCertificates: /etc/myorg/ca/myorg_corp_auth_ca1.pem

Key observations:

  • selector: Uses label selector to identify which ingress gateway pods handle this config
  • hosts: Defines the hostname this gateway will accept traffic for
  • port: Listens on port 8443 for HTTPS traffic
  • tls.mode = MUTUAL: Requires both server and client certificates (strong authentication)
  • Certificates: Mounted from Kubernetes secrets/config maps into the gateway pods
  • Wildcard support: If hosts = ["*"], gateway accepts any hostname
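
As an alternative to file-mounted certificates, a Gateway server can reference a Kubernetes secret by name through credentialName, which the gateway's Envoy then fetches dynamically. The sketch below assumes a hypothetical secret named myapp-gateway-credential in the same namespace as the gateway pods, holding the server certificate, its key, and the CA certificate used to verify clients.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: myapp-mgt-qa1-usw2
  namespace: istio-system
spec:
  selector:
    istio: app-ingress-gateway
  servers:
  - hosts:
    - myapp-qa1-usw2.mgt-nonprod.myorg.com
    port:
      name: https-mutual
      number: 8443
      protocol: HTTPS
    tls:
      mode: MUTUAL
      minProtocolVersion: TLSV1_2
      credentialName: myapp-gateway-credential   # hypothetical secret (tls.crt, tls.key, ca.crt)
```

With this form, certificates can be rotated by updating the secret, without remounting files into the gateway pods.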

Example 3: VirtualService Routing

VirtualService defines routing rules to forward requests to backend services:

$ kubectl -n myapp get vs myapp-test-vs -o yaml
apiVersion: networking.istio.io/v1beta1  
kind: VirtualService  
metadata:  
  name: myapp-test-vs  
  namespace: myapp  
spec:  
  gateways:  
  - istio-system/myapp-mgt-qa1-usw2    # References the Gateway (cross-namespace)  
  hosts:  
  - myapp-test.myapp.svc.cluster.local           # Internal DNS name  
  - myapp-qa1-usw2.mgt-nonprod.myorg.com # External hostname (matches Gateway)  
  http:  
  - retries:  
      attempts: 1  
      retryOn: connect-failure,refused-stream  
    route:  
    - destination:  
        host: myapp-test    # Routes to Service named "myapp-test"

Key observations:

  • gateways: Links to the Gateway resource (can be in a different namespace). Because mesh is not listed, these rules apply only at the ingress gateway, not to sidecar-to-sidecar traffic
  • hosts: Hostnames this VirtualService matches

    • myapp-test.myapp.svc.cluster.local: Internal service DNS name
    • myapp-qa1-usw2.mgt-nonprod.myorg.com: External hostname (must match the Gateway host)
  • retries: Retries a failed request once, on connection failure or a refused stream
  • destination.host: Routes to the Kubernetes Service by name (not to a pod directly)

How they work together:

  • External request arrives at Gateway with hostname myapp-qa1-usw2...
  • Gateway accepts it (hostname matches its configuration)
  • VirtualService matches the hostname and applies routing rules
  • Traffic is routed to the myapp-test Service
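
The gateways list in this VirtualService names only the ingress Gateway, so its rules are applied at the gateway and not by sidecars inside the mesh. The following sketch shows how the same resource could also cover in-mesh callers by adding the reserved mesh gateway; the 10s timeout is an illustrative value that is not part of the original resource.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-test-vs
  namespace: myapp
spec:
  gateways:
  - istio-system/myapp-mgt-qa1-usw2
  - mesh                          # reserved name: also apply these rules in sidecars
  hosts:
  - myapp-test.myapp.svc.cluster.local
  - myapp-qa1-usw2.mgt-nonprod.myorg.com
  http:
  - timeout: 10s                  # illustrative overall request timeout
    retries:
      attempts: 1
      retryOn: connect-failure,refused-stream
    route:
    - destination:
        host: myapp-test
```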

Example 4: Kubernetes Service and Endpoints

The Service provides stable networking and service discovery:

$ kubectl -n myapp get svc myapp-test  
NAME           TYPE        CLUSTER-IP       PORT(S)  
myapp-test     ClusterIP   10.100.229.132   8443/TCP

$ kubectl -n myapp get svc myapp-test -o yaml  
apiVersion: v1  
kind: Service  
metadata:  
  name: myapp-test  
  namespace: myapp  
spec:  
  type: ClusterIP  
  clusterIP: 10.100.229.132  
  ports:  
  - name: http  
    port: 8443         # Port exposed by the Service (client-facing)  
    protocol: TCP  
    targetPort: 8080   # Port on the Pod container  
  selector:  
    app.kubernetes.io/instance: myapp-myapp    # Selects pods with these labels  
    app.kubernetes.io/name: myapp-test

Key observations:

  • ClusterIP: Virtual IP accessible only within the cluster
  • port vs targetPort:

    • port: 8443: Clients connect to the Service on this port
    • targetPort: 8080: The Service forwards to pod containers on this port
    • Decouples the port clients use from the port the container listens on
  • selector: Labels that identify which pods receive traffic

Endpoints (automatically managed by Kubernetes):

$ kubectl -n myapp get endpoints myapp-test  
NAME           ENDPOINTS  
myapp-test     240.48.67.221:8080,240.48.69.154:8080

Key observations:

  • Endpoints list shows actual pod IPs and ports
  • Dynamically updated as pods scale, restart, or fail
  • istiod (which absorbed the former Pilot component) watches these endpoints and pushes them to the Envoy proxies

Example 5: Application Pods

The actual workload running the application:

$ kubectl -n myapp get pods -o wide  
NAME                            READY   STATUS    IP              NODE  
myapp-test-78ddbd8c64-9bkzb     3/3     Running   240.48.69.154   ip-240-48-69-0  
myapp-test-78ddbd8c64-tsnfw     3/3     Running   240.48.67.221   ip-240-48-67-148

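Notice: READY 3/3 means three regular containers are running in each pod. Init containers are not counted:

  • Application container: Your app code
  • istio-proxy: Envoy sidecar injected by Istio
  • A third regular container that this output does not identify (for example, an additional sidecar)
  • istio-init: Init container that sets up iptables rules (it runs to completion before the others start, so it is not counted in READY)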

Inspecting a pod:

$ kubectl -n myapp get pod myapp-test-78ddbd8c64-9bkzb -o yaml

Key Istio-specific annotations and labels in pod metadata:

annotations:  
  # Istio sidecar injection  
  sidecar.istio.io/inject: "true"                # Enables automatic sidecar injection  
  istio.io/rev: default                          # Istio control plane revision  
  
  # Proxy configuration  
  proxy.istio.io/config: |  
    holdApplicationUntilProxyStarts: true        # App waits for proxy to be ready  
  
  # Prometheus metrics  
  prometheus.io/scrape: "true"  
  prometheus.io/port: "15020"                    # Merged metrics port served by the Istio agent  
  prometheus.io/path: /stats/prometheus  
  
labels:  
  app.kubernetes.io/instance: myapp-myapp        # Matches Service selector  
  app.kubernetes.io/name: myapp-test             # Matches Service selector  
  
  # Security and identity (set by the sidecar injector)  
  security.istio.io/tlsMode: istio               # Uses Istio mTLS  
  service.istio.io/canonical-name: myapp-test  
  service.istio.io/canonical-revision: latest
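
Settings such as proxy.istio.io/config originate in the workload's pod template. A minimal sketch of how a Deployment might opt in to injection and tune the proxy follows; the image, replica count, and both durations are assumptions for illustration only.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-test
  namespace: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp-test
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: myapp-myapp
        app.kubernetes.io/name: myapp-test
        sidecar.istio.io/inject: "true"      # label form of the injection opt-in
      annotations:
        proxy.istio.io/config: |
          holdApplicationUntilProxyStarts: true
          terminationDrainDuration: 30s
    spec:
      terminationGracePeriodSeconds: 45      # assumed; should exceed the drain duration
      containers:
      - name: myapp-test
        image: my-registry/myapp-test:1.0    # placeholder image
        ports:
        - containerPort: 8080
```

Keeping terminationGracePeriodSeconds longer than the proxy's drain duration gives in-flight requests time to finish before the pod is killed.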

How pod IPs match endpoints:

# Pod IPs  
240.48.69.154  
240.48.67.221  
  
# Endpoints  
240.48.67.221:8080,240.48.69.154:8080  
  
# Service routes to these endpoints  
# Envoy proxies receive these endpoint IPs from istiod  
# Load balancing happens across these pod IPs

Example 6: Complete Request Flow

Scenario: External client makes HTTPS request to https://myapp-qa1-usw2.mgt-nonprod.myorg.com/api/data

Step-by-step flow:

  • External Load Balancer (NLB):

    • Client DNS resolves to NLB IP
    • NLB forwards to one of the ingress gateway pod IPs (e.g., 240.48.71.138:8443)
  • Istio Ingress Gateway Pod:

    • Envoy proxy receives the request
    • Checks Gateway resource: hostname matches myapp-qa1-usw2...
    • Performs mTLS termination using configured certificates
    • Validates client certificate (mutual TLS)
  • VirtualService Matching:

    • Envoy checks VirtualService resources
    • Finds match: hostname myapp-qa1-usw2... → routes to myapp-test
    • Applies retry policy: retry on connect-failure,refused-stream
  • Service Resolution:

    • Envoy maps the destination myapp-test to its upstream cluster; it does not send traffic through the ClusterIP (10.100.229.132)
    • istiod has already pushed the endpoint list to Envoy: [240.48.67.221:8080, 240.48.69.154:8080]
  • Load Balancing:

    • No DestinationRule → Envoy uses the mesh default policy (LEAST_REQUEST in recent Istio releases, ROUND_ROBIN in older ones)
    • Selects one pod IP (e.g., 240.48.69.154:8080)
  • Pod Sidecar (istio-proxy):

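    • Request is addressed to 240.48.69.154:8080; iptables rules redirect it to the Envoy sidecar's inbound port 15006
    • Sidecar terminates the mesh mTLS connection originated by the gateway's Envoy and verifies the peer identity
    • Forwards the decrypted request to the application container on port 8080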
  • Application Container:

    • Receives request on localhost:8080
    • Processes request and returns response
  • Response Path (reverse of request path):

    • App → Sidecar → Ingress Gateway → NLB → Client

Key insights:

  • Every hop involves an Envoy proxy (except NLB)
  • Configuration is dynamic (no restarts needed for changes)
  • mTLS is automatic and transparent to the application
  • Observability data collected at each proxy
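
The flow above relies on default load balancing because no DestinationRule exists for myapp-test. If explicit behavior is wanted, a sketch like the following could pin the algorithm and state the mTLS mode used toward the backend pods; the resource name and the choice of ROUND_ROBIN are assumptions.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-test-dr             # hypothetical name
  namespace: myapp
spec:
  host: myapp-test.myapp.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN         # pin the algorithm instead of relying on the default
    tls:
      mode: ISTIO_MUTUAL          # use Istio-issued certificates for mTLS to the pods
```

Because the rule lives in the Service's own namespace, it is also picked up by clients in other namespaces, including the ingress gateway in istio-system.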

Summary and Best Practices

When to Use Istio

Good fit:

  • Large-scale microservices (50+ services)
  • Need for advanced traffic management (canary, A/B testing)
  • Security requirements (zero-trust, mTLS)
  • Polyglot environments (multiple languages/frameworks)
  • Complex observability needs

Not a good fit:

  • Simple applications (few services)
  • Performance-critical workloads with tight latency budgets
  • Small teams without operational expertise
  • Limited infrastructure resources

Best Practices

  • Start incrementally:

    • Begin with sidecar injection for observability
    • Gradually add traffic management features
    • Consider ambient mode for resource efficiency
  • Security:

    • Enable STRICT mTLS in production
    • Use AuthorizationPolicies for fine-grained access control
    • Rely on automatic workload certificate rotation (handled by istiod, formerly Citadel)
  • Traffic management:

    • Always define retries and timeouts
    • Use circuit breakers to prevent cascading failures
    • Test canary deployments with small traffic percentages first
  • Performance:

    • Use Sidecar resources to limit proxy configuration size
    • Monitor resource usage (Envoy memory/CPU)
    • Consider ambient mode for large-scale deployments
  • Observability:

    • Integrate with Prometheus and Grafana for metrics
    • Set up distributed tracing (Jaeger/Zipkin)
    • Configure appropriate access log formats
  • Operations:

    • Version control all Istio configurations
    • Test configuration changes in non-production first
    • Implement graceful termination (terminationDrainDuration, terminationGracePeriodSeconds)
    • Use revision-based upgrades for control plane
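
Several of the security and performance practices above map to small resources. The following is a sketch rather than a drop-in policy: it assumes istio-system is the mesh root namespace and cluster.local is the trust domain, and the service account name app-ingress-gateway and the policy names are hypothetical.

```yaml
# Mesh-wide STRICT mTLS (mesh-wide only because it is created in the root namespace)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Allow only the ingress gateway's identity to call myapp-test
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: myapp-test-allow-gateway  # hypothetical name
  namespace: myapp
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp-test
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/istio-system/sa/app-ingress-gateway   # assumed service account
---
# Limit the configuration pushed to sidecars in the myapp namespace
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: myapp
spec:
  egress:
  - hosts:
    - "./*"                       # services in the same namespace
    - "istio-system/*"            # control plane and gateways
```

Once an ALLOW policy selects a workload, requests that match none of its rules are denied, so the principal list needs to cover every legitimate caller.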

Common Troubleshooting

Service not reachable:

  • Check sidecar injection: kubectl get pod <name> -o jsonpath='{.spec.containers[*].name}'
  • Verify VirtualService hosts match Gateway hosts
  • Ensure Service selector matches pod labels

mTLS errors:

  • Check PeerAuthentication mode (STRICT vs PERMISSIVE)
  • Verify certificate expiration
  • Ensure both sides have Istio proxies
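
When some clients do not yet have sidecars, STRICT mode rejects their plaintext connections. One way to narrow the problem, sketched here under the assumption that only the myapp-test workload needs to accept plaintext temporarily, is a workload-scoped PeerAuthentication in PERMISSIVE mode; the resource name is hypothetical.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: myapp-test-permissive     # hypothetical name
  namespace: myapp
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp-test
  mtls:
    mode: PERMISSIVE              # accept both mTLS and plaintext during migration
```

A workload-scoped policy takes precedence over namespace-wide and mesh-wide ones, so it can be removed once every client has a proxy.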

High latency:

  • Check for unnecessary retries
  • Review timeout configurations
  • Monitor Envoy resource usage
  • Consider connection pool tuning
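
The latency checks above mostly come down to a few fields. The sketch below bounds the total request time, caps each retry attempt, and sets connection pool limits for myapp-test; all names and numbers are illustrative assumptions, and if a VirtualService or DestinationRule already exists for this host, the settings would be merged into it rather than applied as separate resources.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-test-timeouts       # hypothetical name
  namespace: myapp
spec:
  hosts:
  - myapp-test.myapp.svc.cluster.local
  http:
  - timeout: 5s                   # overall budget, including retries
    retries:
      attempts: 2
      perTryTimeout: 2s           # each attempt gets at most 2s
      retryOn: connect-failure,refused-stream
    route:
    - destination:
        host: myapp-test.myapp.svc.cluster.local
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-test-pool           # hypothetical name
  namespace: myapp
spec:
  host: myapp-test.myapp.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 0   # 0 means no limit, so connections are reused
    outlierDetection:             # eject endpoints that keep returning 5xx errors
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
```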

Configuration not applying:

  • Validate with istioctl analyze
  • Check istiod logs for errors
  • Verify proxy can reach istiod (network policies)
