Understanding Kubernetes StatefulSets
Author: Amit Bisht
Introduction
Kubernetes is a powerful container orchestration platform designed for deploying and managing containerized applications at scale.
While Deployments are commonly used for stateless applications, there are scenarios where the state of an application must be preserved. Enter StatefulSets, a specialized Kubernetes resource designed for managing stateful applications.
This blog explores StatefulSets in detail, including use cases, and how to implement them.
- What Are StatefulSets?
- Headless service
- StatefulSet vs Deployment
- Let's get our hands dirty
- Use cases
- Note
What Are StatefulSets?
A StatefulSet is a Kubernetes workload API object used to manage stateful applications. Unlike Deployments, which treat Pods as interchangeable, StatefulSets provide:
Stable Network Identity
Each Pod in a StatefulSet gets a DNS name in the format `<pod-name>.<service-name>`.
For example, if a StatefulSet is named `my-db` and is associated with a headless service `my-db-service`, the Pods will have DNS names like `my-db-0.my-db-service` and `my-db-1.my-db-service`.
This is critical for stateful applications, such as databases, where Pods must refer to each other reliably.
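As a rough illustration of the naming scheme, the sketch below prints the short and fully qualified DNS name for each Pod. The helper `sts_pod_dns` is hypothetical, and the `default` namespace and `cluster.local` cluster domain are assumptions (they are common defaults, but your cluster may differ).

```shell
# Hypothetical helper: print the DNS names of every Pod in a StatefulSet.
# Usage: sts_pod_dns <statefulset> <headless-service> <replicas> [namespace]
# Assumes the cluster domain is cluster.local.
sts_pod_dns() {
  sts="$1"; svc="$2"; replicas="$3"; ns="${4:-default}"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # First column: short name, usable from the same namespace.
    # Second column: fully qualified name, usable from any namespace.
    echo "${sts}-${i}.${svc} ${sts}-${i}.${svc}.${ns}.svc.cluster.local"
    i=$((i + 1))
  done
}

sts_pod_dns my-db my-db-service 2
```

Because the names depend only on the StatefulSet name, the service name, and the ordinal, peers can be listed in a configuration file before the Pods exist.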
Persistent Storage
StatefulSets use PersistentVolumeClaims (PVCs) to provision storage for each Pod. Unlike Deployments, where storage is typically ephemeral or shared between replicas, each Pod in a StatefulSet gets its own dedicated volume, so its data persists across Pod restarts and rescheduling.
Ordered Deployment and Scaling
- Creation: Pods are created sequentially, one at a time (e.g., `my-db-0`, then `my-db-1`).
- Termination: Pods are terminated in reverse order (e.g., `my-db-2`, then `my-db-1`).
- Updates: StatefulSets update Pods in a rolling fashion, ensuring that at most one Pod is unavailable during an update.
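The ordering rules can be pictured with a small sketch. The functions `creation_order` and `termination_order` are hypothetical helpers that only print Pod names; they assume the default ordered behaviour and do not model the controller waiting for each Pod to become Running and Ready.

```shell
# Hypothetical helper: Pods are created in ascending ordinal order.
creation_order() {
  name="$1"; replicas="$2"; i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${name}-${i}"
    i=$((i + 1))
  done
}

# Hypothetical helper: Pods are terminated in descending ordinal order.
termination_order() {
  name="$1"; replicas="$2"; i=$((replicas - 1))
  while [ "$i" -ge 0 ]; do
    echo "${name}-${i}"
    i=$((i - 1))
  done
}

creation_order my-db 3
termination_order my-db 3
```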
Headless service
A headless service in Kubernetes is a service that does not create a cluster IP, meaning it doesn't have a single IP address that clients can use to access it. Instead, the service exposes the individual IPs of the pods directly.
To create a headless service, set `clusterIP: None` in the service definition.
DNS resolution for a headless service in Kubernetes works differently than for a standard service.
Here's how it works:
DNS Query
When a pod makes a DNS query for the headless service (e.g., my-headless-service.default.svc.cluster.local), Kubernetes doesn't return a single IP address. Instead, it returns a list of IP addresses, each corresponding to the individual pods that are part of the service.
Resolution Behavior
- For a standard service, DNS returns a single IP (the ClusterIP) for load balancing.
- For a headless service, DNS resolution returns multiple pod IPs, typically as A records or SRV records, which allows direct access to individual pods.
StatefulSet vs Deployment
Feature | StatefulSet | Deployment |
---|---|---|
Pod Identity | Stable and unique | Interchangeable |
Storage | Persistent, unique for each Pod | Ephemeral or shared |
Deployment Order | Ordered | Parallel |
Scaling Behavior | Sequential | Parallel |
Use Cases | Databases, distributed systems | Stateless apps, frontends, microservices |
Let's get our hands dirty
All commands are executed on a Kubernetes cluster deployed locally using Minikube.
Create a Deployment
Deploy Nginx as a Deployment, just for comparison.
deployment.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-deploy
  template:
    metadata:
      labels:
        app: nginx-deploy
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
```
Each pod created by the Deployment gets a random alphanumeric suffix appended to its name. These names are ephemeral: a replacement pod gets a new one.
Create a Persistent Volume
This Persistent Volume (PV) is created using the host filesystem. Generally, a cloud provider is used to create volumes in production environments.
pv.yaml

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-pv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /tmp/nginx-data
    type: DirectoryOrCreate
```
Create a StatefulSet
This is a StatefulSet YAML file, which differs slightly from a Deployment YAML file:
- spec.serviceName: Specifies the name of the headless service used for pod DNS address resolution.
- volumeClaimTemplates: A template from which a PersistentVolumeClaim (PVC) is created for each pod, to claim storage from a PersistentVolume (PV).
- initContainers: Although not specific to StatefulSets, it is used in this case to create a custom HTML index page that returns the hostname.
sts.yaml

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-stateful
spec:
  serviceName: "nginx-service-hl"
  replicas: 3
  selector:
    matchLabels:
      app: nginx-stateful
  template:
    metadata:
      labels:
        app: nginx-stateful
    spec:
      initContainers:
        - name: setup
          image: busybox:1.28
          # Write an index page containing the pod's hostname into the volume
          command:
            - /bin/sh
            - -c
            - |
              echo "<h1>Hello !! </h1>I am loaded from <b>nginx-pvc</b> <br>Pod name: <b>$HOSTNAME</b>" > /usr/share/nginx/html/index.html && echo 'done'
          volumeMounts:
            - mountPath: /usr/share/nginx/html
              name: nginx-data
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-data
              mountPath: /usr/share/nginx/html # Mount same volume to nginx container
  volumeClaimTemplates:
    - metadata:
        name: nginx-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```
Each of the three pods gets its own PersistentVolumeClaim (PVC), created from the `volumeClaimTemplates` entry, which lets it use storage from a PersistentVolume (PV).
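PVCs created from a `volumeClaimTemplates` entry follow a predictable naming pattern, `<template-name>-<statefulset-name>-<ordinal>`. The sketch below prints the names to expect for the StatefulSet above; `pvc_names` is a hypothetical helper, not a kubectl feature.

```shell
# Hypothetical helper: print the PVC names generated for a StatefulSet.
# Usage: pvc_names <claim-template-name> <statefulset> <replicas>
pvc_names() {
  template="$1"; sts="$2"; replicas="$3"; i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${template}-${sts}-${i}"
    i=$((i + 1))
  done
}

pvc_names nginx-data nginx-stateful 3
```

The printed names can be compared with the output of `kubectl get pvc` on the cluster.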
Now, let's explore and discover some key properties of StatefulSets in practice:
Unique ordinal index
The three pods from the StatefulSet have names with an ordinal index appended (`nginx-stateful-0`, `nginx-stateful-1`, `nginx-stateful-2`), so their identity is stable and well defined. In contrast, pods created by a Deployment are truly ephemeral, with random strings in their names.
If I delete one pod from the StatefulSet and one from the Deployment, the Deployment creates a replacement pod with a new random name, whereas the StatefulSet recreates a pod with the same name and the same PVC, ensuring persistence.
Stable Network Identity
To access Nginx, whether deployed as a Deployment or a StatefulSet, we need a service.
For the Deployment, we create a service of type ClusterIP, which will load balance traffic across multiple pods.
deployment-svc.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service-d
  labels:
    app: nginx-deploy
spec:
  selector:
    app: nginx-deploy # Matches the labels of the Nginx Pods in the Deployment
  ports:
    - protocol: TCP
      port: 80 # Port on the Service
      targetPort: 80 # Port on the Nginx Pods
  type: ClusterIP
```
We can access the Nginx Deployment through the service's cluster IP, or by using the service name as a DNS address.
Let's try to connect to it. I will be using an image I built that contains all the necessary networking tools to curl the Nginx Deployment's service. Check this blog to learn more.
```shell
kubectl run busybox --image=amit0myth/networking-toolset:v2 --restart=Never --rm -it -- sh
```
As we can see, I can curl the Nginx Deployment using its IP, the service name, and the Fully Qualified Domain Name (FQDN), which is resolved using the CoreDNS service.
In the case of a StatefulSet, we create a headless service.
headless-svc.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service-hl
  labels:
    app: nginx-service-hl
spec:
  clusterIP: None # this will result in a headless service
  selector:
    app: nginx-stateful
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
```
The service `nginx-service-hl` has no cluster IP.
Its name resolves to three IPs, one per pod. We can either call the service name, in which case the client connects to one of the returned pod IPs, or call an individual pod directly using its pod name (for example, `nginx-stateful-0.nginx-service-hl`).
Scaling
The StatefulSet controller creates each pod sequentially based on its ordinal index, ensuring that each pod's predecessor is Running and Ready before launching the next pod.
```shell
kubectl scale sts nginx-stateful --replicas=5
```
When scaling up a StatefulSet, the control plane creates new pods in sequential order, starting with the next available ordinal index, and ensures that each new pod is Running and Ready before creating the next one.
```shell
# Scale down to 2 replicas. Either command sets spec.replicas to 2;
# the StatefulSet controller then removes pods in reverse ordinal order.
kubectl scale sts nginx-stateful --replicas=2
kubectl patch sts nginx-stateful -p '{"spec":{"replicas":2}}'
```
The control plane deletes one pod at a time, in reverse order of its ordinal index, and waits for each pod to be fully shut down before deleting the next one.
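To make the scaling behaviour concrete, here is a minimal sketch that lists which pods would be created or deleted, and in what order, when the replica count changes. `scale_plan` is a hypothetical helper that only prints names under the default ordered policy; it does not talk to the cluster or wait for readiness.

```shell
# Hypothetical helper: show the pods added or removed by a replica change.
# Usage: scale_plan <statefulset> <current-replicas> <target-replicas>
scale_plan() {
  name="$1"; current="$2"; target="$3"
  if [ "$target" -gt "$current" ]; then
    # Scale up: continue from the next free ordinal, ascending.
    i="$current"
    while [ "$i" -lt "$target" ]; do
      echo "create ${name}-${i}"
      i=$((i + 1))
    done
  else
    # Scale down: remove the highest ordinals first, descending.
    i=$((current - 1))
    while [ "$i" -ge "$target" ]; do
      echo "delete ${name}-${i}"
      i=$((i - 1))
    done
  fi
}

scale_plan nginx-stateful 3 5   # the scale-up from 3 to 5 replicas
scale_plan nginx-stateful 5 2   # the scale-down from 5 to 2 replicas
```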
Use cases
- Databases
  - StatefulSets are ideal for managing databases that need consistent data storage and predictable scaling.
  - Distributed Databases: StatefulSets help maintain the topology of distributed systems like etcd and CockroachDB.
- Distributed Message Queues
  - StatefulSets are often used to deploy messaging systems that rely on state and partitioning for scalability.
  - Kafka: Brokers need stable identities for partitions and replication.
- Search and Indexing Engines
  - For applications that manage large-scale indexing and require state.
  - Elasticsearch: StatefulSets provide stable network identities for nodes in a cluster.
- Persistent Logging and Monitoring Systems
  - Systems that require stateful components to store logs or metrics.
  - Prometheus with Long-Term Storage: Prometheus may use StatefulSets for storage backends like Thanos or Cortex.
- Streaming Applications
  - Applications that handle real-time data streams.
  - Apache Flink: Stateful applications with distributed checkpoints.
- Stateful Workloads with Leader Election
  - Applications that require a leader election process.
  - etcd: Used as a distributed key-value store for configurations.
  - Zookeeper: Required for managing distributed systems.
- Stateful Microservices
  - Microservices with state dependencies.
  - Payment Gateways: Stateful services that manage transactions and need persistence.
  - Order Processing Systems: Requires unique states for each service instance.
- Batch Processing
  - StatefulSets are used in data processing jobs that require checkpointing.
  - Hadoop: StatefulSets can manage Hadoop clusters with persistent storage for HDFS.
- Stateful API Gateways
  - Gateways that maintain sessions or persistent cache.
  - API Rate Limiters: StatefulSets ensure state persistence for throttling.
  - Edge Proxies with Cache: Proxies like Varnish or Squid may use persistent storage.
Note
- StatefulSets are not ideal for high-churn workloads where Pods are frequently created and destroyed.