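Stateful Applications
Overview
A stateful application is one that must keep persistent state, needs a stable network identity, and has to be deployed and scaled in a defined order. Unlike stateless applications, stateful applications place specific requirements on storage, networking, and deployment order. Common examples include databases, cache clusters, and message queues.
Core Concepts
Characteristics of Stateful Applications
- Persistent storage: data must survive Pod restarts and rescheduling
- Stable network identity: each Pod keeps a fixed hostname and DNS name
- Ordered deployment: Pods start in ascending ordinal order (0, 1, 2...) and are terminated in reverse order
- Ordered scaling: scale-up adds Pods in ascending ordinal order; scale-down removes the highest ordinal first
- State dependencies: Pods may depend on one another (for example, a replica on its primary)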
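The ordering rules in the list above can be sketched in a few lines of shell. This is only an illustration of the default `OrderedReady` behaviour; the function names `start_order` and `stop_order` are hypothetical helpers, not kubectl features.

```shell
#!/bin/sh
# Print Pod names in creation order: ascending ordinal.
start_order() {  # usage: start_order <statefulset-name> <replicas>
  name=$1; replicas=$2; i=0; out=""
  while [ "$i" -lt "$replicas" ]; do
    out="$out${out:+ }$name-$i"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Print Pod names in termination order: descending ordinal.
stop_order() {  # usage: stop_order <statefulset-name> <replicas>
  name=$1; i=$(( $2 - 1 )); out=""
  while [ "$i" -ge 0 ]; do
    out="$out${out:+ }$name-$i"
    i=$((i - 1))
  done
  printf '%s\n' "$out"
}

start_order web 3   # web-0 web-1 web-2
stop_order web 3    # web-2 web-1 web-0
```

With `OrderedReady`, each Pod must be Running and Ready before the controller moves on to the next one in the sequence.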
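StatefulSet vs Deployment
| Feature | StatefulSet | Deployment |
|---|---|---|
| Pod identity | Fixed and ordered (web-0, web-1) | Randomly generated suffix |
| Network identity | Stable per-Pod DNS name | Not stable |
| Storage | One PVC per Pod | One PVC shared by all replicas, or no persistent storage |
| Startup order | Ordered (ascending ordinal) | Parallel |
| Scaling | Ordered scale-up and scale-down | Parallel |
| Update strategy | Rolling update, one Pod at a time from the highest ordinal down | Rolling update, several Pods in parallel |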
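The stable identities in the table follow fixed naming patterns, sketched below. `pod_fqdn` and `pvc_name` are hypothetical helper names, and `cluster.local` is assumed as the cluster domain (the common default; a cluster may be configured differently).

```shell
#!/bin/sh
# DNS name of one StatefulSet Pod behind a headless Service:
#   <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>
pod_fqdn() {  # usage: pod_fqdn <statefulset> <ordinal> <headless-service> <namespace> [cluster-domain]
  printf '%s\n' "$1-$2.$3.$4.svc.${5:-cluster.local}"
}

# Name of the PVC created from a volumeClaimTemplate:
#   <template-name>-<statefulset>-<ordinal>
pvc_name() {  # usage: pvc_name <template-name> <statefulset> <ordinal>
  printf '%s\n' "$1-$2-$3"
}

pod_fqdn web 0 web-headless default   # web-0.web-headless.default.svc.cluster.local
pvc_name www web 0                    # www-web-0
```

Because both names depend only on the ordinal, a rescheduled Pod comes back with the same DNS name and reattaches to the same PVC.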
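Typical Stateful Applications
- Databases: MySQL, PostgreSQL, MongoDB
- Cache systems: Redis Cluster, Memcached
- Message queues: Kafka, RabbitMQ, ActiveMQ
- Distributed storage: Ceph, GlusterFS, MinIO
- Search engines: Elasticsearch, Solr
StatefulSet in Detail
Complete YAML Examples
1. Basic StatefulSet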
yaml
apiVersion: v1
kind: Service
metadata:
  name: web-headless
  namespace: default
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: default
spec:
  serviceName: web-headless
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 1Gi

2. MySQL Primary-Replica Cluster
yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  namespace: database
data:
  # server-id is written per Pod by the init-mysql init container (100 + ordinal).
  # It is not set here, because a fixed value would give several Pods the same ID.
  master.cnf: |
    [mysqld]
    log-bin=mysql-bin
    binlog-format=ROW
    expire_logs_days=7
    max_binlog_size=100M
  slave.cnf: |
    [mysqld]
    relay-log=relay-bin
    read_only=1
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
  namespace: database
type: Opaque
stringData:
  root-password: "MySQLRootPassword123!"
  replication-user: "repl"
  replication-password: "ReplPassword456!"
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: database
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: database
spec:
  type: ClusterIP
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:8.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Derive the Pod ordinal from the hostname (mysql-0, mysql-1, ...)
          [[ $HOSTNAME =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          # Offset by 100 so that no Pod ends up with the reserved server-id=0
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # Ordinal 0 is the master; every other ordinal is a replica
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/master.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/slave.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      - name: clone-mysql
        # Assumption to verify: this sample image appears to bundle Percona XtraBackup 2.4,
        # which targets MySQL 5.7; cloning mysql:8.0 data needs an XtraBackup 8.0 build.
        image: gcr.io/google-samples/xtrabackup:1.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Skip the clone if data already exists
          [[ -d /var/lib/mysql/mysql ]] && exit 0
          # Skip the clone on the master (ordinal 0)
          [[ $HOSTNAME =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          [[ $ordinal -eq 0 ]] && exit 0
          # Clone data from the previous peer
          ncat --recv-only mysql-$(($ordinal-1)).mysql-headless 3307 | xbstream -x -C /var/lib/mysql
          # Prepare the backup
          xtrabackup --prepare --target-dir=/var/lib/mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            # root has a password, so the probe must authenticate before it can run a query
            command: ["bash", "-c", "mysql -h 127.0.0.1 -uroot -p\"$MYSQL_ROOT_PASSWORD\" -e 'SELECT 1'"]
          initialDelaySeconds: 5
          periodSeconds: 2
      - name: xtrabackup
        image: gcr.io/google-samples/xtrabackup:1.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3307
          name: xtrabackup
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        command:
        - bash
        - "-c"
        - |
          # -x is left out here so the root password is not echoed into the container log
          set -e
          cd /var/lib/mysql
          # Determine the binlog position of the cloned data, if any
          if [[ -f xtrabackup_slave_info ]]; then
            # Cloned from an existing replica: the file already holds a CHANGE MASTER TO statement
            mv xtrabackup_slave_info change_master_to.sql.in
            rm -f xtrabackup_binlog_info
          elif [[ -f xtrabackup_binlog_info ]]; then
            # Cloned directly from the master: parse the binlog file name and position
            [[ $(cat xtrabackup_binlog_info) =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
            rm xtrabackup_binlog_info
            echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
                  MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
          fi
          # Finish the clone by starting replication
          if [[ -f change_master_to.sql.in ]]; then
            echo "Waiting for mysqld to be ready (accepting connections)"
            until mysql -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1"; do sleep 1; done
            echo "Initializing replication from clone position"
            # Rename the file so a restart does not repeat the initialization
            mv change_master_to.sql.in change_master_to.sql.orig
            mysql -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD" <<EOF
          $(<change_master_to.sql.orig),
            MASTER_HOST='mysql-0.mysql-headless',
            MASTER_USER='root',
            MASTER_PASSWORD='$MYSQL_ROOT_PASSWORD',
            MASTER_CONNECT_RETRY=10;
          START SLAVE;
          EOF
          fi
          # Serve backups to peers that request a clone
          exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
            "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root --password=$MYSQL_ROOT_PASSWORD"
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 50Gi

3. Redis Cluster
yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: cache
data:
  redis.conf: |
    bind 0.0.0.0
    port 6379
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
    appendonly yes
    daemonize no
    protected-mode no
---
apiVersion: v1
kind: Secret
metadata:
  name: redis-secret
  namespace: cache
type: Opaque
stringData:
  redis-password: "RedisClusterPassword123!"
---
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
  namespace: cache
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  - port: 16379
    targetPort: 16379
    name: cluster
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: cache
spec:
  serviceName: redis-headless
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6.2
        command:
        - redis-server
        - /etc/redis/redis.conf
        env:
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis-secret
              key: redis-password
        ports:
        - containerPort: 6379
          name: redis
        - containerPort: 16379
          name: cluster
        volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
        resources:
          requests:
            cpu: 200m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        livenessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi

4. MongoDB Replica Set
yaml
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-secret
  namespace: database
type: Opaque
stringData:
  mongo-root-username: "admin"
  mongo-root-password: "MongoDBPassword123!"
  mongo-replica-set-key: "ReplicaSetKey456!"
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb-headless
  namespace: database
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: mongodb
  ports:
  - port: 27017
    targetPort: 27017
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb
  namespace: database
spec:
  type: ClusterIP
  selector:
    app: mongodb
  ports:
  - port: 27017
    targetPort: 27017
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
  namespace: database
spec:
  serviceName: mongodb-headless
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb
        image: mongo:5.0
        command:
        - mongod
        - "--replSet"
        - rs0
        - "--bind_ip_all"
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: mongo-root-username
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: mongo-root-password
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          exec:
            command:
            - mongo
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - mongo
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 5
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: mongodb-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 50Gi

kubectl Commands
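StatefulSet Management Commands
bash
# List StatefulSets
kubectl get statefulset
kubectl get sts
# Show StatefulSet details
kubectl describe sts web
# List the Pods of a StatefulSet
kubectl get pods -l app=web
# Scale up
kubectl scale sts web --replicas=5
# Scale down
kubectl scale sts web --replicas=3
# Delete the StatefulSet (PVCs are kept)
kubectl delete sts web
# Delete the StatefulSet and its PVCs
kubectl delete sts web
kubectl delete pvc -l app=web
# Show rollout history
kubectl rollout history sts/web
# Roll back
kubectl rollout undo sts/web
# Hold back a rolling update: only Pods with ordinal >= partition are updated
# (kubectl rollout pause/resume works for Deployments, not for StatefulSets)
kubectl patch sts web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'
# Release the update to all Pods
kubectl patch sts web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'

Commands for Working with Stateful Applications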
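bash
# The MySQL example lives in the database namespace; make it the default for the commands below
kubectl config set-context --current --namespace=database
# Show Pod ordinals (the pod-index label is set on Kubernetes 1.28 and later)
kubectl get pods -l app=mysql -o custom-columns='NAME:.metadata.name,ORDINAL:.metadata.labels.apps\.kubernetes\.io/pod-index'
# Resolve a Pod's DNS name
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup mysql-0.mysql-headless.database.svc.cluster.local
# Connect to a specific Pod
kubectl exec -it mysql-0 -c mysql -- mysql -u root -p
# Show PVC bindings
kubectl get pvc -l app=mysql
# Show storage usage
kubectl exec -it mysql-0 -c mysql -- df -h /var/lib/mysql
# Back up the database (single quotes make $MYSQL_ROOT_PASSWORD expand inside the container, not locally)
kubectl exec mysql-0 -c mysql -- bash -c 'mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" --all-databases' > backup.sql
# Restore the database
kubectl exec -i mysql-0 -c mysql -- bash -c 'mysql -u root -p"$MYSQL_ROOT_PASSWORD"' < backup.sql

Troubleshooting Commands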
bash
# Show Pod status
kubectl get pods -l app=mysql -o wide
# Show Pod events
kubectl describe pod mysql-0
# Show Pod logs
kubectl logs mysql-0 -c mysql
# Show logs from several containers
kubectl logs mysql-0 -c mysql
kubectl logs mysql-0 -c xtrabackup
# Open a shell in the Pod
kubectl exec -it mysql-0 -- /bin/bash
# Check network connectivity
kubectl exec -it mysql-0 -- ping mysql-1.mysql-headless
# Check service discovery
kubectl exec -it mysql-0 -- nslookup mysql-headless
# Show PVC status
kubectl describe pvc data-mysql-0
# Show PV status
kubectl describe pv pvc-xxx

Real-World Scenarios
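Scenario 1: Highly Available MySQL Cluster
Requirement: deploy a MySQL master-slave replication cluster that supports read/write splitting and failover.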
yaml
# 1. Create the namespace
apiVersion: v1
kind: Namespace
metadata:
  name: mysql-cluster
---
# 2. Create the configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  namespace: mysql-cluster
data:
  master.cnf: |
    [mysqld]
    log-bin=mysql-bin
    server-id=1
    binlog-format=ROW
    binlog-cache-size=1M
    max-binlog-size=500M
    expire-logs-days=7
    innodb-buffer-pool-size=2G
    innodb-log-file-size=256M
    max-connections=500
    slow-query-log=1
    slow-query-log-file=/var/log/mysql/slow.log
    long-query-time=2
  slave.cnf: |
    [mysqld]
    # Both replicas read this file, so each one still needs its own unique
    # server-id at runtime (for example via SET PERSIST server_id)
    server-id=2
    relay-log=relay-bin
    read-only=1
    relay-log-recovery=1
    slave-parallel-workers=4
    innodb-buffer-pool-size=2G
    max-connections=500
---
# 3. Create the Secret
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
  namespace: mysql-cluster
type: Opaque
stringData:
  root-password: "MySQLRootPassword123!"
  replication-user: "repl_user"
  replication-password: "ReplPassword456!"
  monitor-user: "monitor"
  monitor-password: "MonitorPass789!"
---
# 4. Create the Services
# Headless Service referenced by serviceName in both StatefulSets; it provides the per-Pod DNS names
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: mysql-cluster
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
# Write traffic: master only
apiVersion: v1
kind: Service
metadata:
  name: mysql-master
  namespace: mysql-cluster
spec:
  type: ClusterIP
  selector:
    app: mysql
    role: master
  ports:
  - port: 3306
    targetPort: 3306
---
# Read traffic: every MySQL Pod
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: mysql-cluster
spec:
  type: ClusterIP
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
# 5. Create the MySQL master
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-master
  namespace: mysql-cluster
spec:
  serviceName: mysql-headless
  replicas: 1
  selector:
    matchLabels:
      app: mysql
      role: master
  template:
    metadata:
      labels:
        app: mysql
        role: master
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
        - name: config
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 4Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping", "-h", "localhost"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            # root has a password, so the probe must authenticate before it can run a query
            command: ["bash", "-c", "mysql -h 127.0.0.1 -uroot -p\"$MYSQL_ROOT_PASSWORD\" -e 'SELECT 1'"]
          initialDelaySeconds: 5
          periodSeconds: 2
      volumes:
      - name: config
        configMap:
          name: mysql-config
          items:
          - key: master.cnf
            path: mysql.cnf
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi
---
# 6. Create the MySQL replicas
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-slave
  namespace: mysql-cluster
spec:
  serviceName: mysql-headless
  replicas: 2
  selector:
    matchLabels:
      app: mysql
      role: slave
  template:
    metadata:
      labels:
        app: mysql
        role: slave
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
        - name: config
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 4Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping", "-h", "localhost"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command: ["bash", "-c", "mysql -h 127.0.0.1 -uroot -p\"$MYSQL_ROOT_PASSWORD\" -e 'SELECT 1'"]
          initialDelaySeconds: 5
          periodSeconds: 2
      volumes:
      - name: config
        configMap:
          name: mysql-config
          items:
          - key: slave.cnf
            path: mysql.cnf
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi

Initialization script:
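bash
#!/bin/bash
# init-mysql-replication.sh
set -e
# Read the root password from the Secret so the script does not depend on a local variable
MYSQL_ROOT_PASSWORD=$(kubectl get secret -n mysql-cluster mysql-secret -o jsonpath='{.data.root-password}' | base64 -d)
# Create the replication user on the master
kubectl exec -n mysql-cluster mysql-master-0 -- mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -e "
CREATE USER IF NOT EXISTS 'repl_user'@'%' IDENTIFIED BY 'ReplPassword456!';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';
FLUSH PRIVILEGES;
"
# Read the master's current binlog file and position
MASTER_STATUS=$(kubectl exec -n mysql-cluster mysql-master-0 -- mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -e "SHOW MASTER STATUS\G" | grep -E "File|Position")
MASTER_LOG_FILE=$(echo "$MASTER_STATUS" | grep "File:" | awk '{print $2}')
MASTER_LOG_POS=$(echo "$MASTER_STATUS" | grep "Position:" | awk '{print $2}')
# Configure the replicas
for i in 0 1; do
# Each replica needs a unique server_id (the master uses 1); SET PERSIST keeps it across restarts.
# GET_MASTER_PUBLIC_KEY=1 lets the default caching_sha2_password plugin authenticate without TLS.
kubectl exec -n mysql-cluster mysql-slave-$i -- mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -e "
STOP SLAVE;
SET PERSIST server_id = $((2 + i));
CHANGE MASTER TO
MASTER_HOST='mysql-master-0.mysql-headless.mysql-cluster.svc.cluster.local',
MASTER_USER='repl_user',
MASTER_PASSWORD='ReplPassword456!',
MASTER_LOG_FILE='${MASTER_LOG_FILE}',
MASTER_LOG_POS=${MASTER_LOG_POS},
GET_MASTER_PUBLIC_KEY=1;
START SLAVE;
"
done
echo "MySQL replication configured successfully"

Scenario 2: Redis Sentinel High-Availability Cluster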
Requirement: deploy a Redis Sentinel cluster that provides automatic failover for Redis master-replica replication.
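Automatic failover depends on two thresholds: the `quorum` configured in `sentinel monitor` (how many Sentinels must agree that the master is down) and a majority of all Sentinels, which must authorize the failover itself. The sketch below shows that arithmetic only; `majority` and `can_failover` are hypothetical helper names, and the check simplifies what Sentinel actually negotiates at runtime.

```shell
#!/bin/sh
# Smallest number of Sentinels that forms a majority of <total>.
majority() {  # usage: majority <total-sentinels>
  echo $(( $1 / 2 + 1 ))
}

# A failover can proceed only if the reachable Sentinels satisfy both
# the configured quorum and a majority of all Sentinels.
can_failover() {  # usage: can_failover <total-sentinels> <reachable-sentinels> <quorum>
  total=$1; reachable=$2; quorum=$3
  needed=$(majority "$total")
  if [ "$reachable" -ge "$quorum" ] && [ "$reachable" -ge "$needed" ]; then
    echo yes
  else
    echo no
  fi
}

can_failover 3 2 2   # yes: one Sentinel lost, two remain
can_failover 3 1 2   # no: a single Sentinel cannot act alone
```

This is why three Sentinels with a quorum of 2 is a common minimum: the cluster tolerates the loss of one Sentinel and still fails over.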
yaml
# 1. Redis configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: redis-sentinel
data:
  redis.conf: |
    bind 0.0.0.0
    port 6379
    daemonize no
    appendonly yes
    appendfsync everysec
    save 900 1
    save 300 10
    save 60 10000
    maxmemory 2gb
    maxmemory-policy allkeys-lru
  sentinel.conf: |
    # Required (Redis 6.2+) so the master can be given as a DNS name instead of an IP
    sentinel resolve-hostnames yes
    sentinel monitor mymaster redis-0.redis-headless 6379 2
    sentinel down-after-milliseconds mymaster 30000
    sentinel parallel-syncs mymaster 1
    sentinel failover-timeout mymaster 180000
---
# 2. Redis Secret
apiVersion: v1
kind: Secret
metadata:
  name: redis-secret
  namespace: redis-sentinel
type: Opaque
stringData:
  redis-password: "RedisSentinelPassword123!"
---
# 3. Redis Services
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
  namespace: redis-sentinel
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: redis-sentinel
spec:
  type: ClusterIP
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
---
# 4. Redis StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: redis-sentinel
spec:
  serviceName: redis-headless
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6.2
        command:
        - sh
        - -c
        - |
          # redis-0 starts as the initial master; every other ordinal starts as its replica
          if [ "$(hostname)" = "redis-0" ]; then
            exec redis-server /etc/redis/redis.conf
          else
            exec redis-server /etc/redis/redis.conf --replicaof redis-0.redis-headless 6379
          fi
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          exec:
            command: ["redis-cli", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command: ["redis-cli", "ping"]
          initialDelaySeconds: 5
          periodSeconds: 5
      - name: sentinel
        image: redis:6.2
        command:
        - sh
        - -c
        # Sentinel rewrites its own config file, and a ConfigMap mount is read-only,
        # so work on a writable copy
        - cp /etc/redis/sentinel.conf /tmp/sentinel.conf && exec redis-sentinel /tmp/sentinel.conf
        ports:
        - containerPort: 26379
        volumeMounts:
        - name: config
          mountPath: /etc/redis
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
      volumes:
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 20Gi

Scenario 3: Elasticsearch Cluster Deployment
Requirement: deploy an Elasticsearch cluster that supports data sharding and replicas.
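Because StatefulSet Pod names are predictable, the `discovery.seed_hosts` list for such a cluster can be derived from the replica count rather than typed by hand. A minimal sketch, with `seed_hosts` as a hypothetical helper name:

```shell
#!/bin/sh
# Build a YAML flow list of "<statefulset>-<ordinal>.<headless-service>" entries.
seed_hosts() {  # usage: seed_hosts <statefulset> <headless-service> <replicas>
  name=$1; svc=$2; replicas=$3; i=0; out=""
  while [ "$i" -lt "$replicas" ]; do
    out="$out${out:+, }\"$name-$i.$svc\""
    i=$((i + 1))
  done
  printf '[%s]\n' "$out"
}

seed_hosts elasticsearch elasticsearch-headless 3
# ["elasticsearch-0.elasticsearch-headless", "elasticsearch-1.elasticsearch-headless", "elasticsearch-2.elasticsearch-headless"]
```

The short `<pod>.<service>` form resolves only from inside the same namespace; a node in another namespace would need the fully qualified name.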
yaml
# 1. Elasticsearch configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-config
  namespace: logging
data:
  elasticsearch.yml: |
    cluster.name: k8s-logs
    node.name: ${HOSTNAME}
    network.host: 0.0.0.0
    discovery.seed_hosts: ["elasticsearch-0.elasticsearch-headless", "elasticsearch-1.elasticsearch-headless", "elasticsearch-2.elasticsearch-headless"]
    cluster.initial_master_nodes: ["elasticsearch-0", "elasticsearch-1", "elasticsearch-2"]
    node.master: true
    node.data: true
    node.ingest: true
    xpack.security.enabled: false
    xpack.monitoring.enabled: true
    xpack.watcher.enabled: false
    path.data: /usr/share/elasticsearch/data
    path.logs: /usr/share/elasticsearch/logs
    bootstrap.memory_lock: false
---
# 2. Elasticsearch Services
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-headless
  namespace: logging
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: http
  - port: 9300
    name: transport
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: logging
spec:
  type: ClusterIP
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    targetPort: 9200
    name: http
---
# 3. Elasticsearch StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch-headless
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
        env:
        - name: ES_JAVA_OPTS
          value: "-Xms2g -Xmx2g"
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: elasticsearch-data
          mountPath: /usr/share/elasticsearch/data
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: elasticsearch.yml
        resources:
          requests:
            cpu: 1000m
            memory: 4Gi
          limits:
            cpu: 4000m
            memory: 8Gi
        livenessProbe:
          httpGet:
            path: /_cluster/health
            port: 9200
          initialDelaySeconds: 60
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /_cluster/health?local=true
            port: 9200
          initialDelaySeconds: 30
          periodSeconds: 5
      volumes:
      - name: config
        configMap:
          name: elasticsearch-config
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi

Scenario 4: Data Backup and Restore
Requirement: implement automated backup and restore for the MySQL database.
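The backup script in this scenario keeps seven days of backups by deleting older files with `find -mtime`. The sketch below isolates that retention step so it can be tried safely in a throwaway directory. `prune_backups` is a hypothetical helper name, the file names are made up for the demo, and `touch -d` and `find -delete` assume GNU coreutils and findutils.

```shell
#!/bin/sh
# Delete backup archives last modified more than <days> days ago.
prune_backups() {  # usage: prune_backups <dir> <days>
  find "$1" -name 'mysql_backup_*.sql.gz' -mtime +"$2" -delete
}

# Demo: one stale file and one fresh file in a temporary directory.
demo_dir=$(mktemp -d)
touch -d '10 days ago' "$demo_dir/mysql_backup_old.sql.gz"
touch "$demo_dir/mysql_backup_new.sql.gz"

prune_backups "$demo_dir" 7
ls "$demo_dir"   # only mysql_backup_new.sql.gz remains
```

Note that `-mtime +7` counts whole 24-hour periods, so a file is removed only once it is at least eight days old.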
yaml
# 1. ConfigMap holding the backup scripts
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-script
  namespace: database
data:
  backup.sh: |
    #!/bin/bash
    set -e
    BACKUP_DIR="/backups"
    DATE=$(date +%Y%m%d_%H%M%S)
    BACKUP_FILE="${BACKUP_DIR}/mysql_backup_${DATE}.sql.gz"
    # Run the backup
    mysqldump -h ${MYSQL_HOST} -u ${MYSQL_USER} -p${MYSQL_PASSWORD} \
      --all-databases \
      --single-transaction \
      --routines \
      --triggers \
      --events \
      --master-data=2 \
      --flush-logs | gzip > ${BACKUP_FILE}
    # Upload to object storage (assumes the aws CLI is installed; the stock mysql:8.0 image does not ship it)
    if [ "${UPLOAD_TO_S3}" = "true" ]; then
      aws s3 cp ${BACKUP_FILE} s3://${S3_BUCKET}/mysql-backups/
    fi
    # Keep the last 7 days of backups
    find ${BACKUP_DIR} -name "mysql_backup_*.sql.gz" -mtime +7 -delete
    echo "Backup completed: ${BACKUP_FILE}"
  restore.sh: |
    #!/bin/bash
    set -e
    BACKUP_FILE=$1
    if [ -z "$BACKUP_FILE" ]; then
      echo "Usage: restore.sh <backup-file>"
      exit 1
    fi
    # Download from object storage
    if [[ "$BACKUP_FILE" =~ ^s3:// ]]; then
      aws s3 cp ${BACKUP_FILE} /tmp/backup.sql.gz
      BACKUP_FILE=/tmp/backup.sql.gz
    fi
    # Run the restore
    gunzip < ${BACKUP_FILE} | mysql -h ${MYSQL_HOST} -u ${MYSQL_USER} -p${MYSQL_PASSWORD}
    echo "Restore completed successfully"
---
# 2. Backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
  namespace: database
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: mysql:8.0
            command:
            - /bin/bash
            - /scripts/backup.sh
            env:
            - name: MYSQL_HOST
              value: "mysql-0.mysql-headless.database.svc.cluster.local"
            - name: MYSQL_USER
              value: "root"
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
            - name: UPLOAD_TO_S3
              value: "true"
            - name: S3_BUCKET
              value: "my-backup-bucket"
            volumeMounts:
            - name: backup-storage
              mountPath: /backups
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
          - name: scripts
            configMap:
              name: backup-script
              defaultMode: 0755
          restartPolicy: OnFailure
---
# 3. PVC for backup storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
  namespace: database
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 200Gi

Troubleshooting Guide
Common Problems and Solutions
1. StatefulSet Pod Fails to Start
Symptom:
bash
$ kubectl get pods -l app=mysql
NAME      READY   STATUS              RESTARTS   AGE
mysql-0   0/1     CrashLoopBackOff    3          5m
mysql-1   0/1     ContainerCreating   0          2m

Troubleshooting steps:
bash
# 1. Check Pod events
kubectl describe pod mysql-0
# 2. Check Pod logs
kubectl logs mysql-0
# 3. Check PVC status
kubectl get pvc data-mysql-0
# 4. Check storage classes
kubectl get storageclass
# 5. Check init container logs
kubectl logs mysql-0 -c init-mysql

Possible causes:
- PVC not bound
- StorageClass does not exist
- Init container failed
- Configuration error
- Insufficient resources
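The causes above tend to show up as different values in the STATUS column, which suggests a first command to run. The mapping below is a rough triage sketch rather than an exhaustive rule set; `next_check` is a hypothetical helper, and `init-mysql` is the init container name used in this document's MySQL example.

```shell
#!/bin/sh
# Suggest a first diagnostic command for a Pod, based on its STATUS column.
next_check() {  # usage: next_check <status> <pod-name>
  case "$1" in
    Pending)           echo "kubectl describe pod $2" ;;            # scheduling or unbound PVC events
    ContainerCreating) echo "kubectl describe pod $2" ;;            # volume attach/mount or image pull events
    Init:*)            echo "kubectl logs $2 -c init-mysql" ;;      # an init container is failing
    CrashLoopBackOff)  echo "kubectl logs $2 --previous" ;;         # output of the last crashed run
    *)                 echo "kubectl get pod $2 -o yaml" ;;         # fall back to the full object
  esac
}

next_check CrashLoopBackOff mysql-0   # kubectl logs mysql-0 --previous
next_check Pending mysql-1            # kubectl describe pod mysql-1
```

For `CrashLoopBackOff`, `--previous` matters because the current container instance may have just restarted and logged nothing yet.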
Solution:
bash
# Inspect the PVC
kubectl describe pvc data-mysql-0
# Inspect the storage backend
kubectl logs -n kube-system csi-provisioner-xxx
# Recreate the Pod
kubectl delete pod mysql-0

2. Data Synchronization Failure
Symptom: MySQL master-slave replication is broken
Troubleshooting steps:
bash
# 1. Check the master status
kubectl exec -it mysql-0 -- mysql -u root -p -e "SHOW MASTER STATUS\G"
# 2. Check the replica status
kubectl exec -it mysql-1 -- mysql -u root -p -e "SHOW SLAVE STATUS\G"
# 3. Check network connectivity
kubectl exec -it mysql-1 -- ping mysql-0.mysql-headless
# 4. Read the error log
kubectl exec -it mysql-1 -- cat /var/log/mysql/error.log

Solution:
bash
# Reconfigure replication. Substitute the File and Position values reported by
# SHOW MASTER STATUS on the master; 4 is the position of the first event in a binlog file.
kubectl exec -it mysql-1 -- mysql -u root -p -e "
STOP SLAVE;
CHANGE MASTER TO
MASTER_HOST='mysql-0.mysql-headless',
MASTER_USER='repl_user',
MASTER_PASSWORD='ReplPassword456!',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=4;
START SLAVE;
"

3. Out of Storage Space
Symptom:
Error: no space left on device

Troubleshooting steps:
bash
# 1. Check volume usage
kubectl exec -it mysql-0 -- df -h
# 2. Check database sizes
kubectl exec -it mysql-0 -- mysql -u root -p -e "SELECT table_schema, SUM(data_length + index_length) / 1024 / 1024 AS Size_MB FROM information_schema.tables GROUP BY table_schema;"
# 3. Check PVC capacity
kubectl get pvc data-mysql-0 -o yaml

Solution:
bash
# Expand the PVC (the StorageClass must have allowVolumeExpansion: true)
kubectl patch pvc data-mysql-0 -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
# Purge old data
kubectl exec -it mysql-0 -- mysql -u root -p -e "PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);"

4. Network Partition
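Symptom: Pods cannot communicate with each other
Troubleshooting steps:
bash
# 1. Check DNS resolution
kubectl exec -it mysql-0 -- nslookup mysql-1.mysql-headless
# 2. Check network connectivity
kubectl exec -it mysql-0 -- ping mysql-1.mysql-headless
# 3. Check the port
kubectl exec -it mysql-0 -- telnet mysql-1.mysql-headless 3306
# 4. Check NetworkPolicy objects
kubectl get networkpolicy -n database

Solution:
bash
# Inspect the NetworkPolicy configuration
kubectl describe networkpolicy -n database
# Temporarily remove the NetworkPolicy objects: save a copy first so they can be re-applied afterwards
kubectl get networkpolicy -n database -o yaml > networkpolicy-backup.yaml
kubectl delete networkpolicy -n database --all

5. StatefulSet Update Stuck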
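Symptom:
bash
$ kubectl rollout status sts/mysql
Waiting for 1 pods to be ready...

Troubleshooting steps:
bash
# 1. Check the update status
kubectl describe sts mysql
# 2. Check Pod status
kubectl get pods -l app=mysql
# 3. Check Pod events
kubectl describe pod mysql-1
# 4. Check the update strategy
kubectl get sts mysql -o yaml | grep -A 5 updateStrategy

Solution:
bash
# Revert (or correct) the Pod template that cannot become Ready
# (kubectl rollout pause/resume is not supported for StatefulSets)
kubectl rollout undo sts/mysql
# Delete the stuck Pod so it is recreated from the reverted template;
# with OrderedReady the controller waits for it instead of replacing it
kubectl delete pod mysql-1
# Watch the rollout finish
kubectl rollout status sts/mysql

Troubleshooting Flowchart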
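StatefulSet problem
  ↓
Check Pod status → Pending → check PVC/storage
  ↓ Running
Check application logs → errors → analyze the cause
  ↓ OK
Check network connectivity → fails → check DNS/NetworkPolicy
  ↓ OK
Check data synchronization → fails → reconfigure replication
  ↓ OK
Check storage space → insufficient → expand or clean up
  ↓ OK
Application is running normally

Best Practices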
1. Storage Configuration
yaml
# Use a high-performance storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: database-storage
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "50"  # io1 volumes allow at most 50 provisioned IOPS per GiB
  fsType: xfs
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# StatefulSet fragment (under spec) that uses the high-performance storage class
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: database-storage
    resources:
      requests:
        storage: 100Gi

2. Resource Configuration
yaml
# Container fragment: set sensible resource requests and limits
resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 4000m
    memory: 4Gi
---
# Define a PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb
  namespace: database
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mysql

3. Update Strategy
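yaml
# Use the RollingUpdate strategy
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0  # Pods with ordinal >= partition are updated, highest ordinal first; 0 updates every Pod
  podManagementPolicy: OrderedReady  # create and delete Pods one at a time, in order (the default)

4. Monitoring and Alerting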
yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: statefulset-alerts
  namespace: monitoring
spec:
  groups:
  - name: statefulset
    rules:
    - alert: StatefulSetReplicasMismatch
      # Compare ready replicas with the desired count from spec.replicas
      expr: |
        kube_statefulset_status_replicas_ready != kube_statefulset_replicas
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "StatefulSet {{ $labels.statefulset }} replicas mismatch"
        description: "StatefulSet {{ $labels.statefulset }} in namespace {{ $labels.namespace }} has {{ $value }} ready replicas"
    - alert: StatefulSetDown
      # Ignore StatefulSets that were deliberately scaled to zero
      expr: |
        kube_statefulset_status_replicas_ready == 0 and kube_statefulset_replicas > 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "StatefulSet {{ $labels.statefulset }} is down"
        description: "StatefulSet {{ $labels.statefulset }} in namespace {{ $labels.namespace }} has no ready replicas"

5. Backup Strategy
yaml
# Scheduled backups
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: mysql:8.0
            # Run the script through bash so it works even if the mounted file is not executable
            command: ["/bin/bash", "/scripts/backup.sh"]
            volumeMounts:
            - name: backup-storage
              mountPath: /backups
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
          - name: scripts
            configMap:
              name: backup-script
          restartPolicy: OnFailure

6. Security Configuration
yaml
# Use a SecurityContext
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
      containers:
      - name: mysql
        securityContext:
          allowPrivilegeEscalation: false
          # mysqld still needs writable mounts (for example emptyDir) for /tmp and /var/run/mysqld
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL

7. Network Configuration
yaml
# Use a headless Service
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
---
# Configure a NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mysql-network-policy
spec:
  podSelector:
    matchLabels:
      app: mysql
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: app
    ports:
    - protocol: TCP
      port: 3306
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - protocol: TCP
      port: 3306
  # Allow DNS lookups; without this rule Pods cannot resolve names such as mysql-0.mysql-headless
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Summary
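Key Points
- StatefulSet characteristics: fixed Pod identity, stable network identity, ordered deployment and scaling
- Storage management: each Pod has its own PVC, so its data persists
- Network identity: a headless Service provides stable per-Pod DNS names
- Update strategy: ordered rolling updates reduce the risk of data inconsistency
- Backup and restore: back up on a schedule so data can be restored quickly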
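The ordered rolling update mentioned above can be made concrete with a short sketch of how `partition` selects Pods: only ordinals greater than or equal to the partition are updated, starting from the highest. `update_order` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Print the Pods a RollingUpdate touches, in the order they are updated.
update_order() {  # usage: update_order <statefulset> <replicas> <partition>
  name=$1; i=$(( $2 - 1 )); partition=$3; out=""
  while [ "$i" -ge "$partition" ]; do
    out="$out${out:+ }$name-$i"
    i=$((i - 1))
  done
  printf '%s\n' "$out"
}

update_order mysql 3 0   # mysql-2 mysql-1 mysql-0
update_order mysql 3 2   # mysql-2
```

Setting the partition to the highest ordinal gives a canary-style rollout: one Pod receives the new template while the rest stay on the old one until the partition is lowered.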
Command Quick Reference
bash
# StatefulSet management
kubectl get sts                              # list StatefulSets
kubectl describe sts <name>                  # show details
kubectl scale sts <name> --replicas=5        # scale up or down
kubectl rollout status sts/<name>            # show rollout status
kubectl rollout undo sts/<name>              # roll back
# Pod management
kubectl get pods -l app=<name>               # list Pods
kubectl exec -it <pod> -- /bin/bash          # open a shell in a Pod
kubectl logs <pod>                           # show logs
kubectl delete pod <pod>                     # delete a Pod
# Storage management
kubectl get pvc                              # list PVCs
kubectl describe pvc <name>                  # show PVC details
kubectl patch pvc <name> -p '{...}'          # expand a PVC

Next Steps
- 17-PersistentVolume - persistent storage in depth
- 18-StorageClass - dynamic storage provisioning
- 19-ConfigMap与Secret - configuration management with ConfigMap and Secret
- 08-StatefulSet - review of StatefulSet basics