
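Stateful Applications

Overview

Stateful applications are applications that must keep persistent state, need stable network identities, and require ordered deployment and scaling. Unlike stateless applications, they place specific requirements on storage, networking, and deployment order. Common examples include databases, cache clusters, and message queues.

Core Concepts

Characteristics of Stateful Applications

  • Persistent storage: data is stored durably and survives Pod restarts
  • Stable network identity: each Pod keeps a fixed hostname and DNS name
  • Ordered deployment: Pods start in order (0, 1, 2, ...) and stop in reverse order
  • Ordered scaling: scale-up adds Pods in ascending ordinal order, and scale-down removes the highest ordinal first
  • State dependencies: Pods may depend on each other (for example, replicas depend on a primary)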
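The ordering rules above can be made concrete with a small Bash sketch. It assumes the default `podManagementPolicy: OrderedReady` and the `<statefulset>-<ordinal>` Pod naming. The function name `scale_order` is hypothetical, and the function only prints the order; it does not talk to a cluster.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the order in which a StatefulSet with the
# default podManagementPolicy OrderedReady creates or deletes Pods when it is
# scaled from CURRENT to DESIRED replicas. Pods are named <sts>-<ordinal>.
scale_order() {
  local sts=$1 current=$2 desired=$3 i
  if (( desired > current )); then
    # Scale up: ascending ordinals; each Pod waits until its predecessors are Ready
    for (( i = current; i < desired; i++ )); do
      echo "create ${sts}-${i}"
    done
  else
    # Scale down: descending ordinals, highest first
    for (( i = current - 1; i >= desired; i-- )); do
      echo "delete ${sts}-${i}"
    done
  fi
}

scale_order web 3 5   # create web-3, then web-4
scale_order web 5 3   # delete web-4, then web-3
```

Scaling back down reverses the order exactly. This is why a replica with a high ordinal can be removed without touching the Pods it depends on.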

StatefulSet vs Deployment

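| Feature | StatefulSet | Deployment |
| --- | --- | --- |
| Pod identity | Fixed and ordered (web-0, web-1) | Randomly generated |
| Network identity | Stable per-Pod DNS name | No stable per-Pod name |
| Storage | One PVC per Pod | Shared PVC, or stateless |
| Startup order | Ordered (default OrderedReady) | Parallel |
| Scaling | Ordered | Parallel |
| Update strategy | Rolling update, one Pod at a time, highest ordinal first | Rolling update, several Pods at once (maxSurge/maxUnavailable) |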

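Typical Stateful Applications

  • Databases: MySQL, PostgreSQL, MongoDB
  • Caches: Redis Cluster, Memcached
  • Message queues: Kafka, RabbitMQ, ActiveMQ
  • Distributed storage: Ceph, GlusterFS, MinIO
  • Search engines: Elasticsearch, Solr

StatefulSet in Detail

Complete YAML Examples

1. Basic StatefulSet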

yaml
apiVersion: v1
kind: Service
metadata:
  name: web-headless
  namespace: default
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: default
spec:
  serviceName: web-headless
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 1Gi
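With `serviceName: web-headless`, every Pod of this StatefulSet gets a predictable name, DNS record and PVC. The sketch below derives them from the naming rules: `<sts>-<ordinal>`, `<pod>.<service>.<namespace>.svc.<domain>`, and `<claimTemplate>-<sts>-<ordinal>`. The helper `sts_identities` is hypothetical. The cluster domain `cluster.local` is an assumption, though it is the common default.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): list the stable identity of each Pod in a StatefulSet.
#   Pod name : <sts>-<ordinal>
#   DNS name : <pod>.<serviceName>.<namespace>.svc.<cluster-domain>
#   PVC name : <volumeClaimTemplate>-<sts>-<ordinal>
# The cluster domain defaults to cluster.local here, which is an assumption.
sts_identities() {
  local sts=$1 svc=$2 ns=$3 replicas=$4 claim=$5 domain=${6:-cluster.local} i
  for (( i = 0; i < replicas; i++ )); do
    printf '%s %s.%s.%s.svc.%s %s-%s-%s\n' \
      "${sts}-${i}" "${sts}-${i}" "$svc" "$ns" "$domain" "$claim" "$sts" "$i"
  done
}

# Values taken from the manifest above
sts_identities web web-headless default 3 www
```

For this manifest, `kubectl get pvc` should list www-web-0, www-web-1 and www-web-2. These claims outlive the Pods and are reattached when a Pod with the same ordinal is recreated.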

2. MySQL Master/Replica Cluster

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  namespace: database
data:
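  # server-id is left out of both files on purpose: init-mysql writes a unique
  # server-id (100 + ordinal) to server-id.cnf, and a fixed value here could override
  # it (MySQL does not guarantee the read order of conf.d files), giving every
  # replica the same ID
  master.cnf: |
    [mysqld]
    log-bin=mysql-bin
    binlog-format=ROW
    expire_logs_days=7
    max_binlog_size=100M
  slave.cnf: |
    [mysqld]
    relay-log=relay-bin
    read_only=1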
---
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
  namespace: database
type: Opaque
stringData:
  root-password: "MySQLRootPassword123!"
  replication-user: "repl"
  replication-password: "ReplPassword456!"
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: database
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: database
spec:
  type: ClusterIP
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:8.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/master.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/slave.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      - name: clone-mysql
        image: gcr.io/google-samples/xtrabackup:1.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          [[ -d /var/lib/mysql/mysql ]] && exit 0
          [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          [[ $ordinal -eq 0 ]] && exit 0
          ncat --recv-only mysql-$(($ordinal-1)).mysql-headless 3307 | xbstream -x -C /var/lib/mysql
          xtrabackup --prepare --target-dir=/var/lib/mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      containers:
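      - name: mysql
        # NOTE: gcr.io/google-samples/xtrabackup:1.0 (used by clone-mysql and the
        # xtrabackup sidecar) ships XtraBackup 2.4, which does not support MySQL 8.0.
        # Use an XtraBackup 8.0 image with mysql:8.0, or mysql:5.7 as the upstream
        # Kubernetes "Run a Replicated Stateful Application" tutorial does
        image: mysql:8.0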
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
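        readinessProbe:
          exec:
            # root has a password (MYSQL_ROOT_PASSWORD from mysql-secret), so pass it
            command: ["sh", "-c", "mysql -h 127.0.0.1 -uroot -p\"$MYSQL_ROOT_PASSWORD\" -e 'SELECT 1'"]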
          initialDelaySeconds: 5
          periodSeconds: 2
      - name: xtrabackup
        image: gcr.io/google-samples/xtrabackup:1.0
        env:
        # The sidecar also logs in as root, so it needs the same password
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3307
          name: xtrabackup
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        command:
        - bash
        - "-c"
        - |
          set -ex
          cd /var/lib/mysql
          # Work out which binlog position the cloned data corresponds to
          if [[ -f xtrabackup_slave_info ]]; then
            mv xtrabackup_slave_info change_master_to.sql.in
            rm -f xtrabackup_binlog_info
          elif [[ -f xtrabackup_binlog_info ]]; then
            [[ $(cat xtrabackup_binlog_info) =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
            rm xtrabackup_binlog_info
            echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
                  MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
          fi
          # Start replication from that position once mysqld accepts connections
          if [[ -f change_master_to.sql.in ]]; then
            echo "Waiting for mysqld to be ready (accepting connections)"
            until mysql -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1"; do sleep 1; done
            echo "Initializing replication from clone position"
            mv change_master_to.sql.in change_master_to.sql.orig
            mysql -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD" <<EOF
          $(<change_master_to.sql.orig),
            MASTER_HOST='mysql-0.mysql-headless',
            MASTER_USER='root',
            MASTER_PASSWORD='$MYSQL_ROOT_PASSWORD',
            MASTER_CONNECT_RETRY=10;
          START SLAVE;
          EOF
          fi
          # Serve a streamed backup on port 3307 to Pods that need to clone data
          exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
            "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root --password='$MYSQL_ROOT_PASSWORD'"
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 50Gi
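The init-mysql container decides each Pod's role from its hostname alone. Below is a minimal sketch of that decision as a standalone Bash function, so it can be tried without a cluster. It follows the script above: the ordinal is the numeric suffix of the hostname, server-id is 100 + ordinal, ordinal 0 gets master.cnf, and every other Pod gets slave.cnf. The function name `mysql_init_plan` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): the init-mysql decision logic as a function.
mysql_init_plan() {
  local host=$1 ordinal cnf
  [[ $host =~ -([0-9]+)$ ]] || return 1   # e.g. "mysql-2" -> "2"
  ordinal=${BASH_REMATCH[1]}
  if (( ordinal == 0 )); then
    cnf=master.cnf   # mysql-0 is always the master
  else
    cnf=slave.cnf    # every other Pod is a replica
  fi
  echo "server-id=$((100 + ordinal)) config=${cnf}"
}

for host in mysql-0 mysql-1 mysql-2; do
  mysql_init_plan "$host"
done
```

Because the role and server-id depend only on the stable Pod name, a rescheduled Pod comes back with the same configuration.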

3. Redis Cluster

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: cache
data:
  redis.conf: |
    bind 0.0.0.0
    port 6379
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
    appendonly yes
    daemonize no
    protected-mode no
---
apiVersion: v1
kind: Secret
metadata:
  name: redis-secret
  namespace: cache
type: Opaque
stringData:
  redis-password: "RedisClusterPassword123!"
---
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
  namespace: cache
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  - port: 16379
    targetPort: 16379
    name: cluster
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: cache
spec:
  serviceName: redis-headless
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6.2
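        command:
        - redis-server
        - /etc/redis/redis.conf
        # Turn on the password from redis-secret: Kubernetes substitutes $(REDIS_PASSWORD)
        # from the env entry below, and masterauth lets replicas authenticate to masters
        - --requirepass
        - $(REDIS_PASSWORD)
        - --masterauth
        - $(REDIS_PASSWORD)
        env:
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis-secret
              key: redis-password
        ports:
        - containerPort: 6379
          name: redis
        - containerPort: 16379
          name: cluster
        volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
        resources:
          requests:
            cpu: 200m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - redis-cli -a "$REDIS_PASSWORD" --no-auth-warning ping
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - redis-cli -a "$REDIS_PASSWORD" --no-auth-warning ping
          initialDelaySeconds: 5
          periodSeconds: 5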
      volumes:
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi
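The StatefulSet above only starts six independent Redis nodes. They still have to be joined once with `redis-cli --cluster create <host:port ...> --cluster-replicas 1`, adding `-a` when a password is set. Before running that, the layout can be checked with a sketch like the one below. The helper `redis_cluster_layout` is hypothetical. It also requires an exact split so every master gets the same number of replicas; that rule is a choice of this sketch, not a Redis requirement. The slot ranges are one even split for illustration, and redis-cli may round the boundaries differently.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): with N nodes and R replicas per master there are
# N / (R + 1) masters; Redis Cluster needs at least 3 masters. The 16384 hash slots
# are then divided across the masters (illustrative even split).
redis_cluster_layout() {
  local nodes=$1 replicas=$2 masters i
  if (( nodes % (replicas + 1) != 0 )); then
    echo "error: $nodes nodes do not divide into groups of $((replicas + 1))" >&2
    return 1
  fi
  masters=$(( nodes / (replicas + 1) ))
  if (( masters < 3 )); then
    echo "error: need at least 3 masters, got $masters" >&2
    return 1
  fi
  echo "masters=$masters"
  for (( i = 0; i < masters; i++ )); do
    echo "master-$i slots $(( i * 16384 / masters ))-$(( (i + 1) * 16384 / masters - 1 ))"
  done
}

redis_cluster_layout 6 1   # the StatefulSet above: 6 Pods, 1 replica per master
```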

4. MongoDB Replica Set

yaml
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-secret
  namespace: database
type: Opaque
stringData:
  mongo-root-username: "admin"
  mongo-root-password: "MongoDBPassword123!"
  mongo-replica-set-key: "ReplicaSetKey456!"
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb-headless
  namespace: database
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: mongodb
  ports:
  - port: 27017
    targetPort: 27017
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb
  namespace: database
spec:
  type: ClusterIP
  selector:
    app: mongodb
  ports:
  - port: 27017
    targetPort: 27017
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
  namespace: database
spec:
  serviceName: mongodb-headless
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb
        image: mongo:5.0
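        # NOTE: "command" replaces the image entrypoint (docker-entrypoint.sh), so the
        # MONGO_INITDB_ROOT_* variables below are not processed and auth stays off.
        # Turning auth on for a replica set also needs a shared keyFile (what
        # mongo-replica-set-key is meant for), passed to mongod with --keyFile
        command: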
        - mongod
        - "--replSet"
        - rs0
        - "--bind_ip_all"
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: mongo-root-username
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: mongo-root-password
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          exec:
            command:
            - mongo
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - mongo
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 5
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: mongodb-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 50Gi
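The StatefulSet starts three mongod processes with `--replSet rs0`, but the replica set itself still has to be initiated once with `rs.initiate()`. A sketch that builds that call from the stable DNS names follows. The helper `mongo_rs_initiate` is hypothetical, and the cluster domain `cluster.local` is an assumption.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): build the rs.initiate() call for replica set rs0
# from the StatefulSet's stable Pod DNS names (cluster.local assumed).
mongo_rs_initiate() {
  local sts=$1 svc=$2 ns=$3 replicas=$4 rs=${5:-rs0} i members=""
  for (( i = 0; i < replicas; i++ )); do
    # ${members:+, } adds a separator only after the first member
    members+="${members:+, }{_id: $i, host: \"${sts}-${i}.${svc}.${ns}.svc.cluster.local:27017\"}"
  done
  echo "rs.initiate({_id: \"$rs\", members: [$members]})"
}

mongo_rs_initiate mongodb mongodb-headless database 3
```

The result would be run once against the first member, for example `kubectl exec -n database mongodb-0 -- mongo --eval "$(mongo_rs_initiate mongodb mongodb-headless database 3)"`. Add credentials if you enable auth. Using DNS names rather than Pod IPs keeps the member list valid when Pods are rescheduled.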

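kubectl Commands

StatefulSet Management

bash
# List StatefulSets
kubectl get statefulset
kubectl get sts

# Show StatefulSet details
kubectl describe sts web

# List the StatefulSet's Pods
kubectl get pods -l app=web

# Scale up (new Pods are created in ascending ordinal order)
kubectl scale sts web --replicas=5

# Scale down (Pods are removed starting from the highest ordinal)
kubectl scale sts web --replicas=3

# Delete the StatefulSet (its PVCs are kept by default)
kubectl delete sts web

# Delete the StatefulSet and its PVCs (this deletes the data)
kubectl delete sts web
kubectl delete pvc -l app=web

# Show rollout history
kubectl rollout history sts/web

# Roll back to the previous revision
kubectl rollout undo sts/web

# Hold an update: kubectl rollout pause/resume only work for Deployments,
# so raise the partition to the replica count (3 here) instead
kubectl patch sts web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'

# Release the update by lowering the partition back to 0
kubectl patch sts web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'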

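Stateful Application Operations

bash
# Show each Pod's ordinal (the apps.kubernetes.io/pod-index label exists on Kubernetes 1.28+)
kubectl get pods -n database -l app=mysql -L apps.kubernetes.io/pod-index

# Resolve a Pod's DNS name from a temporary debug Pod
kubectl run -it --rm debug --image=busybox:1.36 --restart=Never -- nslookup mysql-0.mysql-headless.database.svc.cluster.local

# Connect to a specific Pod
kubectl exec -it -n database mysql-0 -c mysql -- mysql -u root -p

# Show the PVCs bound to the Pods
kubectl get pvc -n database -l app=mysql

# Check storage usage
kubectl exec -it -n database mysql-0 -c mysql -- df -h /var/lib/mysql

# Back up all databases (the password is read from the container's own environment)
kubectl exec -n database mysql-0 -c mysql -- sh -c 'mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" --all-databases --single-transaction' > backup.sql

# Restore the databases
kubectl exec -i -n database mysql-0 -c mysql -- sh -c 'mysql -u root -p"$MYSQL_ROOT_PASSWORD"' < backup.sql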

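Troubleshooting Commands

bash
# Check Pod status and placement
kubectl get pods -l app=mysql -o wide

# Show Pod events
kubectl describe pod mysql-0

# Show the mysql container's logs
kubectl logs mysql-0 -c mysql

# Show logs from all containers (mysql and the xtrabackup sidecar)
kubectl logs mysql-0 --all-containers

# Open a shell in the Pod
kubectl exec -it mysql-0 -- /bin/bash

# Check network connectivity (requires ping in the image)
kubectl exec -it mysql-0 -- ping mysql-1.mysql-headless

# Check service discovery (requires nslookup in the image)
kubectl exec -it mysql-0 -- nslookup mysql-headless

# Check PVC status
kubectl describe pvc data-mysql-0

# Check PV status
kubectl describe pv pvc-xxx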

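Real-World Scenarios

Scenario 1: Deploying a Highly Available MySQL Cluster

Goal: deploy a MySQL master/replica replication cluster that supports read/write splitting. Failover is not automated by these manifests: if the master fails, a replica has to be promoted manually or by an external tool.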

yaml
# 1. Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: mysql-cluster
---
# 2. Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  namespace: mysql-cluster
data:
  master.cnf: |
    [mysqld]
    log-bin=mysql-bin
    server-id=1
    binlog-format=ROW
    binlog-cache-size=1M
    max-binlog-size=500M
    # 7 days; expire-logs-days is deprecated in MySQL 8.0
    binlog-expire-logs-seconds=604800
    innodb-buffer-pool-size=2G
    innodb-log-file-size=256M
    max-connections=500
    slow-query-log=1
    slow-query-log-file=/var/log/mysql/slow.log
    long-query-time=2
  slave.cnf: |
    [mysqld]
    server-id=2
    relay-log=relay-bin
    read-only=1
    relay-log-recovery=1
    slave-parallel-workers=4
    innodb-buffer-pool-size=2G
    max-connections=500
---
# 3. Secret
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
  namespace: mysql-cluster
type: Opaque
stringData:
  root-password: "MySQLRootPassword123!"
  replication-user: "repl_user"
  replication-password: "ReplPassword456!"
  monitor-user: "monitor"
  monitor-password: "MonitorPass789!"
---
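# 4. Services
# Headless Service referenced by both StatefulSets (serviceName: mysql-headless);
# it provides per-Pod DNS names such as mysql-master-0.mysql-headless
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: mysql-cluster
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---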
apiVersion: v1
kind: Service
metadata:
  name: mysql-master
  namespace: mysql-cluster
spec:
  type: ClusterIP
  selector:
    app: mysql
    role: master
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: mysql-cluster
spec:
  type: ClusterIP
  selector:
    app: mysql
    role: slave   # route reads to the replicas only
  ports:
  - port: 3306
    targetPort: 3306
---
# 5. MySQL master
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-master
  namespace: mysql-cluster
spec:
  serviceName: mysql-headless
  replicas: 1
  selector:
    matchLabels:
      app: mysql
      role: master
  template:
    metadata:
      labels:
        app: mysql
        role: master
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
        - name: config
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 4Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping", "-h", "localhost"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
      volumes:
      - name: config
        configMap:
          name: mysql-config
          items:
          - key: master.cnf
            path: mysql.cnf
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi
---
# 6. MySQL replicas
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-slave
  namespace: mysql-cluster
spec:
  serviceName: mysql-headless
  replicas: 2
  selector:
    matchLabels:
      app: mysql
      role: slave
  template:
    metadata:
      labels:
        app: mysql
        role: slave
    spec:
      containers:
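      - name: mysql
        image: mysql:8.0
        # Give each replica its own server-id (2 + the ordinal from the Pod name in
        # $HOSTNAME); the command-line value overrides server-id=2 in slave.cnf
        command:
        - bash
        - -c
        - exec docker-entrypoint.sh mysqld --server-id=$((2 + ${HOSTNAME##*-}))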
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
        - name: config
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 4000m
            memory: 4Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping", "-h", "localhost"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
      volumes:
      - name: config
        configMap:
          name: mysql-config
          items:
          - key: slave.cnf
            path: mysql.cnf
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi

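Initialization Script

bash
#!/bin/bash
# init-mysql-replication.sh
set -euo pipefail
NS=mysql-cluster

# Read credentials from the Secret instead of hard-coding them
MYSQL_ROOT_PASSWORD=$(kubectl get secret -n $NS mysql-secret -o jsonpath='{.data.root-password}' | base64 -d)
REPL_PASSWORD=$(kubectl get secret -n $NS mysql-secret -o jsonpath='{.data.replication-password}' | base64 -d)

# Create the replication user on the master
kubectl exec -n $NS mysql-master-0 -- mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -e "
CREATE USER IF NOT EXISTS 'repl_user'@'%' IDENTIFIED BY '${REPL_PASSWORD}';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';
"

# Get the master's current binlog file and position
MASTER_STATUS=$(kubectl exec -n $NS mysql-master-0 -- mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -e "SHOW MASTER STATUS\G")
MASTER_LOG_FILE=$(echo "$MASTER_STATUS" | awk '$1 == "File:" {print $2}')
MASTER_LOG_POS=$(echo "$MASTER_STATUS" | awk '$1 == "Position:" {print $2}')

# Point each replica at the master. GET_MASTER_PUBLIC_KEY=1 is needed because MySQL 8.0's
# default caching_sha2_password plugin otherwise rejects unencrypted replication logins
for i in 0 1; do
  kubectl exec -n $NS mysql-slave-$i -- mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -e "
  STOP SLAVE;
  CHANGE MASTER TO
    MASTER_HOST='mysql-master-0.mysql-headless.mysql-cluster.svc.cluster.local',
    MASTER_USER='repl_user',
    MASTER_PASSWORD='${REPL_PASSWORD}',
    MASTER_LOG_FILE='${MASTER_LOG_FILE}',
    MASTER_LOG_POS=${MASTER_LOG_POS},
    GET_MASTER_PUBLIC_KEY=1;
  START SLAVE;
  "
done

echo "MySQL replication configured successfully"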
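The script above extracts File and Position from `SHOW MASTER STATUS\G` before running `CHANGE MASTER TO`. Here is a standalone sketch of that parsing step that can be tested without a cluster. The helper `parse_master_status` and the sample output are hypothetical. The function fails if either field is missing, which happens when binary logging is disabled and the statement returns an empty set.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print "<file> <position>" from SHOW MASTER STATUS\G output.
parse_master_status() {
  awk '$1 == "File:"     { file = $2 }
       $1 == "Position:" { pos = $2 }
       END { if (file == "" || pos == "") exit 1; print file, pos }'
}

# Hypothetical sample output; real values come from the master
sample='*************************** 1. row ***************************
             File: mysql-bin.000003
         Position: 157
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set:'

read -r MASTER_LOG_FILE MASTER_LOG_POS < <(printf '%s\n' "$sample" | parse_master_status)
echo "file=$MASTER_LOG_FILE pos=$MASTER_LOG_POS"
```

Matching on the first field (`$1 == "File:"`) instead of a substring avoids accidentally picking up other columns whose names contain "File".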

Scenario 2: Highly Available Redis with Sentinel

Goal: deploy Redis with Sentinel so that a replica is promoted automatically when the master fails.

yaml
# 1. Redis configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: redis-sentinel
data:
  redis.conf: |
    bind 0.0.0.0
    port 6379
    daemonize no
    appendonly yes
    appendfsync everysec
    save 900 1
    save 300 10
    save 60 10000
    # Stay below the 2Gi container memory limit to leave room for Redis overhead
    maxmemory 1536mb
    maxmemory-policy allkeys-lru
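  sentinel.conf: |
    # Required for Sentinel to accept a hostname instead of an IP (Redis 6.2+)
    sentinel resolve-hostnames yes
    sentinel monitor mymaster redis-0.redis-headless 6379 2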
    sentinel down-after-milliseconds mymaster 30000
    sentinel parallel-syncs mymaster 1
    sentinel failover-timeout mymaster 180000
---
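# 2. Redis Secret (not referenced below; using it requires requirepass/masterauth
#    in redis.conf and "sentinel auth-pass" in sentinel.conf)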
apiVersion: v1
kind: Secret
metadata:
  name: redis-secret
  namespace: redis-sentinel
type: Opaque
stringData:
  redis-password: "RedisSentinelPassword123!"
---
# 3. Redis Services
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
  namespace: redis-sentinel
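spec:
  type: ClusterIP
  clusterIP: None
  # Publish DNS records before Pods are Ready, so replicas and Sentinels can
  # resolve redis-0.redis-headless while the set is still starting
  publishNotReadyAddresses: true
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  - port: 26379
    targetPort: 26379
    name: sentinel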
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: redis-sentinel
spec:
  type: ClusterIP
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
---
# 4. Redis StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: redis-sentinel
spec:
  serviceName: redis-headless
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6.2
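        command:
        - sh
        - -c
        - |
          # redis-0 starts as the master and the other Pods as its replicas.
          # Simplified: after a failover, a restarted redis-0 should ask Sentinel
          # for the current master instead of assuming it is still the master.
          if [ "${HOSTNAME##*-}" = "0" ]; then
            exec redis-server /etc/redis/redis.conf
          fi
          exec redis-server /etc/redis/redis.conf --replicaof redis-0.redis-headless 6379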
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          exec:
            command: ["redis-cli", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command: ["redis-cli", "ping"]
          initialDelaySeconds: 5
          periodSeconds: 5
      - name: sentinel
        image: redis:6.2
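        command:
        - sh
        - -c
        - |
          # Sentinel rewrites its config file at runtime, so it cannot use the
          # read-only ConfigMap mount directly; copy it to a writable emptyDir
          cp /etc/redis/sentinel.conf /sentinel-data/sentinel.conf
          exec redis-sentinel /sentinel-data/sentinel.conf
        ports:
        - containerPort: 26379
        volumeMounts:
        - name: config
          mountPath: /etc/redis
        - name: sentinel-data
          mountPath: /sentinel-data
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
      volumes:
      - name: config
        configMap:
          name: redis-config
      - name: sentinel-data
        emptyDir: {}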
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 20Gi
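The last argument of `sentinel monitor mymaster ... 2` is the quorum. It is easy to confuse with the majority a Sentinel needs before it may actually fail over. A simplified model of both thresholds follows. The helper `sentinel_can_failover` is hypothetical, and "agreeing" means Sentinels that see the master as down and can reach each other.

```bash
#!/usr/bin/env bash
# Simplified model (hypothetical helper) of the two thresholds behind a Sentinel failover:
#   quorum   - Sentinels that must agree the master is unreachable before it is
#              marked objectively down
#   majority - a Sentinel must also be elected by a majority of all Sentinels
#              before it may start the failover
sentinel_can_failover() {
  local sentinels=$1 quorum=$2 agreeing=$3
  local majority=$(( sentinels / 2 + 1 ))
  if (( agreeing >= quorum && agreeing >= majority )); then
    echo "failover possible"
  else
    echo "no failover (quorum=$quorum, majority=$majority, agreeing=$agreeing)"
    return 1
  fi
}

# This scenario: 3 Sentinels, quorum 2
sentinel_can_failover 3 2 2
sentinel_can_failover 3 2 1 || true
```

With 5 Sentinels and quorum 2, two Sentinels can flag the master as down, but a failover still needs 3 votes. Running one Sentinel per Redis Pod, as above, therefore tolerates the loss of one Pod.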

Scenario 3: Deploying an Elasticsearch Cluster

Goal: deploy an Elasticsearch cluster that supports sharding and replicas.

yaml
# 1. Elasticsearch configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-config
  namespace: logging
data:
  elasticsearch.yml: |
    cluster.name: k8s-logs
    node.name: ${HOSTNAME}
    network.host: 0.0.0.0
    discovery.seed_hosts: ["elasticsearch-0.elasticsearch-headless", "elasticsearch-1.elasticsearch-headless", "elasticsearch-2.elasticsearch-headless"]
    cluster.initial_master_nodes: ["elasticsearch-0", "elasticsearch-1", "elasticsearch-2"]
    # node.roles replaces node.master/node.data/node.ingest, deprecated since 7.9
    node.roles: [ master, data, ingest ]
    xpack.security.enabled: false
    xpack.monitoring.enabled: true
    xpack.watcher.enabled: false
    path.data: /usr/share/elasticsearch/data
    path.logs: /usr/share/elasticsearch/logs
    bootstrap.memory_lock: false
---
# 2. Elasticsearch Services
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-headless
  namespace: logging
spec:
  type: ClusterIP
  clusterIP: None
  # Let nodes find each other through DNS before they report Ready
  publishNotReadyAddresses: true
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: http
  - port: 9300
    name: transport
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: logging
spec:
  type: ClusterIP
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    targetPort: 9200
    name: http
---
# 3. Elasticsearch StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch-headless
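  replicas: 3
  # Start all Pods together: the master-eligible nodes must be up at the same time
  # to elect a master, and OrderedReady would hold back elasticsearch-1 until
  # elasticsearch-0 is Ready
  podManagementPolicy: Parallel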
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
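    spec:
      # The elasticsearch image runs as UID/GID 1000; fsGroup makes the data volume writable
      securityContext:
        fsGroup: 1000
      initContainers:
      # Production-mode bootstrap checks require vm.max_map_count >= 262144 on the node
      - name: sysctl
        image: busybox:1.36
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      containers: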
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
        env:
        - name: ES_JAVA_OPTS
          value: "-Xms2g -Xmx2g"
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: elasticsearch-data
          mountPath: /usr/share/elasticsearch/data
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: elasticsearch.yml
        resources:
          requests:
            cpu: 1000m
            memory: 4Gi
          limits:
            cpu: 4000m
            memory: 8Gi
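        livenessProbe:
          # Only check that the transport port is open: cluster health can be red or
          # unavailable (e.g. during master election) for reasons a restart will not fix
          tcpSocket:
            port: 9300
          initialDelaySeconds: 60
          periodSeconds: 10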
        readinessProbe:
          httpGet:
            path: /_cluster/health?local=true
            port: 9200
          initialDelaySeconds: 30
          periodSeconds: 5
      volumes:
      - name: config
        configMap:
          name: elasticsearch-config
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi
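The `discovery.seed_hosts` and `cluster.initial_master_nodes` lists in the ConfigMap have to match `spec.replicas`. The sketch below generates both from the StatefulSet name, headless Service and replica count, so they can be regenerated if the replica count changes. The helper `es_discovery_config` is hypothetical. Note that `cluster.initial_master_nodes` uses node names (here the Pod names, via `node.name: ${HOSTNAME}`) and is only used when the cluster bootstraps for the first time.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): generate the discovery settings for an N-node
# Elasticsearch StatefulSet behind a headless Service.
es_discovery_config() {
  local sts=$1 svc=$2 replicas=$3 i seeds="" masters=""
  for (( i = 0; i < replicas; i++ )); do
    # ${var:+, } adds a separator only after the first entry
    seeds+="${seeds:+, }\"${sts}-${i}.${svc}\""
    masters+="${masters:+, }\"${sts}-${i}\""
  done
  echo "discovery.seed_hosts: [${seeds}]"
  echo "cluster.initial_master_nodes: [${masters}]"
}

es_discovery_config elasticsearch elasticsearch-headless 3
```

For 3 replicas this reproduces the two lines in the ConfigMap above exactly.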

Scenario 4: Backup and Restore

Goal: automate backup and restore of the MySQL database.

yaml
# 1. Backup and restore scripts (ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-script
  namespace: database
data:
  backup.sh: |
    #!/bin/bash
    set -eo pipefail  # a failing mysqldump must fail the job even though gzip succeeds
    
    BACKUP_DIR="/backups"
    DATE=$(date +%Y%m%d_%H%M%S)
    BACKUP_FILE="${BACKUP_DIR}/mysql_backup_${DATE}.sql.gz"
    
    # Run the backup
    mysqldump -h ${MYSQL_HOST} -u ${MYSQL_USER} -p${MYSQL_PASSWORD} \
      --all-databases \
      --single-transaction \
      --routines \
      --triggers \
      --events \
      --master-data=2 \
      --flush-logs | gzip > ${BACKUP_FILE}
    
    # Upload to object storage (needs the aws CLI in the image)
    if [ "${UPLOAD_TO_S3}" = "true" ]; then
      aws s3 cp ${BACKUP_FILE} s3://${S3_BUCKET}/mysql-backups/
    fi
    
    # Delete local backups older than 7 days
    find ${BACKUP_DIR} -name "mysql_backup_*.sql.gz" -mtime +7 -delete
    
    echo "Backup completed: ${BACKUP_FILE}"
  
  restore.sh: |
    #!/bin/bash
    set -eo pipefail
    
    BACKUP_FILE=$1
    
    if [ -z "$BACKUP_FILE" ]; then
      echo "Usage: restore.sh <backup-file>"
      exit 1
    fi
    
    # Download from object storage
    if [[ "$BACKUP_FILE" =~ ^s3:// ]]; then
      aws s3 cp ${BACKUP_FILE} /tmp/backup.sql.gz
      BACKUP_FILE=/tmp/backup.sql.gz
    fi
    
    # Restore
    gunzip < ${BACKUP_FILE} | mysql -h ${MYSQL_HOST} -u ${MYSQL_USER} -p${MYSQL_PASSWORD}
    
    echo "Restore completed successfully"
---
# 2. Backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
  namespace: database
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: mysql:8.0
            command:
            - /bin/bash
            - /scripts/backup.sh
            env:
            - name: MYSQL_HOST
              value: "mysql-0.mysql-headless.database.svc.cluster.local"
            - name: MYSQL_USER
              value: "root"
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
            # The mysql:8.0 image has no AWS CLI; set "true" only with an image that provides it
            - name: UPLOAD_TO_S3
              value: "false"
            - name: S3_BUCKET
              value: "my-backup-bucket"
            volumeMounts:
            - name: backup-storage
              mountPath: /backups
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
          - name: scripts
            configMap:
              name: backup-script
              defaultMode: 0755
          restartPolicy: OnFailure
---
# 3. PVC for backup storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
  namespace: database
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 200Gi
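A backup is only useful if it is complete. The sketch below sanity-checks a file written by backup.sh before anyone relies on it. It uses two heuristics: `gzip -t` verifies the compressed stream, and mysqldump (with its default comments enabled) ends a dump with a `-- Dump completed` line, so a missing trailer suggests the dump was cut short. Neither check proves the data restores cleanly. The helper `verify_backup` and the demo files are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): check a compressed mysqldump for corruption and truncation.
verify_backup() {
  local file=$1
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt archive: $file"
    return 1
  fi
  if gunzip -c "$file" | tail -n 1 | grep -q '^-- Dump completed'; then
    echo "ok: $file"
  else
    echo "incomplete dump: $file"
    return 1
  fi
}

# Demo with hypothetical files in a temporary directory
tmp=$(mktemp -d)
printf 'CREATE TABLE t (id INT);\n-- Dump completed on 2024-01-01  2:00:00\n' | gzip > "$tmp/good.sql.gz"
printf 'CREATE TABLE t (id INT);\n' | gzip > "$tmp/truncated.sql.gz"
verify_backup "$tmp/good.sql.gz"
verify_backup "$tmp/truncated.sql.gz" || echo "-> rerun the backup job"
```

A periodic test restore into a scratch database is still the only real proof that a backup works.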

Troubleshooting Guide

Common Problems and Solutions

1. StatefulSet Pod Fails to Start

Symptoms

bash
$ kubectl get pods -l app=mysql
NAME        READY   STATUS              RESTARTS   AGE
mysql-0     0/1     CrashLoopBackOff    3          5m
mysql-1     0/1     ContainerCreating   0          2m

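Diagnosis

bash
# 1. Show the Pod's events
kubectl describe pod mysql-0

# 2. Show the main container's logs (the Pod has several containers, so name one)
kubectl logs mysql-0 -c mysql

# 3. Check the PVC status
kubectl get pvc data-mysql-0

# 4. Check that the StorageClass exists
kubectl get storageclass

# 5. Check the init container's logs
kubectl logs mysql-0 -c init-mysql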

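Possible Causes

  • The PVC is not bound (Pending)
  • The StorageClass does not exist
  • An init container failed
  • Configuration errors
  • Insufficient resources to schedule the Pod

Solutions

bash
# Inspect the PVC and its events
kubectl describe pvc data-mysql-0

# Check the storage backend (provisioner Pod names vary by CSI driver)
kubectl logs -n kube-system csi-provisioner-xxx

# Recreate the Pod once the cause is fixed (it comes back with the same name and PVC)
kubectl delete pod mysql-0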

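2. Replication Fails

Symptom: MySQL master/replica replication stops.

Diagnosis

bash
# 1. Check the master's binlog status
kubectl exec -it mysql-0 -- mysql -u root -p -e "SHOW MASTER STATUS\G"

# 2. Check the replica; Slave_IO_Running, Slave_SQL_Running and Last_Error show what failed
kubectl exec -it mysql-1 -- mysql -u root -p -e "SHOW SLAVE STATUS\G"

# 3. Check network connectivity (requires ping in the image)
kubectl exec -it mysql-1 -- ping mysql-0.mysql-headless

# 4. Check the error log (the official MySQL image logs to stderr)
kubectl logs mysql-1 -c mysql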

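Solution

bash
# Reconfigure replication. To resume where the replica stopped, use Relay_Master_Log_File
# and Exec_Master_Log_Pos from SHOW SLAVE STATUS; position 4 is the first event in a binlog file
kubectl exec -it mysql-1 -- mysql -u root -p -e "
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='mysql-0.mysql-headless',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='ReplPassword456!',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;
"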

3. Out of Disk Space

Symptoms

Error: no space left on device

Diagnosis

bash
# 1. Check filesystem usage in the Pod
kubectl exec -it mysql-0 -- df -h

# 2. Check the size of each database
kubectl exec -it mysql-0 -- mysql -u root -p -e "SELECT table_schema, SUM(data_length + index_length) / 1024 / 1024 AS Size_MB FROM information_schema.tables GROUP BY table_schema;"

# 3. Check the PVC's requested and current capacity
kubectl get pvc data-mysql-0 -o yaml

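Solutions

bash
# Expand the PVC. This requires allowVolumeExpansion: true on the StorageClass; the
# StatefulSet's volumeClaimTemplates cannot be edited, so each PVC is patched directly
kubectl patch pvc data-mysql-0 -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

# Purge binary logs older than 7 days on the master (first confirm every replica has read them)
kubectl exec -it mysql-0 -- mysql -u root -p -e "PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);"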
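Expanding a volume takes time, so it helps to notice high usage before the "no space left on device" error. The sketch below flags mounts at or above a threshold in POSIX `df -P` output, such as the output of `kubectl exec mysql-0 -- df -P /var/lib/mysql`. The helper `check_usage` and the sample output are hypothetical. The default threshold of 80% is an assumption; pick one that leaves time to expand the PVC.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `df -P` output on stdin, print mounts whose
# Capacity is >= THRESHOLD percent, and return 1 if any were found.
check_usage() {
  local threshold=${1:-80}
  awk -v t="$threshold" 'NR > 1 {
      use = $5; sub(/%/, "", use)          # column 5 is Capacity, e.g. "87%"
      if (use + 0 >= t) { print $6 " at " use "%"; over = 1 }
    }
    END { exit (over ? 1 : 0) }'
}

# Hypothetical df -P output
sample='Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/nvme1n1      51290592 44110912   7163296      87% /var/lib/mysql'

printf '%s\n' "$sample" | check_usage 80 || echo "-> expand the PVC or purge old binlogs"
```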

4. Network Partition

Symptom: Pods cannot communicate with each other.

Diagnosis

bash
# 1. Check DNS resolution
kubectl exec -it mysql-0 -- nslookup mysql-1.mysql-headless

# 2. Check network connectivity
kubectl exec -it mysql-0 -- ping mysql-1.mysql-headless

# 3. Check the port (nslookup, ping and telnet must exist in the image; otherwise use a debug Pod)
kubectl exec -it mysql-0 -- telnet mysql-1.mysql-headless 3306

# 4. List NetworkPolicies
kubectl get networkpolicy -n database

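Solutions

bash
# Review the NetworkPolicy rules
kubectl describe networkpolicy -n database

# Temporarily remove all NetworkPolicies to confirm they are the cause
# (save them first so they can be restored with kubectl apply -f)
kubectl get networkpolicy -n database -o yaml > netpol-backup.yaml
kubectl delete networkpolicy -n database --all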

5. StatefulSet Rollout Is Stuck

Symptoms

bash
$ kubectl rollout status sts/mysql
Waiting for 1 pods to be ready...

Diagnosis

bash
# 1. Check the update status
kubectl describe sts mysql

# 2. Check Pod status
kubectl get pods -l app=mysql

# 3. Show the events of the Pod that is not Ready
kubectl describe pod mysql-1

# 4. Check the update strategy
kubectl get sts mysql -o yaml | grep -A 5 updateStrategy

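Solution

bash
# With the default OrderedReady policy, a Pod that never becomes Ready on the new
# revision blocks the rollout (kubectl rollout pause/resume do not apply to StatefulSets).
# First revert the template
kubectl rollout undo sts/mysql

# Then delete the Pod stuck on the bad revision; it is recreated from the reverted template
kubectl delete pod mysql-1

# Confirm the rollout completes
kubectl rollout status sts/mysql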

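Troubleshooting Flowchart

StatefulSet problem

Check Pod status → Pending → check PVC/storage
    ↓ Running
Check application logs → errors → analyze the cause
    ↓ OK
Check network connectivity → fails → check DNS/NetworkPolicy
    ↓ OK
Check data replication → fails → reconfigure replication
    ↓ OK
Check storage space → low → expand or clean up
    ↓ OK
Application running normally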

Best Practices

1. Storage Configuration

yaml
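# High-performance StorageClass, assuming the AWS EBS CSI driver (ebs.csi.aws.com) is installed
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: database-storage
provisioner: ebs.csi.aws.com
parameters:
  type: io1
  iopsPerGB: "50"    # io1 allows at most 50 provisioned IOPS per GiB
  csi.storage.k8s.io/fstype: xfs
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Fragment of a StatefulSet spec that uses this StorageClass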
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: database-storage
    resources:
      requests:
        storage: 100Gi

2. Resource Configuration

yaml
# Set reasonable requests and limits (fragment of a container spec)
resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 4000m
    memory: 4Gi
---
# PodDisruptionBudget: keep at least 2 MySQL Pods available during voluntary disruptions such as node drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb
  namespace: database
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mysql

3. Update Strategy

yaml
# RollingUpdate strategy (fragment: selector, serviceName and template omitted)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0  # only Pods with ordinal >= partition are updated; 0 means all Pods
  podManagementPolicy: OrderedReady  # create and remove Pods one at a time, in order
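The `partition` field is what makes staged (canary) rollouts possible. The sketch below lists which Pods a RollingUpdate with a given partition moves to the new revision: only ordinals >= partition, highest first. Pods below the partition keep the old revision, and partition = replicas updates nothing. The helper `partition_targets` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): Pods updated by a RollingUpdate with this partition,
# in the order the controller updates them (highest ordinal first).
partition_targets() {
  local sts=$1 replicas=$2 partition=$3 i
  for (( i = replicas - 1; i >= partition; i-- )); do
    echo "${sts}-${i}"
  done
}

partition_targets mysql 3 2   # canary: only mysql-2
partition_targets mysql 3 0   # full rollout: mysql-2, mysql-1, mysql-0
```

A staged rollout would first set the partition to 2 and verify mysql-2. It would then lower the partition to 0 with `kubectl patch` to update the remaining Pods.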

4. Monitoring and Alerts

yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: statefulset-alerts
  namespace: monitoring
spec:
  groups:
  - name: statefulset
    rules:
    - alert: StatefulSetReplicasMismatch
      expr: |
        kube_statefulset_status_replicas_ready != kube_statefulset_status_replicas
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "StatefulSet {{ $labels.statefulset }} replicas mismatch"
        description: "StatefulSet {{ $labels.statefulset }} in namespace {{ $labels.namespace }} has {{ $value }} ready replicas"
    
    - alert: StatefulSetDown
      expr: |
        kube_statefulset_status_replicas_ready == 0
          and kube_statefulset_replicas > 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "StatefulSet {{ $labels.statefulset }} is down"
        description: "StatefulSet {{ $labels.statefulset }} in namespace {{ $labels.namespace }} has no ready replicas"

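5. Backup Strategy

yaml
# Scheduled backup (env such as MYSQL_HOST and credentials omitted; see the CronJob in Scenario 4)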
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: mysql:8.0
            command: ["/bin/bash", "/scripts/backup.sh"]  # ConfigMap files are mounted 0644, not executable
            volumeMounts:
            - name: backup-storage
              mountPath: /backups
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
          - name: scripts
            configMap:
              name: backup-script   # the ConfigMap defined in Scenario 4
          restartPolicy: OnFailure

6. Security Configuration

yaml
# SecurityContext (fragment)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
      containers:
      - name: mysql
        securityContext:
          allowPrivilegeEscalation: false
          # MySQL still writes to /var/run/mysqld and /tmp, so mount emptyDir volumes there
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL

7. Network Configuration

yaml
# Headless Service
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
---
# NetworkPolicy: only application Pods and other MySQL Pods may connect
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mysql-network-policy
spec:
  podSelector:
    matchLabels:
      app: mysql
  policyTypes:
  - Ingress
  - Egress
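  ingress:
  # Application clients
  - from:
    - podSelector:
        matchLabels:
          app: app
    ports:
    - protocol: TCP
      port: 3306
  # Other MySQL Pods: replication (3306) and xtrabackup cloning (3307)
  - from:
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - protocol: TCP
      port: 3306
    - protocol: TCP
      port: 3307
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - protocol: TCP
      port: 3306
    - protocol: TCP
      port: 3307
  # DNS, needed to resolve names such as mysql-0.mysql-headless
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53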

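Summary

Key Points

  1. StatefulSet characteristics: fixed Pod identities, stable network identities, ordered deployment and scaling
  2. Storage: each Pod gets its own PVC, so data persists across restarts
  3. Network identity: a Headless Service provides stable per-Pod DNS names
  4. Update strategy: ordered rolling updates (optionally staged with partition) reduce the risk of inconsistent data
  5. Backup and restore: back up regularly and make sure backups can be restored quickly

Command Cheat Sheet

bash
# StatefulSet management
kubectl get sts                                    # List StatefulSets
kubectl describe sts <name>                        # Show details
kubectl scale sts <name> --replicas=5              # Scale
kubectl rollout status sts/<name>                  # Show rollout status
kubectl rollout undo sts/<name>                    # Roll back

# Pod management
kubectl get pods -l app=<name>                     # List Pods
kubectl exec -it <pod> -- /bin/bash                # Open a shell in a Pod
kubectl logs <pod>                                 # Show logs
kubectl delete pod <pod>                           # Delete a Pod (it is recreated)

# Storage management
kubectl get pvc                                    # List PVCs
kubectl describe pvc <name>                        # Show PVC details
kubectl patch pvc <name> -p '{...}'               # Expand a PVC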
