
PersistentVolume(PV/PVC)

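Overview

PersistentVolume (PV) and PersistentVolumeClaim (PVC) are the core Kubernetes objects for managing storage. They decouple storage from the Pod lifecycle, so data persists independently of any Pod and storage can be managed on its own.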

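Core Concepts

PersistentVolume (PV)

  • A cluster-scoped storage resource, provisioned by an administrator or created dynamically
  • Independent of the Pod lifecycle; data persists after Pods are gone
  • Supports many storage backends (NFS, Ceph, cloud storage, etc.)

PersistentVolumeClaim (PVC)

  • A namespaced request for storage, created by users
  • Declares the required size and access modes
  • Is bound automatically to a PV that satisfies it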

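Storage Lifecycle

  1. Provisioning: a PV is created, either statically or dynamically
  2. Binding: the PVC is bound to a PV
  3. Using: Pods use the storage through the PVC
  4. Reclaiming: the storage is released according to the reclaim policy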
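In the dynamic form of provisioning, a PVC names a StorageClass and the provisioner creates a matching PV on demand. The minimal sketch below assumes the NFS CSI driver (csi-driver-nfs, provisioner name nfs.csi.k8s.io) is installed. The class name nfs-csi, the claim name pvc-dynamic-demo and the export path /data/dynamic are hypothetical.

```yaml
# StorageClass: tells Kubernetes which provisioner creates PVs for this class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi                      # hypothetical class name
provisioner: nfs.csi.k8s.io          # assumes csi-driver-nfs is installed
parameters:
  server: nfs-server.example.com     # assumed NFS server
  share: /data/dynamic               # hypothetical export path
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
# PVC: no PV has to exist beforehand; one is provisioned and bound automatically
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-dynamic-demo             # hypothetical claim name
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs-csi
```

After the manifests are applied, `kubectl get pvc pvc-dynamic-demo` should change from Pending to Bound once the provisioner has created the PV.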

How PV and PVC Relate

┌─────────────┐
│     Pod     │
└──────┬──────┘
       │ uses
       ▼
┌─────────────┐          ┌─────────────┐
│     PVC     │ ←─bind─→ │     PV      │
└─────────────┘          └──────┬──────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │ Storage backend │
                       │   (NFS/Cloud)   │
                       └─────────────────┘

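Access Modes

| Mode | Abbreviation | Description |
|------|--------------|-------------|
| ReadWriteOnce | RWO | Read-write by a single node (Pods on that node can share it) |
| ReadOnlyMany | ROX | Read-only by many nodes |
| ReadWriteMany | RWX | Read-write by many nodes |
| ReadWriteOncePod | RWOP | Read-write by a single Pod (alpha in Kubernetes 1.22, GA in 1.29; CSI volumes only) |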
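Access modes are requested on the PVC, and the volume plugin has to support them. The minimal sketch below shows a claim that allows only one Pod in the whole cluster to use the volume. It assumes a CSI-backed StorageClass named csi-block, which is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-single-writer            # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod               # only one Pod may use this volume at a time
  resources:
    requests:
      storage: 10Gi
  storageClassName: csi-block        # hypothetical CSI StorageClass (RWOP requires a CSI driver)
```

While one Pod is using this claim, a second Pod that references it cannot be scheduled. With ReadWriteOnce, by contrast, several Pods on the same node could share the volume.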

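Reclaim Policies

| Policy | Description | Typical use |
|--------|-------------|-------------|
| Retain | Keeps the data after the PVC is deleted; the PV becomes Released and must be cleaned up and reclaimed manually | Important data, production |
| Delete | Deletes the PV together with the underlying storage asset | Cloud storage, temporary data |
| Recycle | Basic scrub (rm -rf) that makes the volume available again; deprecated in favor of dynamic provisioning | NFS, hostPath |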
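Dynamically provisioned PVs inherit their reclaim policy from their StorageClass, and the default is Delete. The minimal sketch below shows a class that keeps the data after the PVC is deleted. The class name retain-ssd and the provisioner ebs.csi.aws.com (the AWS EBS CSI driver) are assumptions.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retain-ssd                   # hypothetical class name
provisioner: ebs.csi.aws.com         # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3
reclaimPolicy: Retain                # PVs created from this class keep their data after PVC deletion
volumeBindingMode: WaitForFirstConsumer
```

This setting only affects PVs created after the class exists. To switch an existing PV, patch it directly: `kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'`.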

Complete YAML Examples

1. Static PV example

yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-1
  labels:
    type: nfs
    environment: production
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-storage
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    server: 192.168.1.100
    path: "/data/pv1"

2. PVC example

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-web-app
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-storage
  selector:
    matchLabels:
      environment: production

3. Pod using a PVC

yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    volumeMounts:
    - name: web-data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: web-data
    persistentVolumeClaim:
      claimName: pvc-web-app
      readOnly: false

4. Block storage PV example (local raw device)

yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-block-1
spec:
  capacity:
    storage: 50Gi
  volumeMode: Block
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: fast-block
  local:
    path: /dev/sdb
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-1

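5. Pod using a block-mode PVC

With volumeDevices, the container receives a raw block device at devicePath instead of a mounted filesystem, so the application itself must be able to work with a raw device. By default, the mysql:8.0 image stores its data in a filesystem under /var/lib/mysql, so this example only illustrates the volumeDevices syntax.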

yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql-block
spec:
  containers:
  - name: mysql
    image: mysql:8.0
    ports:
    - containerPort: 3306
    volumeDevices:
    - name: mysql-data
      devicePath: /dev/xvda
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password123"  # demo only; in production, read it from a Secret
  volumes:
  - name: mysql-data
    persistentVolumeClaim:
      claimName: pvc-block-mysql
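The Pod above refers to a claim named pvc-block-mysql, which is not defined in this document. A matching claim must request volumeMode: Block. The sketch below assumes the claim should bind to pv-block-1 from example 4, so it uses the same storageClassName, access mode and size.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-block-mysql
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block                  # must match the PV's volumeMode, otherwise the claim will not bind
  resources:
    requests:
      storage: 50Gi
  storageClassName: fast-block       # same class as pv-block-1
```

pv-block-1 is a local volume with nodeAffinity for node-1, so any Pod that uses this claim can only be scheduled on node-1.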

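kubectl Commands

PV management commands

bash
# List all PVs
kubectl get pv

# Show PV details
kubectl describe pv pv-nfs-1

# Show PV capacity, phase and bound claim (declared capacity, not actual usage)
kubectl get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage,STATUS:.status.phase,CLAIM:.spec.claimRef.name

# Edit a PV
kubectl edit pv pv-nfs-1

# Delete a PV (delete the bound PVC first; a bound PV stays Terminating until its PVC is gone)
kubectl delete pv pv-nfs-1

# Show the PV as YAML
kubectl get pv pv-nfs-1 -o yaml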

PVC management commands

bash
# List all PVCs
kubectl get pvc

# List PVCs in a specific namespace
kubectl get pvc -n production

# Show PVC details
kubectl describe pvc pvc-web-app

# Create a PVC
kubectl apply -f pvc-web-app.yaml

# Delete a PVC
kubectl delete pvc pvc-web-app

# Show the PV bound to a PVC
kubectl get pvc pvc-web-app -o jsonpath='{.spec.volumeName}'

# Expand a PVC (its StorageClass must set allowVolumeExpansion: true)
kubectl patch pvc pvc-web-app -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

Troubleshooting commands

bash
# Show PVC events
kubectl describe pvc pvc-web-app | grep -A 10 Events

# Find the PV bound to a given claim
kubectl get pv -o json | jq '.items[] | select(.spec.claimRef.name=="pvc-web-app")'

# Check volume usage (there is no "kubectl top pv"; check from inside a Pod that mounts the volume)
kubectl exec web-app -- df -h /usr/share/nginx/html

# Show the PVCs mounted by a Pod
kubectl get pod web-app -o jsonpath='{.spec.volumes[?(@.persistentVolumeClaim)].persistentVolumeClaim.claimName}'

# Check PV reclaim policies
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy

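Real-World Scenarios

Scenario 1: Static website deployment

Requirement: deploy a static website whose HTML files live on persistent storage and are read by multiple replicas.

yaml
# 1. Create the PV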
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-static-web
  labels:
    app: static-web
    tier: frontend
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  nfs:
    server: nfs-server.example.com
    path: "/data/static-web"
---
# 2. Create the PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-static-web
  namespace: web
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
  storageClassName: manual
  selector:
    matchLabels:
      app: static-web
---
# 3. Create the Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-web
  namespace: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: static-web
  template:
    metadata:
      labels:
        app: static-web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21-alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: web-content
          mountPath: /usr/share/nginx/html
          readOnly: true
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
      volumes:
      - name: web-content
        persistentVolumeClaim:
          claimName: pvc-static-web
          readOnly: true
---
# 4. Create the Service
apiVersion: v1
kind: Service
metadata:
  name: static-web-svc
  namespace: web
spec:
  type: LoadBalancer
  selector:
    app: static-web
  ports:
  - port: 80
    targetPort: 80

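Deployment steps

bash
# Create the namespace
kubectl create namespace web

# Apply the manifests
kubectl apply -f static-web.yaml

# Verify the deployment
kubectl get all -n web

# Upload the site content. The nginx Pods mount the volume read-only, so
# kubectl cp into them fails; copy into a Pod that mounts pvc-static-web
# read-write, or write directly to /data/static-web on the NFS server.
kubectl cp ./website/. <writable-pod>:<mount-path>/ -n web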
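Because the nginx Pods mount the content read-only, one way to upload files is a short-lived Pod that mounts the same PVC read-write. In the sketch below, the Pod name web-content-loader and the busybox image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-content-loader           # hypothetical helper Pod
  namespace: web
spec:
  restartPolicy: Never
  containers:
  - name: loader
    image: busybox:1.36              # includes tar, which kubectl cp needs
    command: ["sleep", "3600"]       # keep the Pod alive long enough to copy files
    volumeMounts:
    - name: web-content
      mountPath: /data               # read-write mount of the shared volume
  volumes:
  - name: web-content
    persistentVolumeClaim:
      claimName: pvc-static-web      # RWX, so it can be mounted alongside the nginx Pods
```

Copy the site with `kubectl cp ./website/. web-content-loader:/data/ -n web`, then remove the helper with `kubectl delete pod web-content-loader -n web`.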

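Scenario 2: Persistent storage for a MySQL database

Requirement: deploy MySQL on high-performance block storage, with persistent data and backups.

yaml
# 1. Standalone PVC (dynamic provisioning). Note: the StatefulSet below gets its
#    storage from volumeClaimTemplates, which creates a separate PVC named
#    mysql-data-mysql-0; this PVC is only used if another workload references it.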
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
  namespace: database
  labels:
    app: mysql
    component: database
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 50Gi
  storageClassName: fast-ssd
---
# 2. Create the MySQL configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  namespace: database
data:
  my.cnf: |
    [mysqld]
    innodb_buffer_pool_size = 1G
    max_connections = 500
    # query_cache_size/query_cache_type omitted: MySQL 8.0 removed the query cache and refuses to start with them
    slow_query_log = 1
    slow_query_log_file = /var/log/mysql/slow.log
    long_query_time = 2
---
# 3. Create the Secret
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
  namespace: database
type: Opaque
stringData:
  root-password: "StrongPassword123!"
  user-password: "UserPassword456!"
---
# 4. Create the MySQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql-headless
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
          name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          value: "appdb"
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
        - name: mysql-config
          mountPath: /etc/mysql/conf.d
          readOnly: true
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          exec:
            command:
            - mysqladmin
            - ping
            - -h
            - localhost
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - mysql
            - -h
            - localhost
            - -u
            - root
            - -p$(MYSQL_ROOT_PASSWORD)
            - -e
            - SELECT 1
          initialDelaySeconds: 5
          periodSeconds: 2
      volumes:
      - name: mysql-config
        configMap:
          name: mysql-config
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 50Gi
---
# 5. Create the Services (headless for the StatefulSet, ClusterIP for clients)
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: database
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: database
spec:
  type: ClusterIP
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306

Backup script

bash
#!/bin/bash
# mysql-backup.sh
set -euo pipefail

NAMESPACE="database"
BACKUP_DIR="/backups/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/mysql_backup_${DATE}.sql.gz"

# Create the backup directory
mkdir -p "${BACKUP_DIR}"

# Run mysqldump inside the Pod. The single quotes make $MYSQL_ROOT_PASSWORD
# expand inside the container (where it is set from the Secret), not locally.
kubectl exec -n "${NAMESPACE}" mysql-0 -- sh -c \
  'mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" --all-databases --single-transaction --routines --triggers --events' \
  | gzip > "${BACKUP_FILE}"

# Keep only the last 7 days of backups
find "${BACKUP_DIR}" -name "mysql_backup_*.sql.gz" -mtime +7 -delete

echo "Backup completed: ${BACKUP_FILE}"
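The script above must run on a machine that has kubectl access. An alternative is to run the dump inside the cluster on a schedule and write it to a dedicated PVC. This is a sketch under several assumptions: a PVC named mysql-backup (hypothetical, not defined in this document) exists in the database namespace, mysqldump connects through the mysql Service, and the mysql:8.0 image is assumed to include gzip and find.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup                 # hypothetical name
  namespace: database
spec:
  schedule: "0 2 * * *"              # daily at 02:00
  concurrencyPolicy: Forbid          # skip a run while the previous dump is still running
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: dump
            image: mysql:8.0         # assumed to provide mysqldump, bash, gzip and find
            env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
            command: ["bash", "-c"]
            args:
            - |
              set -euo pipefail
              FILE=/backup/mysql_backup_$(date +%Y%m%d_%H%M%S).sql.gz
              mysqldump -h mysql -u root -p"${MYSQL_ROOT_PASSWORD}" \
                --all-databases --single-transaction --routines --triggers --events \
                | gzip > "${FILE}"
              # keep only the last 7 days of dumps
              find /backup -name 'mysql_backup_*.sql.gz' -mtime +7 -delete
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: mysql-backup  # hypothetical PVC, must be created separately
```

The dumps then live on their own volume, independent of the database PVC. That volume can be snapshotted or backed up off-site like any other PVC.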

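Scenario 3: Shared storage for a CI/CD pipeline

Requirement: a CI/CD pipeline in which multiple build jobs share a workspace and a dependency cache.

yaml
# 1. Create the shared-storage PV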
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-cicd-workspace
  labels:
    type: shared
    environment: cicd
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: shared-storage
  nfs:
    server: nfs-storage.example.com
    path: "/data/cicd-workspace"
---
# 2. Create the workspace PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-cicd-workspace
  namespace: cicd
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 100Gi
  storageClassName: shared-storage
---
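# 3. Maven cache PVC (the single shared-storage PV above is taken by the workspace PVC,
#    so this claim needs a second PV or dynamic provisioning for shared-storage)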
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-maven-cache
  namespace: cicd
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 20Gi
  storageClassName: shared-storage
---
# 4. Jenkins agent Pod template
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-agent-maven
  namespace: cicd
  labels:
    app: jenkins-agent
    type: maven
spec:
  containers:
  - name: maven
    image: maven:3.8.6-openjdk-11
    command:
    - cat
    tty: true
    volumeMounts:
    - name: workspace
      mountPath: /home/jenkins/agent
    - name: maven-cache
      mountPath: /root/.m2/repository
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2000m
        memory: 4Gi
  - name: docker
    image: docker:20.10
    command:
    - cat
    tty: true
    volumeMounts:
    - name: workspace
      mountPath: /home/jenkins/agent
    - name: docker-sock
      mountPath: /var/run/docker.sock
    securityContext:
      privileged: true
  volumes:
  - name: workspace
    persistentVolumeClaim:
      claimName: pvc-cicd-workspace
  - name: maven-cache
    persistentVolumeClaim:
      claimName: pvc-maven-cache
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock

Jenkinsfile example

groovy
pipeline {
  agent {
    kubernetes {
      yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:3.8.6-openjdk-11
    command:
    - cat
    tty: true
    volumeMounts:
    - name: workspace
      mountPath: /home/jenkins/agent
    - name: maven-cache
      mountPath: /root/.m2/repository
  volumes:
  - name: workspace
    persistentVolumeClaim:
      claimName: pvc-cicd-workspace
  - name: maven-cache
    persistentVolumeClaim:
      claimName: pvc-maven-cache
'''
    }
  }
  
  stages {
    stage('Checkout') {
      steps {
        checkout scm
      }
    }
    
    stage('Build') {
      steps {
        container('maven') {
          sh 'mvn clean package -DskipTests'
        }
      }
    }
    
    stage('Test') {
      steps {
        container('maven') {
          sh 'mvn test'
        }
      }
      post {
        always {
          junit '**/target/surefire-reports/*.xml'
        }
      }
    }
    
    stage('Deploy') {
      steps {
        echo 'Deploying application...'
      }
    }
  }
}

Troubleshooting Guide

Common Problems and Solutions

1. PVC stuck in Pending

Symptom

bash
$ kubectl get pvc
NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-web-app    Pending                                       nfs-storage    5m

Diagnosis

bash
# 1. Check the PVC events
kubectl describe pvc pvc-web-app

# 2. Check whether a matching PV exists
kubectl get pv -l environment=production

# 3. Check the StorageClass
kubectl get storageclass nfs-storage -o yaml

# 4. For dynamic provisioning, check the provisioner logs
kubectl logs -n kube-system nfs-provisioner-xxx

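Possible causes

  • No matching PV exists
  • The StorageClass is missing or misconfigured
  • The storage backend is unavailable
  • The access modes do not match
  • No PV has enough capacity

Solution

yaml
# Make sure the PV labels match the PVC selector
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-1
  labels:
    environment: production  # must match the PVC selector
spec:
  storageClassName: nfs-storage  # must match the PVC storageClassName
  accessModes:
    - ReadWriteMany  # must include the access modes requested by the PVC
  capacity:
    storage: 10Gi  # must be greater than or equal to the PVC request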

2. Pod cannot mount the PVC

Symptom

Warning  FailedMount  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data]

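Diagnosis

bash
# 1. Check the Pod events
kubectl describe pod web-app

# 2. Check the PVC status
kubectl get pvc pvc-web-app -o yaml

# 3. Check the PV status
kubectl get pv -o yaml | grep -A 5 "claimRef:"

# 4. Check the kubelet logs (kubelet runs on the node, not as a Pod; run this on a systemd-based node)
journalctl -u kubelet | grep -i mount

# 5. Check the mounts inside the container (only possible once the Pod is running)
kubectl exec web-app -- df -h
kubectl exec web-app -- mount | grep /usr/share/nginx/html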

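Possible causes

  • The PVC is not bound to a PV
  • The PV access modes do not fit how the Pod uses the volume
  • The storage backend has failed
  • Permission problems on the node
  • Conflicting mount paths

Solution

bash
# Check whether the PVC is bound
kubectl get pvc pvc-web-app -o jsonpath='{.status.phase}'

# If it is not bound, find the reason in the events
kubectl describe pvc pvc-web-app

# Check that the volume contents are reachable from the container
kubectl exec web-app -- ls -la /usr/share/nginx/html

# Recreate the Pod
kubectl delete pod web-app
kubectl apply -f pod.yaml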

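3. Data loss

Symptom: data is gone after the Pod restarts

Diagnosis

bash
# 1. Check the PV reclaim policy
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy

# 2. Check that the PVC is bound correctly
kubectl get pvc -o wide

# 3. Check the Pod's volumes (they should reference a PVC, not emptyDir)
kubectl get pod web-app -o jsonpath='{.spec.volumes}'

# 4. Verify that the data really is on the PV (at the Pod's mount path)
kubectl exec web-app -- ls -la /usr/share/nginx/html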

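Possible causes

  • An emptyDir volume is used instead of a PVC
  • The PV reclaim policy is Delete and the PVC was deleted (a Pod restart alone does not trigger reclaiming)
  • The application writes to a path other than the mount path
  • Data was not flushed to the storage backend

Solution

yaml
# Use a PVC instead of emptyDir
volumes:
- name: data
  persistentVolumeClaim:  # correct
    claimName: pvc-web-app
  # emptyDir: {}  # wrong: the data is lost when the Pod is removed

# Set an appropriate reclaim policy
spec:
  persistentVolumeReclaimPolicy: Retain  # keep the data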

4. Volume expansion fails

Symptom

Error: persistentvolumeclaims "pvc-web-app" could not be patched: persistentvolumeclaims "pvc-web-app" is forbidden: only dynamically provisioned pvc can be resized

Diagnosis

bash
# 1. Check whether the StorageClass allows expansion
kubectl get storageclass -o json | jq '.items[] | {name:.metadata.name, allowExpansion:.allowVolumeExpansion}'

# 2. Check the PVC status and conditions
kubectl describe pvc pvc-web-app

# 3. Check whether the CSI driver supports online expansion
kubectl logs -n kube-system csi-driver-xxx

Solution

yaml
# Enable volume expansion on the StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
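allowVolumeExpansion: true  # enable expansion

bash
# Expand the volume
kubectl patch pvc pvc-web-app -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

# Verify the expansion (the new size appears under CAPACITY once the resize completes)
kubectl get pvc pvc-web-app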

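5. Multi-node attach conflict

Symptom

Multi-Attach error for volume "<pv-name>" Volume is already exclusively attached to one node and can't be attached to another

This error comes from attachable volumes, such as cloud block disks, that use ReadWriteOnce while Pods on different nodes reference the same PVC. NFS volumes are not attached to nodes and do not produce it.

Diagnosis

bash
# 1. Check the PV access modes
kubectl get pv pv-nfs-1 -o jsonpath='{.spec.accessModes}'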

# 2. Find the Pods that use the PVC and the nodes they run on
kubectl get pods -A -o json | jq '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName=="pvc-web-app") | {name:.metadata.name, namespace:.metadata.namespace, node:.spec.nodeName}'

# 3. Check whether the storage backend supports attaching to multiple nodes

Solution

yaml
# For multi-node access, use RWX mode
spec:
  accessModes:
    - ReadWriteMany  # read-write from multiple nodes

# Make sure the storage backend supports RWX (e.g. NFS)
nfs:
  server: nfs-server.example.com
  path: "/data/shared"

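Troubleshooting flowchart

PVC Pending
    ↓
Check StorageClass → missing → create the StorageClass
    ↓ exists
Check PV → none → create a PV or enable dynamic provisioning
    ↓ exists
Check labels → mismatch → fix the PV labels
    ↓ match
Check access modes → mismatch → fix the PV access modes
    ↓ match
Check capacity → too small → provide a PV with enough capacity
    ↓ sufficient
Check storage backend → failed → repair the storage backend
    ↓ healthy
PVC bound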

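Best Practices

1. Storage planning

Capacity planning

  • Keep 20-30% of capacity in reserve
  • Account for data growth trends
  • Set sensible resource quotas

yaml
# Set a storage quota for the namespace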
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: production
spec:
  hard:
    requests.storage: "500Gi"
    persistentvolumeclaims: "10"
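A ResourceQuota can also cap storage per StorageClass, and a LimitRange can bound the size of each individual claim. The sketch below targets the production namespace. It uses the class name fast-ssd from the examples in this document; the object names and numbers are illustrative assumptions.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota-by-class       # hypothetical name
  namespace: production
spec:
  hard:
    # at most 200Gi requested in total from the fast-ssd class
    fast-ssd.storageclass.storage.k8s.io/requests.storage: "200Gi"
    # at most 5 PVCs using the fast-ssd class
    fast-ssd.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: pvc-size-limits              # hypothetical name
  namespace: production
spec:
  limits:
  - type: PersistentVolumeClaim
    min:
      storage: 1Gi                   # reject claims smaller than 1Gi
    max:
      storage: 100Gi                 # reject claims larger than 100Gi
```

The API server rejects a claim that exceeds either object, so the PVC is never created and no provisioning is attempted.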

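Choosing a storage type

| Application type | Recommended storage | Access mode | Reclaim policy |
|------------------|---------------------|-------------|----------------|
| Static web files | NFS / object storage | RWX | Retain |
| Database | Block storage (SSD) | RWO | Retain |
| Cache | Local SSD | RWO | Delete |
| Logs | Object storage | RWX | Delete |
| CI/CD workspace | NFS | RWX | Retain |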

2. Security configuration

Access control

yaml
# Allow PVC management in the production namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pvc-manager
  namespace: production
rules:
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# PersistentVolumes are cluster-scoped, so a namespaced Role cannot grant
# access to them; read access to PVs needs a ClusterRole and ClusterRoleBinding.
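PersistentVolumes and StorageClasses are cluster-scoped, so a rule for them in a namespaced Role has no effect. Read access requires a ClusterRole bound with a ClusterRoleBinding. In the sketch below, the role name pv-viewer and the group storage-users are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pv-viewer                    # hypothetical name
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pv-viewer-binding            # hypothetical name
subjects:
- kind: Group
  name: storage-users                # hypothetical group of users who manage PVCs
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: pv-viewer
  apiGroup: rbac.authorization.k8s.io
```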

Data encryption

yaml
# Use an encrypted StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-storage
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"

3. Performance optimization

Storage performance tuning

yaml
# High-performance StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "50"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Mount option tuning

yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-optimized
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  mountOptions:   # must be valid for the volume type; these are NFS client options
    - hard
    - nfsvers=4.1
    - noatime
    - rsize=1048576
    - wsize=1048576
  nfs:
    server: nfs-server.example.com
    path: "/data/optimized"

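4. Persistent volume management

Volume snapshots

Volume snapshots create a point-in-time copy of a PV's contents for backup and restore. The feature was alpha in Kubernetes 1.12, beta in 1.17, and GA in 1.20 with the snapshot.storage.k8s.io/v1 API. It requires a CSI driver that supports snapshots, and the snapshot CRDs and snapshot controller must be installed in the cluster.

yaml
# Create a volume snapshot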
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-snapshot
  namespace: database
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: mysql-data-mysql-0  # PVC created by the StatefulSet's volumeClaimTemplates
yaml
# Restore from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data-restored
  namespace: database
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 50Gi
  dataSource:
    name: mysql-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
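The VolumeSnapshot in this section refers to a VolumeSnapshotClass named csi-aws-vsc. That class must exist, and it must name the same CSI driver that provisioned the source volume. The sketch below assumes the AWS EBS CSI driver (ebs.csi.aws.com) plus the external snapshot controller and CRDs are installed.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com              # must match the CSI driver of the source PV
deletionPolicy: Delete               # Retain would keep the backend snapshot after the VolumeSnapshot is deleted
```

The StorageClass used by the restored PVC should be served by the same driver, so that it can create a volume from the snapshot.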

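Storage migration

Option 1: migrate with volume snapshots

bash
# 1. Create the snapshot in the source cluster
kubectl apply -f snapshot.yaml

# 2. Look up the backend snapshot ID (snapshotHandle) of the bound VolumeSnapshotContent
CONTENT=$(kubectl get volumesnapshot mysql-snapshot -n database -o jsonpath='{.status.boundVolumeSnapshotContentName}')
kubectl get volumesnapshotcontent "${CONTENT}" -o jsonpath='{.status.snapshotHandle}'

# 3. In the target cluster, register that snapshot as a pre-provisioned
#    VolumeSnapshotContent plus VolumeSnapshot (the backend snapshot must be
#    reachable from the target cluster, e.g. same cloud account and region)
kubectl apply -f snapshot-import.yaml

# 4. Create the PVC from the snapshot
kubectl apply -f restore.yaml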
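To import a snapshot into the target cluster, you register the existing backend snapshot there as a pre-provisioned snapshot. Re-applying an exported VolumeSnapshot object on its own does not give the target cluster access to the data. The sketch below assumes the same AWS EBS CSI driver runs in both clusters. The content name and the snapshot ID are placeholders.

```yaml
# Target cluster: describe the existing backend snapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: mysql-snapshot-content-imported      # hypothetical name
spec:
  deletionPolicy: Retain                     # do not delete the backend snapshot if this object is removed
  driver: ebs.csi.aws.com                    # assumed CSI driver, must match the source cluster
  volumeSnapshotClassName: csi-aws-vsc
  source:
    snapshotHandle: snap-0123456789abcdef0   # placeholder: the snapshotHandle read in the source cluster
  volumeSnapshotRef:                         # the VolumeSnapshot that will bind to this content
    name: mysql-snapshot
    namespace: database
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-snapshot
  namespace: database
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    volumeSnapshotContentName: mysql-snapshot-content-imported
```

Once the VolumeSnapshot reports readyToUse: true, a PVC with a dataSource that points at mysql-snapshot can be created as usual.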

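Option 2: migrate across clusters with Velero

bash
# 1. Create a backup in the source cluster (PV data is only included when
#    volume snapshots or file-system backup are configured)
velero backup create mysql-backup --include-resources=pvc,pv --selector app=mysql

# 2. Make the backup visible to the target cluster by pointing its Velero at the
#    same object-storage bucket (BackupStorageLocation), then check that it appears
velero backup get mysql-backup
# Optional: download the backup contents locally for inspection
velero backup download mysql-backup

# 3. Restore in the target cluster
velero restore create --from-backup mysql-backup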

Storage lifecycle management

StorageClass-based lifecycle management

yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-with-lifecycle
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - debug

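Reclaiming storage resources

bash
# Clean up unused PVs (Available, or Released after their PVC was deleted)
kubectl get pv | grep -E 'Available|Released'
kubectl delete pv pv-xxx

# Find Bound PVCs that no Pod uses: list the Pods that reference a given PVC
kubectl get pvc | grep Bound
kubectl get pods -o json | jq '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName=="pvc-xxx")'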

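5. Advanced backup strategy

Tiered backups

| Backup type | Frequency | Retention | Use case |
|-------------|-----------|-----------|----------|
| Full | Daily | 7 days | Important data |
| Incremental | Hourly | 24 hours | Frequently changing data |
| Differential | Weekly | 30 days | Moderately important data |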

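Backup storage

Local backup

yaml
# Use S3-compatible object storage inside the cluster (e.g. MinIO) as the backup target
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: local-backup
  namespace: velero
spec:
  provider: velero.io/aws   # the AWS plugin also works with S3-compatible stores
  objectStorage:
    bucket: velero-backups
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000   # hypothetical in-cluster MinIO endpoint
  credential:
    name: cloud-credentials
    key: cloud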

Cloud backup

yaml
# Use AWS S3 as the backup target
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: aws-backup
  namespace: velero
spec:
  provider: velero.io/aws
  objectStorage:
    bucket: kubernetes-backups
  credential:
    name: aws-credentials
    key: credentials

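Disaster recovery plan

1. Backup policy

  • Take a full backup of all critical data every day
  • Take incremental backups every hour
  • Store backups off-site

2. Restore testing

  • Run a restore drill every month
  • Verify backup integrity
  • Record how long restores take

3. Restore procedure

  • Assess the scope of the disaster
  • Choose a suitable restore point
  • Run the restore
  • Verify that services are available

4. Automation (scheduled backups)

yaml
# Back up all namespaces and resources every 6 hours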
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: disaster-recovery
  namespace: velero
spec:
  schedule: "@every 6h"
  template:
    includedNamespaces:
    - "*"
    includedResources:
    - "*"
    storageLocation: aws-backup
    volumeSnapshotLocations:
    - aws-snapshots
    ttl: 168h  # 7 days
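The backup policy above calls for a daily full backup. The sketch below expresses that as a Velero Schedule. The schedule name, the production namespace and the 7-day retention (taken from the tiered backup table) are assumptions; aws-backup is the BackupStorageLocation defined earlier.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-full-backup            # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"              # cron syntax: every day at 02:00
  template:
    includedNamespaces:
    - production                     # assumed namespace holding the critical data
    storageLocation: aws-backup
    snapshotVolumes: true            # also snapshot the PVs (needs snapshot support configured in Velero)
    ttl: 168h0m0s                    # keep each backup for 7 days
```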

Backup verification

Automated verification

bash
#!/bin/bash
# validate-backup.sh
set -euo pipefail

BACKUP_NAME="daily-backup-$(date +%Y%m%d)"
RESTORE_NAME="${BACKUP_NAME}-restore-test"
NAMESPACE="production"

# Trigger the backup and wait until it finishes
velero backup create "${BACKUP_NAME}" --include-namespaces "${NAMESPACE}" --wait

# Check the backup phase (read from the Backup custom resource in the velero namespace)
STATUS=$(kubectl get backups.velero.io "${BACKUP_NAME}" -n velero -o jsonpath='{.status.phase}')
if [ "${STATUS}" != "Completed" ]; then
  echo "Backup failed: ${STATUS}"
  exit 1
fi

# Test a restore into a separate namespace and wait until it finishes
velero restore create "${RESTORE_NAME}" --from-backup "${BACKUP_NAME}" \
  --namespace-mappings "${NAMESPACE}:test-restore" --wait

# Verify the restore
kubectl get pods -n test-restore

# Clean up the test environment
velero restore delete "${RESTORE_NAME}" --confirm
kubectl delete namespace test-restore

echo "Backup validation completed successfully"

Encrypted backups

Encryption at rest

yaml
# Encrypted StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-gp2
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"

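Encryption in transit

  • Encrypt backup traffic with TLS
  • Point Velero at HTTPS object-storage endpoints
  • Enable server-side encryption on the backup bucket

6. Storage monitoring and alerting

Key metrics

| Metric | Description | Alert threshold |
|--------|-------------|-----------------|
| kubelet_volume_stats_used_bytes | Bytes used on the volume | 85% of kubelet_volume_stats_capacity_bytes |
| kubelet_volume_stats_available_bytes | Bytes available on the volume | < 10 GB |
| kubelet_volume_stats_inodes_used | Inodes used | 90% of kubelet_volume_stats_inodes |
| node_disk_io_time_seconds_total | Time the disk spent on I/O (node_exporter, per device) | > 30 s per minute |
| node_disk_reads_completed_total | Completed reads (node_exporter, per device) | 200% of baseline |
| node_disk_writes_completed_total | Completed writes (node_exporter, per device) | 200% of baseline |

The kubelet exposes per-PVC capacity, usage and inode metrics but no I/O metrics. The I/O rows therefore rely on node_exporter disk metrics, which are reported per node device rather than per PVC.

Advanced alert rules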

yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: storage-alerts
  namespace: monitoring
spec:
  groups:
  - name: storage
    rules:
    # Storage usage warning
    - alert: StorageUsageWarning
      expr: |
        (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.8
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Storage usage warning"
        description: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is {{ $value | humanizePercentage }} full"

    # Storage usage critical
    - alert: StorageUsageCritical
      expr: |
        (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.9
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Storage usage critical"
        description: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is {{ $value | humanizePercentage }} full"

    # Disk I/O saturation (node_exporter metric, per node disk rather than per PVC)
    - alert: StorageIOPerformance
      expr: |
        rate(node_disk_io_time_seconds_total[5m]) > 0.5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Storage IO performance issue"
        description: "Disk {{ $labels.device }} on {{ $labels.instance }} is busy more than 50% of the time"

    # PVC stuck in Pending
    - alert: PVCBindingFailed
      expr: |
        kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1
      for: 15m
      labels:
        severity: critical
      annotations:
        summary: "PVC binding failed"
        description: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} has been pending for 15 minutes"
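The metrics table sets a 90% inode threshold, but the rules above have no alert for it. A fill-up forecast is also a useful complement to fixed percentage thresholds. The sketch below covers both. The rule names are hypothetical, and the 6-hour window and 4-hour horizon are illustrative choices.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: storage-alerts-extra         # hypothetical name
  namespace: monitoring
spec:
  groups:
  - name: storage-extra
    rules:
    # Inode usage above 90% (matches the threshold in the metrics table)
    - alert: PVCInodesAlmostExhausted
      expr: |
        (kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes) > 0.9
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "PVC inode usage high"
        description: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} has used {{ $value | humanizePercentage }} of its inodes"
    # A linear fit over the last 6h predicts the volume runs out of space within 4h
    - alert: PVCFillingUp
      expr: |
        predict_linear(kubelet_volume_stats_available_bytes[6h], 4 * 3600) < 0
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "PVC predicted to fill up"
        description: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is expected to run out of space within 4 hours"
```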

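Storage health check

bash
#!/bin/bash
# storage-health-check.sh

# PVCs that are not Bound
kubectl get pvc --all-namespaces | grep -v Bound

# PVs that are neither Available nor Bound (e.g. Released, Failed)
kubectl get pv | grep -v Available | grep -v Bound

# Volumes currently in use on each node (attachment state, not capacity usage)
kubectl get --raw /api/v1/nodes | jq '.items[].status.volumesInUse'

# StorageClasses
kubectl get storageclass

# Volume snapshots
kubectl get volumesnapshot --all-namespaces

# Backup status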
velero backup get

7. Monitoring alerts

Prometheus alert rules

yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-alerts  # differs from storage-alerts in section 6 so neither rule object overwrites the other
  namespace: monitoring
spec:
  groups:
  - name: storage
    rules:
    - alert: PVCAlmostFull
      expr: |
        (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.85
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "PVC {{ $labels.persistentvolumeclaim }} almost full"
        description: "PVC {{ $labels.persistentvolumeclaim }} in namespace {{ $labels.namespace }} is {{ $value | humanizePercentage }} full"
    
    - alert: PVCCriticalFull
      expr: |
        (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.95
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "PVC {{ $labels.persistentvolumeclaim }} critical full"
        description: "PVC {{ $labels.persistentvolumeclaim }} in namespace {{ $labels.namespace }} is {{ $value | humanizePercentage }} full"
    
    - alert: PVCNotBound
      expr: |
        kube_persistentvolumeclaim_status_phase{phase!="Bound"} == 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "PVC {{ $labels.persistentvolumeclaim }} not bound"
        description: "PVC {{ $labels.persistentvolumeclaim }} in namespace {{ $labels.namespace }} is in phase {{ $labels.phase }}"

8. Naming conventions

Recommended naming patterns

yaml
# PV name: pv-{storage-type}-{environment}-{purpose}-{index}
metadata:
  name: pv-nfs-prod-web-01

# PVC name: pvc-{app-name}-{purpose}
metadata:
  name: pvc-mysql-data

# StorageClass name: {performance}-{storage-type}-{environment}
metadata:
  name: fast-ssd-prod

9. Documentation

Storage documentation template

yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-mysql-prod
  annotations:
    description: "MySQL production database storage"
    owner: "database-team@example.com"
    backup-schedule: "daily at 2:00 AM"
    retention-policy: "30 days"
    cost-center: "CC-12345"
spec:
  # ... PV spec

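Summary

Key points

  1. PV/PVC decoupling: separate storage from applications so that storage can be managed independently
  2. Lifecycle management: understand the four phases of provisioning, binding, using and reclaiming
  3. Access modes: choose the mode the application needs (RWO/ROX/RWX)
  4. Reclaim policy: choose Retain or Delete according to how important the data is
  5. Dynamic provisioning: use StorageClasses to automate storage management
  6. Volume snapshots: use VolumeSnapshot for backup and restore
  7. Storage migration: migrate storage across clusters and recover from disasters
  8. Advanced backup: tiered backups, encrypted backups, disaster recovery planning
  9. Monitoring: detailed metrics and alert rules
  10. Performance: storage type selection, mount option tuning, resource management

Storage management best practices

  1. Storage planning

    • Choose storage types that match the application's characteristics
    • Reserve enough capacity
    • Define sensible storage quotas
  2. Data security

    • Use encrypted storage for sensitive data
    • Back up on a regular schedule
    • Configure an appropriate reclaim policy
  3. Performance

    • Choose high-performance storage types where needed
    • Tune mount options
    • Set sensible resource requests and limits
  4. Backup and recovery

    • Use a tiered backup strategy
    • Verify backup integrity regularly
    • Maintain a disaster recovery plan
  5. Monitoring and maintenance

    • Monitor storage usage and performance
    • Set sensible alert thresholds
    • Run regular storage health checks

Quick command reference

bash
# PV management
kubectl get pv                                    # list all PVs
kubectl describe pv <pv-name>                     # show PV details
kubectl delete pv <pv-name>                       # delete a PV

# PVC management
kubectl get pvc                                   # list all PVCs
kubectl describe pvc <pvc-name>                   # show PVC details
kubectl patch pvc <pvc-name> -p '{...}'          # expand a PVC

# Volume snapshots
kubectl get volumesnapshot                        # list volume snapshots
kubectl create -f snapshot.yaml                   # create a volume snapshot
kubectl get volumesnapshotcontent                # list snapshot contents

# Backups
velero backup get                                # list backups
velero backup create <backup-name>               # create a backup
velero restore create --from-backup <backup-name> # restore from a backup

# Troubleshooting
kubectl describe pvc <pvc-name>                   # show PVC events
kubectl get events --field-selector involvedObject.name=<pvc-name>  # show events
kubectl logs -n kube-system <provisioner-pod>     # show provisioner logs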
