
Project Architecture Design

Overview

Project architecture design is at the heart of putting Kubernetes into practice: it determines an application's reliability, scalability, and maintainability. This chapter digs into designing a production-grade Kubernetes architecture, covering microservice design, high-availability deployment plans, and resource planning.

Core Concepts

Microservice Architecture Principles

  • Single responsibility: each service focuses on one business function
  • Independent deployment: services are developed, tested, and deployed independently
  • Decentralization: data management and governance are decentralized
  • Fault isolation: a failure in one service does not take down the whole system

High-Availability Design Elements

  • Multiple replicas: at least 3 replicas for critical services
  • Cross-AZ deployment: spread risk across availability zones
  • Health checks: liveness and readiness probes
  • Automatic recovery: automatic restarts and rescheduling

Resource Planning Principles

  • Resource quotas: set sensible requests and limits
  • Namespace isolation: partition by environment or team
  • Network policies: restrict service-to-service traffic
  • Storage planning: a deliberate persistent-storage strategy

Project Architecture Design

Overall Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                  User request entry point                   │
└────────────────────┬────────────────────────────────────────┘
                     │
            ┌────────┴────────┐
            │   Ingress/LB    │
            │ (load balancer) │
            └────────┬────────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
        ▼            ▼            ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Frontend  │ │ API GW    │ │ Static    │
│ Service   │ │ Service   │ │ Assets    │
└───────────┘ └─────┬─────┘ └───────────┘
                    │
        ┌───────────┼───────────┐
        │           │           │
        ▼           ▼           ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ User Svc  │ │ Order Svc │ │ Product   │
│           │ │           │ │ Service   │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │
      └─────────────┼─────────────┘
                    │
        ┌───────────┼───────────┐
        │           │           │
        ▼           ▼           ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│  MySQL    │ │  Redis    │ │  MongoDB  │
│ (primary/ │ │ (cluster) │ │ (replica  │
│  replica) │ │           │ │  set)     │
└───────────┘ └───────────┘ └───────────┘

Namespace Design

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    team: platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  labels:
    environment: staging
    team: platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    environment: development
    team: platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    environment: monitoring
    team: sre
---
apiVersion: v1
kind: Namespace
metadata:
  name: logging
  labels:
    environment: logging
    team: sre

Resource Quota Configuration

yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"
    pods: "50"
    services: "20"
    secrets: "50"
    configmaps: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
  - type: PersistentVolumeClaim
    max:
      storage: "50Gi"
    min:
      storage: "1Gi"

Network Policy Design

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: production-network-policy
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
    - namespaceSelector:
        matchLabels:
          environment: monitoring
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          environment: production
    ports:
    - protocol: TCP
      port: 3306
    - protocol: TCP
      port: 6379
    - protocol: TCP
      port: 27017
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53

High-Availability Deployment Plan

Multi-Replica Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
        version: v1.0.0
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - api-gateway
              topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/worker
                operator: Exists
      containers:
      - name: api-gateway
        image: registry.example.com/api-gateway:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: config
          mountPath: /app/config
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: api-gateway-config

Cross-AZ Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      affinity:
        # Note: a *required* zone-level anti-affinity would cap this Deployment
        # at one pod per zone, far fewer than 5 replicas. Spread hard across
        # nodes here and let the topologySpreadConstraints below balance zones.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - user-service
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-role.kubernetes.io/worker
                operator: Exists
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: user-service
      containers:
      - name: user-service
        image: registry.example.com/user-service:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"

PodDisruptionBudget Configuration

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-gateway-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-gateway
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: user-service-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: user-service

kubectl Commands

Namespace Management

bash
# Create a namespace
kubectl create namespace production

# List all namespaces
kubectl get namespaces

# Show namespace details
kubectl describe namespace production

# Delete a namespace
kubectl delete namespace production

# Set the default namespace for the current context
kubectl config set-context --current --namespace=production

Resource Quota Management

bash
# List resource quotas
kubectl get resourcequota -n production

# Show quota details and current usage
kubectl describe resourcequota production-quota -n production

# List LimitRanges
kubectl get limitrange -n production

# Show namespace-level resource usage
kubectl describe namespace production

Network Policy Management

bash
# List network policies
kubectl get networkpolicy -n production

# Show network policy details
kubectl describe networkpolicy production-network-policy -n production

# Test connectivity
kubectl run test-pod --image=busybox -n production --rm -it -- wget -O- http://user-service:8080/health

# Show pod network info (IPs and nodes)
kubectl get pods -n production -o wide

High-Availability Deployment Management

bash
# Check Deployment status
kubectl get deployments -n production

# Show Deployment details
kubectl describe deployment api-gateway -n production

# See pod distribution
kubectl get pods -n production -o wide --show-labels

# Map pods to nodes (zones are node labels; cross-reference with:
# kubectl get nodes -L topology.kubernetes.io/zone)
kubectl get pods -n production -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

# Scale manually
kubectl scale deployment api-gateway --replicas=5 -n production

# List PodDisruptionBudgets
kubectl get pdb -n production

# Show PDB details
kubectl describe pdb api-gateway-pdb -n production

Debugging Node Affinity

bash
# Show node labels
kubectl get nodes --show-labels

# Label a node
kubectl label nodes node-1 node-role.kubernetes.io/worker=true

# Show each node's availability zone
kubectl get nodes -L topology.kubernetes.io/zone

# See where pods were scheduled
kubectl get pods -n production -o wide

# Show scheduling events
kubectl get events -n production --field-selector reason=Scheduled

# Describe a pod to inspect scheduling decisions
kubectl describe pod <pod-name> -n production

Worked Examples

Example 1: E-commerce Platform Architecture

Scenario

Design the Kubernetes architecture for an e-commerce platform: frontend, API gateway, user service, order service, product service, and payment service.

Architecture

yaml
# Namespace configuration
apiVersion: v1
kind: Namespace
metadata:
  name: ecommerce-prod
  labels:
    environment: production
    project: ecommerce
---
# Resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ecommerce-quota
  namespace: ecommerce-prod
spec:
  hard:
    requests.cpu: "50"
    requests.memory: 100Gi
    limits.cpu: "100"
    limits.memory: 200Gi
    pods: "100"
    services: "30"
    persistentvolumeclaims: "20"
---
# Frontend service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: ecommerce-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: registry.example.com/ecommerce/frontend:v1.0.0
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 80
---
# API gateway
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: ecommerce-prod
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api-gateway
              topologyKey: kubernetes.io/hostname
      containers:
      - name: api-gateway
        image: registry.example.com/ecommerce/api-gateway:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: api-gateway
  ports:
  - port: 8080
    targetPort: 8080
---
# User service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: ecommerce-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: registry.example.com/ecommerce/user-service:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              name: user-service-config
              key: db_host
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: user-service-secret
              key: db_password
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: user-service
  ports:
  - port: 8080
    targetPort: 8080
---
# Order service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: ecommerce-prod
spec:
  replicas: 5
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: registry.example.com/ecommerce/order-service:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: order-service
  ports:
  - port: 8080
    targetPort: 8080
---
# Product service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
  namespace: ecommerce-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: product-service
  template:
    metadata:
      labels:
        app: product-service
    spec:
      containers:
      - name: product-service
        image: registry.example.com/ecommerce/product-service:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: product-service
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: product-service
  ports:
  - port: 8080
    targetPort: 8080
---
# Ingress configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecommerce-ingress
  namespace: ecommerce-prod
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx  # the kubernetes.io/ingress.class annotation is deprecated
  tls:
  - hosts:
    - www.example.com
    - api.example.com
    secretName: ecommerce-tls
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-gateway
            port:
              number: 8080

Deployment Commands

bash
# Create the namespace and resource quota
kubectl apply -f namespace.yaml
kubectl apply -f resource-quota.yaml

# Deploy all services
kubectl apply -f frontend.yaml
kubectl apply -f api-gateway.yaml
kubectl apply -f user-service.yaml
kubectl apply -f order-service.yaml
kubectl apply -f product-service.yaml

# Deploy the Ingress
kubectl apply -f ingress.yaml

# Check deployment status
kubectl get all -n ecommerce-prod

# See pod distribution
kubectl get pods -n ecommerce-prod -o wide

# Test service access
kubectl run test-pod --image=curlimages/curl -n ecommerce-prod --rm -it -- curl http://api-gateway:8080/health

Example 2: Multi-Tenant SaaS Platform

Scenario

Design a multi-tenant SaaS platform in which each tenant gets its own namespace, with a ResourceQuota capping each tenant's resource usage.

Architecture

yaml
# Tenant A namespace
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    tenant: tenant-a
    plan: premium
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "30"
    services: "10"
    persistentvolumeclaims: "5"
---
# Tenant B namespace
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-b
  labels:
    tenant: tenant-b
    plan: standard
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-b-quota
  namespace: tenant-b
spec:
  hard:
    requests.cpu: "5"
    requests.memory: 10Gi
    limits.cpu: "10"
    limits.memory: 20Gi
    pods: "15"
    services: "5"
    persistentvolumeclaims: "3"
---
# Network policy - tenant isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-a-isolation
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant: tenant-a
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          tenant: tenant-a
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
---
# Tenant A's application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: tenant-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: registry.example.com/saas/web-app:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
  namespace: tenant-a
spec:
  type: ClusterIP
  selector:
    app: web-app
  ports:
  - port: 8080
    targetPort: 8080

Management Commands

bash
# Create the tenant namespaces
kubectl apply -f tenant-namespaces.yaml

# Review resource usage across all tenants
kubectl get resourcequota --all-namespaces

# Check tenant A's usage
kubectl describe resourcequota tenant-a-quota -n tenant-a

# List tenant network policies
kubectl get networkpolicy -n tenant-a

# Test tenant isolation (this request should time out if the policy is working)
kubectl run test-tenant-a --image=busybox -n tenant-a --rm -it -- wget -O- http://web-app.tenant-b.svc.cluster.local:8080

Example 3: Highly Available Trading System

Scenario

Design a financial trading system that demands very high availability: cross-AZ deployment, strict resource isolation, and tight network policies.

Architecture

yaml
# Production namespace
apiVersion: v1
kind: Namespace
metadata:
  name: trading-prod
  labels:
    environment: production
    system: trading
    critical: "true"
---
# Strict resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: trading-quota
  namespace: trading-prod
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "200"
    services: "50"
    persistentvolumeclaims: "30"
---
# Trading engine - cross-AZ deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trading-engine
  namespace: trading-prod
spec:
  replicas: 9
  selector:
    matchLabels:
      app: trading-engine
  template:
    metadata:
      labels:
        app: trading-engine
    spec:
      affinity:
        # Note: a *required* zone-level anti-affinity would cap this Deployment
        # at one pod per zone, far fewer than 9 replicas. Spread hard across
        # nodes instead and let the topologySpreadConstraints below balance zones.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - trading-engine
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: trading-engine
      containers:
      - name: trading-engine
        image: registry.example.com/trading/engine:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "2000m"
            memory: "4Gi"
          limits:
            cpu: "4000m"
            memory: "8Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 3
          timeoutSeconds: 2
          failureThreshold: 2
        env:
        - name: JAVA_OPTS
          value: "-Xms4g -Xmx4g -XX:+UseG1GC"
---
apiVersion: v1
kind: Service
metadata:
  name: trading-engine
  namespace: trading-prod
spec:
  type: ClusterIP
  selector:
    app: trading-engine
  ports:
  - port: 8080
    targetPort: 8080
---
# PodDisruptionBudget - guarantee a minimum number of available replicas
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: trading-engine-pdb
  namespace: trading-prod
spec:
  minAvailable: 7
  selector:
    matchLabels:
      app: trading-engine
---
# Strict network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: trading-network-policy
  namespace: trading-prod
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
      podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - protocol: TCP
      port: 3306
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
---
# MySQL primary/replica deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: trading-prod
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - mysql
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
        resources:
          requests:
            cpu: "2000m"
            memory: "8Gi"
          limits:
            cpu: "4000m"
            memory: "16Gi"
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 500Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: trading-prod
spec:
  clusterIP: None  # headless; the StatefulSet's serviceName expects a headless service
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: trading-prod
spec:
  type: ClusterIP
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306

Deployment and Verification Commands

bash
# Deploy the trading system
kubectl apply -f trading-system.yaml

# See pod distribution across nodes
kubectl get pods -n trading-prod -o wide

# Map pods to nodes (cross-reference zones with:
# kubectl get nodes -L topology.kubernetes.io/zone)
kubectl get pods -n trading-prod -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

# Check PDB status
kubectl get pdb -n trading-prod

# Check resource usage
kubectl top pods -n trading-prod

# Simulate a node failure
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Watch pods reschedule
kubectl get pods -n trading-prod -w

# Test service availability
kubectl run test-client --image=curlimages/curl -n trading-prod --rm -it -- curl http://trading-engine:8080/health

Troubleshooting Guide

Problem 1: Pods Fail to Schedule

Symptoms

  • Pods stay in the Pending state
  • Events show "Insufficient cpu" or "Insufficient memory"

Diagnosis

bash
# Check pod status
kubectl get pods -n production

# Inspect pod events
kubectl describe pod <pod-name> -n production

# Check node resources
kubectl describe nodes

# Check quota usage
kubectl describe resourcequota -n production

# Check the LimitRange
kubectl describe limitrange -n production

Resolution

yaml
# Lower the resource requests
resources:
  requests:
    cpu: "100m"      # reduced request
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Problem 2: Network Policies Block Service Access

Symptoms

  • Services are unreachable inside the cluster
  • Cross-namespace requests fail

Diagnosis

bash
# List network policies
kubectl get networkpolicy -n production

# Inspect a policy
kubectl describe networkpolicy <policy-name> -n production

# Test connectivity
kubectl run test-pod --image=busybox -n production --rm -it -- wget -O- http://service-name:8080

# Check pod labels
kubectl get pods -n production --show-labels

# Check namespace labels
kubectl get namespace production --show-labels

Resolution

yaml
# Amend the policy to allow the required traffic
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
    - namespaceSelector:
        matchLabels:
          environment: staging  # allow access from staging

Problem 3: Resource Quota Exhausted

Symptoms

  • New pods cannot be created
  • Error message: exceeded quota

Diagnosis

bash
# Check quota status
kubectl get resourcequota -n production

# Inspect detailed usage
kubectl describe resourcequota production-quota -n production

# Check resource usage of all pods
kubectl top pods -n production

# Check namespace resource usage
kubectl describe namespace production

Resolution

yaml
# Raise the quota
spec:
  hard:
    requests.cpu: "30"      # up from 20
    requests.memory: 60Gi   # up from 40Gi
    limits.cpu: "60"        # up from 40
    limits.memory: 120Gi    # up from 80Gi

Problem 4: Uneven Pod Distribution

Symptoms

  • Pods concentrate on a few nodes
  • Node load is unbalanced

Diagnosis

bash
# See pod distribution
kubectl get pods -n production -o wide

# Check node resource usage
kubectl top nodes

# Inspect the anti-affinity configuration
kubectl get deployment <deployment-name> -n production -o yaml | grep -A 20 affinity

# Check node labels
kubectl get nodes --show-labels

Resolution

yaml
# Add pod anti-affinity
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app
          topologyKey: kubernetes.io/hostname

Problem 5: PDB Blocks Node Maintenance

Symptoms

  • kubectl drain hangs
  • Nodes cannot enter maintenance mode

Diagnosis

bash
# Check PDB status
kubectl get pdb -n production

# Inspect a PDB
kubectl describe pdb <pdb-name> -n production

# Count current pods
kubectl get pods -n production -l app=<app-name>

# Force-delete a pod (use with extreme care)
kubectl delete pod <pod-name> -n production --force --grace-period=0

Resolution

yaml
# Relax the PDB; a PDB may set either minAvailable or maxUnavailable, never both
spec:
  minAvailable: 1  # lowered from 2
  # alternatively, replace minAvailable with:
  # maxUnavailable: 1

Best Practices

1. Namespace Design

Environment isolation

yaml
# Recommended namespace naming scheme
- production      # production environment
- staging         # pre-production environment
- development     # development environment
- monitoring      # monitoring stack
- logging         # logging stack
- ci-cd           # CI/CD tooling

Label conventions

yaml
metadata:
  labels:
    environment: production
    team: platform
    project: ecommerce
    critical: "true"
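
Kubernetes also defines a set of recommended labels under the `app.kubernetes.io/` prefix, which tools such as Helm, kubectl, and most dashboards understand. Adopting them alongside project-specific labels like the ones above costs little; a sketch with illustrative values:

```yaml
metadata:
  labels:
    app.kubernetes.io/name: api-gateway
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/part-of: ecommerce
    app.kubernetes.io/managed-by: kubectl
```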

2. Resource Quotas

Set requests from observed usage

yaml
resources:
  requests:
    cpu: "500m"      # based on measured usage
    memory: "512Mi"  # leave 20-30% headroom
  limits:
    cpu: "2000m"     # no more than ~4x the request
    memory: "2Gi"    # no more than ~4x the request

Use a LimitRange to set defaults

yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"

3. High-Availability Deployment

Multiple replicas

yaml
spec:
  replicas: 3  # at least 3 replicas for critical services

Cross-AZ deployment

yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: critical-service
        topologyKey: topology.kubernetes.io/zone  # note: caps replicas at one per zone

Health check configuration

yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

4. Network Policies

Deny all traffic by default

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
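
Note that a default-deny policy covering Egress also blocks DNS lookups, which breaks service discovery for every pod it selects. A companion policy is usually needed; a sketch that reuses the kube-dns pod labels seen earlier in this chapter:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```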

Allow only the traffic you need

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-specific-traffic
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

5. Security

Use a dedicated ServiceAccount

yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  namespace: production
automountServiceAccountToken: false

RBAC configuration

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
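
A Role grants nothing by itself; it must be bound to a subject. A RoleBinding tying the Role above to the ServiceAccount from the previous snippet might look like this:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: my-service-account
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```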

6. Monitoring and Logging

Resource monitoring

bash
# Check resource usage regularly
kubectl top nodes
kubectl top pods -n production

# Set up resource alerts (e.g. Prometheus alerting rules)
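
The alerting rules themselves live outside kubectl; with the Prometheus Operator installed, they are expressed as a PrometheusRule resource. A minimal sketch (the alert name, threshold, and namespace are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-alerts
  namespace: monitoring
spec:
  groups:
  - name: resource-usage
    rules:
    - alert: HighPodCPU
      # fires when a pod in production sustains more than 1.5 cores for 10 minutes
      expr: sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod) > 1.5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} CPU usage is high"
```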

Log collection

yaml
# Write logs to stdout/stderr so the node-level agent can collect them;
# mount a log volume only when the application must write to files
spec:
  containers:
  - name: app
    volumeMounts:
    - name: logs
      mountPath: /app/logs
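
If an application can only write to files, a common pattern is a sidecar that tails them to its own stdout so the cluster's log agent still picks them up. A sketch (the image and log path are illustrative):

```yaml
spec:
  containers:
  - name: app
    image: registry.example.com/app:v1.0.0
    volumeMounts:
    - name: logs
      mountPath: /app/logs
  - name: log-tailer            # streams the log file to stdout
    image: busybox
    args: [/bin/sh, -c, 'tail -n +1 -F /app/logs/app.log']
    volumeMounts:
    - name: logs
      mountPath: /app/logs
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}
```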

7. Disaster Recovery

Regular backups

bash
# Back up etcd
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Back up namespace resources
kubectl get all -n production -o yaml > production-backup.yaml

Multi-cluster deployment

yaml
# Use federation or a multi-cluster management tool,
# e.g. KubeFed, Rancher, or GKE multi-cluster

Summary

Project architecture design is the foundation of Kubernetes in practice; a sound design keeps the system reliable, scalable, and maintainable. In this chapter we covered:

  1. Namespace design: partitioning by environment, team, and project
  2. Resource quota management: capping usage so no single application starves the rest
  3. High-availability deployment: multiple replicas, cross-AZ spreading, health checks
  4. Network policies: restricting service-to-service traffic to improve security
  5. Best practices: design principles and lessons for production environments

The three worked examples (an e-commerce platform, a multi-tenant SaaS platform, and a financial trading system) showed how to apply these ideas to real projects.

Next Steps

References