Project Architecture Design
Overview
Project architecture design is at the heart of running Kubernetes in practice: it determines an application's reliability, scalability, and maintainability. This chapter examines how to design a production-grade Kubernetes project architecture, covering microservice architecture design, high-availability deployment, and resource planning.
Core Concepts
Microservice Architecture Principles
- Single responsibility: each service focuses on one business capability
- Independent deployment: services can be developed, tested, and deployed on their own
- Decentralization: data management and governance are decentralized
- Fault isolation: a failure in one service does not take down the whole system
High-Availability Design Elements
- Multiple replicas: at least 3 replicas for critical services
- Cross-AZ deployment: spread risk across different availability zones
- Health checks: liveness and readiness probes
- Automatic recovery: automatic restarts and rescheduling
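Automatic recovery pairs naturally with automatic scaling. As a hedged sketch (the target Deployment name `api-gateway` and the thresholds are illustrative assumptions, not taken from this chapter), a HorizontalPodAutoscaler keeps the replica count between a floor and a ceiling based on observed CPU utilization:

```yaml
# Illustrative sketch: autoscale a Deployment between 3 and 10 replicas
# based on average CPU utilization. Names and thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3        # never scale below the HA floor
  maxReplicas: 10       # cap growth so the namespace quota is respected
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

The `minReplicas` floor matters here: it keeps the autoscaler from undercutting the "at least 3 replicas" availability rule during quiet periods.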
Resource Planning Principles
- Resource quotas: set sensible requests and limits
- Namespace isolation: partition by environment or team
- Network policies: restrict service-to-service traffic
- Storage planning: a deliberate persistent-storage strategy
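Storage planning usually starts from a StorageClass that applications consume through PVCs. A minimal sketch (the `fast-ssd` class name matches what this chapter uses later; the provisioner and parameters are assumptions that depend on your CSI driver):

```yaml
# Illustrative sketch: an SSD-backed StorageClass plus a PVC that uses it.
# The provisioner and parameters below assume the AWS EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain          # keep data if the PVC is deleted
allowVolumeExpansion: true     # permit growing volumes in place
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: production
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi
```

`reclaimPolicy: Retain` is a deliberately conservative choice for production data; use `Delete` only where losing the volume with the claim is acceptable.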
Project Architecture Design
Overall Architecture Diagram
```
┌─────────────────────────────────────────────────────────────┐
│                     User Request Entry                      │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
            ┌─────────────────┐
            │   Ingress/LB    │
            │ (Load Balancer) │
            └────────┬────────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
        ▼            ▼            ▼
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │ Frontend  │ │  API GW   │ │  Static   │
  │  Service  │ │  Service  │ │  Assets   │
  └───────────┘ └─────┬─────┘ └───────────┘
                      │
          ┌───────────┼───────────┐
          │           │           │
          ▼           ▼           ▼
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │ User Svc  │ │ Order Svc │ │  Product  │
  │           │ │           │ │  Service  │
  └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
        │             │             │
        └─────────────┼─────────────┘
                      │
          ┌───────────┼───────────┐
          │           │           │
          ▼           ▼           ▼
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │   MySQL   │ │   Redis   │ │  MongoDB  │
  │ (primary/ │ │ (cluster) │ │ (replica  │
  │  replica) │ │           │ │   set)    │
  └───────────┘ └───────────┘ └───────────┘
```
Namespace Design
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    team: platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  labels:
    environment: staging
    team: platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    environment: development
    team: platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    environment: monitoring
    team: sre
---
apiVersion: v1
kind: Namespace
metadata:
  name: logging
  labels:
    environment: logging
    team: sre
```
Resource Quota Configuration
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"
    pods: "50"
    services: "20"
    secrets: "50"
    configmaps: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
  - type: PersistentVolumeClaim
    max:
      storage: "50Gi"
    min:
      storage: "1Gi"
```
Network Policy Design
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: production-network-policy
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
    - namespaceSelector:
        matchLabels:
          environment: monitoring
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          environment: production
    ports:
    - protocol: TCP
      port: 3306
    - protocol: TCP
      port: 6379
    - protocol: TCP
      port: 27017
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
```
High-Availability Deployment Plan
Multi-Replica Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
        version: v1.0.0
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - api-gateway
              topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/worker
                operator: Exists
      containers:
      - name: api-gateway
        image: registry.example.com/api-gateway:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: config
          mountPath: /app/config
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: api-gateway-config
```
Cross-AZ Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      affinity:
        # Required zone-level anti-affinity would cap the deployment at one
        # pod per zone (3 zones < 5 replicas), so prefer host-level spreading
        # and let topologySpreadConstraints balance pods across zones.
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - user-service
              topologyKey: kubernetes.io/hostname
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-role.kubernetes.io/worker
                operator: Exists
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: user-service
      containers:
      - name: user-service
        image: registry.example.com/user-service:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
```
PodDisruptionBudget Configuration
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-gateway-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-gateway
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: user-service-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: user-service
```
kubectl Commands
Namespace Management
```bash
# Create a namespace
kubectl create namespace production
# List all namespaces
kubectl get namespaces
# Show namespace details
kubectl describe namespace production
# Delete a namespace
kubectl delete namespace production
# Set the default namespace for the current context
kubectl config set-context --current --namespace=production
```
Resource Quota Management
```bash
# List resource quotas
kubectl get resourcequota -n production
# Show quota details and current usage
kubectl describe resourcequota production-quota -n production
# List LimitRanges
kubectl get limitrange -n production
# Show namespace resource usage
kubectl describe namespace production
```
Network Policy Management
```bash
# List network policies
kubectl get networkpolicy -n production
# Show network policy details
kubectl describe networkpolicy production-network-policy -n production
# Test connectivity
kubectl run test-pod --image=busybox -n production --rm -it -- wget -O- http://user-service:8080/health
# Show pod network info
kubectl get pods -n production -o wide
```
High-Availability Deployment Management
```bash
# Show Deployment status
kubectl get deployments -n production
# Show Deployment details
kubectl describe deployment api-gateway -n production
# Show pod distribution
kubectl get pods -n production -o wide --show-labels
# Show which node each pod runs on; zones are node labels,
# so map nodes to zones in a second step
kubectl get pods -n production -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
kubectl get nodes -L topology.kubernetes.io/zone
# Scale manually
kubectl scale deployment api-gateway --replicas=5 -n production
# List PodDisruptionBudgets
kubectl get pdb -n production
# Show PDB details
kubectl describe pdb api-gateway-pdb -n production
```
Node Affinity Debugging
```bash
# Show node labels
kubectl get nodes --show-labels
# Label a node
kubectl label nodes node-1 node-role.kubernetes.io/worker=true
# Show node availability zones
kubectl get nodes -L topology.kubernetes.io/zone
# Show pod placement
kubectl get pods -n production -o wide
# Show scheduling events
kubectl get events -n production --field-selector reason=Scheduled
# Describe a pod to inspect scheduling decisions
kubectl describe pod <pod-name> -n production
```
Hands-On Examples
Example 1: E-commerce Platform Architecture
Scenario
Design the Kubernetes architecture for an e-commerce platform, including a frontend, API gateway, user service, order service, product service, and payment service.
Architecture
```yaml
# Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: ecommerce-prod
  labels:
    environment: production
    project: ecommerce
---
# Resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ecommerce-quota
  namespace: ecommerce-prod
spec:
  hard:
    requests.cpu: "50"
    requests.memory: 100Gi
    limits.cpu: "100"
    limits.memory: 200Gi
    pods: "100"
    services: "30"
    persistentvolumeclaims: "20"
---
# Frontend service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: ecommerce-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: registry.example.com/ecommerce/frontend:v1.0.0
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 80
---
# API gateway
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: ecommerce-prod
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api-gateway
              topologyKey: kubernetes.io/hostname
      containers:
      - name: api-gateway
        image: registry.example.com/ecommerce/api-gateway:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: api-gateway
  ports:
  - port: 8080
    targetPort: 8080
---
# User service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: ecommerce-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: registry.example.com/ecommerce/user-service:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              name: user-service-config
              key: db_host
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: user-service-secret
              key: db_password
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: user-service
  ports:
  - port: 8080
    targetPort: 8080
---
# Order service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: ecommerce-prod
spec:
  replicas: 5
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: registry.example.com/ecommerce/order-service:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: order-service
  ports:
  - port: 8080
    targetPort: 8080
---
# Product service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
  namespace: ecommerce-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: product-service
  template:
    metadata:
      labels:
        app: product-service
    spec:
      containers:
      - name: product-service
        image: registry.example.com/ecommerce/product-service:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: product-service
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  selector:
    app: product-service
  ports:
  - port: 8080
    targetPort: 8080
---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecommerce-ingress
  namespace: ecommerce-prod
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx  # replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
  - hosts:
    - www.example.com
    - api.example.com
    secretName: ecommerce-tls
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-gateway
            port:
              number: 8080
```
Deployment Commands
```bash
# Create the namespace and resource quota
kubectl apply -f namespace.yaml
kubectl apply -f resource-quota.yaml
# Deploy all services
kubectl apply -f frontend.yaml
kubectl apply -f api-gateway.yaml
kubectl apply -f user-service.yaml
kubectl apply -f order-service.yaml
kubectl apply -f product-service.yaml
# Deploy the Ingress
kubectl apply -f ingress.yaml
# Check deployment status
kubectl get all -n ecommerce-prod
# Check pod distribution
kubectl get pods -n ecommerce-prod -o wide
# Test service access
kubectl run test-pod --image=curlimages/curl -n ecommerce-prod --rm -it -- curl http://api-gateway:8080/health
```
Example 2: Multi-Tenant SaaS Platform Architecture
Scenario
Design a multi-tenant SaaS platform in which each tenant gets its own namespace, with resource usage capped by a ResourceQuota.
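Since every tenant needs the same Namespace/ResourceQuota pair, the manifests can be generated from a template instead of hand-copied. A minimal sketch (tenant names and plan sizes are illustrative assumptions; pipe the output to `kubectl apply -f -`):

```shell
#!/bin/sh
# Emit a Namespace plus ResourceQuota manifest for one tenant.
# Arguments: tenant name, plan label, CPU request quota, memory request quota.
emit_tenant() {
  name="$1"; plan="$2"; cpu="$3"; mem="$4"
  cat <<EOF
---
apiVersion: v1
kind: Namespace
metadata:
  name: ${name}
  labels:
    tenant: ${name}
    plan: ${plan}
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ${name}-quota
  namespace: ${name}
spec:
  hard:
    requests.cpu: "${cpu}"
    requests.memory: ${mem}
EOF
}

# Generate manifests for two tenants; apply with:
#   ./gen-tenants.sh | kubectl apply -f -
emit_tenant tenant-a premium 10 20Gi
emit_tenant tenant-b standard 5 10Gi
```

Onboarding a new tenant then becomes a one-line change rather than a copy-pasted YAML block.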
Architecture
```yaml
# Tenant A namespace
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    tenant: tenant-a
    plan: premium
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "30"
    services: "10"
    persistentvolumeclaims: "5"
---
# Tenant B namespace
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-b
  labels:
    tenant: tenant-b
    plan: standard
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-b-quota
  namespace: tenant-b
spec:
  hard:
    requests.cpu: "5"
    requests.memory: 10Gi
    limits.cpu: "10"
    limits.memory: 20Gi
    pods: "15"
    services: "5"
    persistentvolumeclaims: "3"
---
# Network policy: tenant isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-a-isolation
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant: tenant-a
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          tenant: tenant-a
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
---
# Tenant A's application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: tenant-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: registry.example.com/saas/web-app:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
  namespace: tenant-a
spec:
  type: ClusterIP
  selector:
    app: web-app
  ports:
  - port: 8080
    targetPort: 8080
```
Management Commands
```bash
# Create the tenant namespaces
kubectl apply -f tenant-namespaces.yaml
# Check resource usage across all tenants
kubectl get resourcequota --all-namespaces
# Check tenant A's usage
kubectl describe resourcequota tenant-a-quota -n tenant-a
# Check the tenant network policies
kubectl get networkpolicy -n tenant-a
# Verify tenant isolation (this request should time out: cross-tenant traffic is blocked)
kubectl run test-tenant-a --image=busybox -n tenant-a --rm -it -- wget -O- http://web-app.tenant-b.svc.cluster.local:8080
```
Example 3: Financial Trading System High-Availability Architecture
Scenario
Design a financial trading system that demands very high availability: cross-AZ deployment with strict resource isolation and network policies.
Architecture
```yaml
# Production namespace
apiVersion: v1
kind: Namespace
metadata:
  name: trading-prod
  labels:
    environment: production
    system: trading
    critical: "true"
---
# Strict resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: trading-quota
  namespace: trading-prod
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "200"
    services: "50"
    persistentvolumeclaims: "30"
---
# Trading engine: spread across availability zones
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trading-engine
  namespace: trading-prod
spec:
  replicas: 9
  selector:
    matchLabels:
      app: trading-engine
  template:
    metadata:
      labels:
        app: trading-engine
    spec:
      affinity:
        # Required zone-level anti-affinity would allow only one pod per zone,
        # which is unschedulable with 9 replicas; spread pods across hosts
        # and let topologySpreadConstraints balance the zones instead.
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - trading-engine
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: trading-engine
      containers:
      - name: trading-engine
        image: registry.example.com/trading/engine:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "2000m"
            memory: "4Gi"
          limits:
            cpu: "4000m"
            memory: "8Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 3
          timeoutSeconds: 2
          failureThreshold: 2
        env:
        - name: JAVA_OPTS
          value: "-Xms4g -Xmx4g -XX:+UseG1GC"
---
apiVersion: v1
kind: Service
metadata:
  name: trading-engine
  namespace: trading-prod
spec:
  type: ClusterIP
  selector:
    app: trading-engine
  ports:
  - port: 8080
    targetPort: 8080
---
# PodDisruptionBudget: guarantee a minimum number of available replicas
# (minAvailable: 7 of 9 permits at most 2 voluntary disruptions at a time)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: trading-engine-pdb
  namespace: trading-prod
spec:
  minAvailable: 7
  selector:
    matchLabels:
      app: trading-engine
---
# Strict network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: trading-network-policy
  namespace: trading-prod
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
      podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - protocol: TCP
      port: 3306
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
---
# MySQL primary/replica deployment (simplified)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: trading-prod
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - mysql
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
        resources:
          requests:
            cpu: "2000m"
            memory: "8Gi"
          limits:
            cpu: "4000m"
            memory: "16Gi"
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 500Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: trading-prod
spec:
  type: ClusterIP
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
# Read service (in a full primary/replica setup this would select replicas only)
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: trading-prod
spec:
  type: ClusterIP
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
```
Deployment and Verification Commands
```bash
# Deploy the trading system
kubectl apply -f trading-system.yaml
# Check pod distribution
kubectl get pods -n trading-prod -o wide
# Zones are node labels; map each node to its zone
kubectl get nodes -L topology.kubernetes.io/zone
# Check PDB status
kubectl get pdb -n trading-prod
# Check resource usage
kubectl top pods -n trading-prod
# Simulate a node failure
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Watch pods reschedule
kubectl get pods -n trading-prod -w
# Test service availability
kubectl run test-client --image=curlimages/curl -n trading-prod --rm -it -- curl http://trading-engine:8080/health
```
Troubleshooting Guide
Common Issue 1: Pods Fail to Schedule
Symptoms
- Pods stuck in the Pending state
- Events report Insufficient cpu or Insufficient memory
Diagnosis
```bash
# Check pod status
kubectl get pods -n production
# Check pod events
kubectl describe pod <pod-name> -n production
# Check node resources
kubectl describe nodes
# Check quota usage
kubectl describe resourcequota -n production
# Check LimitRanges
kubectl describe limitrange -n production
```
Resolution
```yaml
# Lower the resource requests
resources:
  requests:
    cpu: "100m"      # reduced request
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```
Common Issue 2: Network Policies Block Service Access
Symptoms
- Services cannot be reached from inside the cluster
- Cross-namespace access fails
Diagnosis
```bash
# List network policies
kubectl get networkpolicy -n production
# Show network policy details
kubectl describe networkpolicy <policy-name> -n production
# Test connectivity
kubectl run test-pod --image=busybox -n production --rm -it -- wget -O- http://service-name:8080
# Check pod labels
kubectl get pods -n production --show-labels
# Check namespace labels
kubectl get namespace production --show-labels
```
Resolution
```yaml
# Adjust the policy to allow the required traffic
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
    - namespaceSelector:
        matchLabels:
          environment: staging   # allow access from staging
```
Common Issue 3: Resource Quota Exhausted
Symptoms
- New pods cannot be created
- Error message: exceeded quota
Diagnosis
```bash
# Check quota status
kubectl get resourcequota -n production
# Show detailed usage
kubectl describe resourcequota production-quota -n production
# Show per-pod resource usage
kubectl top pods -n production
# Show namespace resource usage
kubectl describe namespace production
```
Resolution
```yaml
# Raise the quota
spec:
  hard:
    requests.cpu: "30"       # was 20
    requests.memory: 60Gi    # was 40Gi
    limits.cpu: "60"         # was 40
    limits.memory: 120Gi     # was 80Gi
```
Common Issue 4: Uneven Pod Distribution
Symptoms
- Pods concentrate on a few nodes
- Node load is unbalanced
Diagnosis
```bash
# Check pod distribution
kubectl get pods -n production -o wide
# Check node resource usage
kubectl top nodes
# Check the anti-affinity configuration
kubectl get deployment <deployment-name> -n production -o yaml | grep -A 20 affinity
# Check node labels
kubectl get nodes --show-labels
```
Resolution
```yaml
# Add pod anti-affinity
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app
          topologyKey: kubernetes.io/hostname
```
Common Issue 5: PDB Blocks Node Maintenance
Symptoms
- kubectl drain hangs
- The node cannot enter maintenance mode
Diagnosis
```bash
# Check PDB status
kubectl get pdb -n production
# Show PDB details
kubectl describe pdb <pdb-name> -n production
# Check the current pod count
kubectl get pods -n production -l app=<app-name>
# Force-delete a pod (use with extreme caution)
kubectl delete pod <pod-name> -n production --force --grace-period=0
```
Resolution
```yaml
# Relax the PDB (a PDB may set minAvailable OR maxUnavailable, not both)
spec:
  minAvailable: 1    # was 2
  # alternatively, replace minAvailable with:
  # maxUnavailable: 1
```
Best Practices
1. Namespace Design
Environment Isolation
```yaml
# Recommended namespace naming convention
- production    # production environment
- staging       # pre-release environment
- development   # development environment
- monitoring    # monitoring stack
- logging       # logging stack
- ci-cd         # CI/CD tooling
```
Label Conventions
```yaml
metadata:
  labels:
    environment: production
    team: platform
    project: ecommerce
    critical: "true"
```
2. Resource Quotas
Set Requests and Limits Sensibly
```yaml
resources:
  requests:
    cpu: "500m"      # size from observed usage
    memory: "512Mi"  # leave 20-30% headroom
  limits:
    cpu: "2000m"     # at most ~4x the request
    memory: "2Gi"    # at most ~4x the request
```
Use LimitRange to Set Defaults
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
```
3. High-Availability Deployment
Multiple Replicas
```yaml
spec:
  replicas: 3    # at least 3 replicas for critical services
```
Cross-AZ Deployment
```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: critical-service
        topologyKey: topology.kubernetes.io/zone
```
Health Check Configuration
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```
4. Network Policies
Deny All Traffic by Default
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```
Allow Only Required Traffic
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-specific-traffic
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```
5. Security
Use ServiceAccounts
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  namespace: production
automountServiceAccountToken: false
```
RBAC Configuration
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
```
6. Monitoring and Logging
Resource Monitoring
```bash
# Check resource usage regularly
kubectl top nodes
kubectl top pods -n production
# Set up resource alerts (e.g. via Prometheus alerting rules)
```
Log Collection
```yaml
# Prefer writing logs to stdout/stderr so the node-level agent collects them;
# mount a volume only when the application must write log files
spec:
  containers:
  - name: app
    volumeMounts:
    - name: logs
      mountPath: /app/logs
```
7. Disaster Recovery
Regular Backups
```bash
# Back up etcd
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
# Back up namespace resources
kubectl get all -n production -o yaml > production-backup.yaml
```
Multi-Cluster Deployment
Use a federation or multi-cluster management tool, such as KubeFed, Rancher, or GKE multi-cluster, to run the same workloads across clusters.
Summary
Project architecture design is the foundation of working with Kubernetes: a sound architecture ensures the system's reliability, scalability, and maintainability. In this chapter we covered:
- Namespace design: partitioning sensibly by environment, team, and project
- Resource quota management: capping usage so no single application starves the rest
- High-availability deployment: multiple replicas, cross-AZ placement, health checks
- Network policies: restricting service-to-service traffic to improve security
- Best practices: design principles and lessons learned for production environments
Through three hands-on examples — an e-commerce platform, a multi-tenant SaaS platform, and a financial trading system — we saw how these ideas apply to real projects.