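CI/CD Pipelines

Overview

CI/CD (continuous integration and continuous deployment) is a core practice of modern software development. This chapter covers how to build a complete CI/CD pipeline on Kubernetes, including GitOps practices, continuous integration, continuous deployment, and automated testing.

Core Concepts

CI/CD Workflow

  • Continuous integration (CI): every commit is built and tested automatically
  • Continuous deployment (CD): builds are deployed automatically to each environment
  • GitOps: an operating model that treats Git as the single source of truth
  • Infrastructure as code: all configuration is managed as code

GitOps Core Principles

  • Declarative: the desired system state is defined declaratively
  • Version controlled: every change is made through a Git commit
  • Automatically applied: changes are applied to the cluster automatically
  • Continuously reconciled: the actual state is continuously kept in line with the desired state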
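
The continuous-reconciliation principle can be pictured as a loop that compares the desired state from Git with the live state and applies the difference. The sketch below is a hypothetical illustration in plain shell, not how ArgoCD or Flux are implemented: `reconcile_once` and the `key=value` state strings are assumptions made up for this example.

```shell
# Hypothetical sketch of one GitOps reconciliation pass.
# Real agents read the desired state from Git and the live state from the
# Kubernetes API; here both are plain strings such as "image=v2 replicas=3".
reconcile_once() {
  desired_state=$1
  live_state=$2
  if [ "$desired_state" = "$live_state" ]; then
    # Nothing to do: the cluster already matches Git.
    echo "in-sync: $live_state"
  else
    # Drift detected: the desired state wins and replaces the live state.
    echo "drift: applying $desired_state"
  fi
}

# Example: someone scaled the workload by hand, so the live state drifted.
reconcile_once "image=v2 replicas=3" "image=v2 replicas=5"
```

A real agent repeats this pass on an interval, which is what settings such as `selfHeal` in ArgoCD or `interval` in Flux control.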

CI/CD Toolchain

  • Git hosting: GitHub, GitLab, Bitbucket
  • CI tools: Jenkins, GitLab CI, GitHub Actions
  • CD tools: ArgoCD, Flux, Spinnaker
  • Image registries: Docker Hub, Harbor, ECR

GitOps in Practice

ArgoCD Deployment

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: argocd
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/my-app.git
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
    - Validate=true
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production environment project
  sourceRepos:
  - 'https://github.com/myorg/*'
  destinations:
  - namespace: production
    server: https://kubernetes.default.svc
  - namespace: staging
    server: https://kubernetes.default.svc
  clusterResourceWhitelist:
  - group: ''
    kind: Namespace
  namespaceResourceBlacklist:
  - group: ''
    kind: ResourceQuota
  - group: ''
    kind: LimitRange
  - group: ''
    kind: NetworkPolicy
  roles:
  - name: admin
    description: Admin privileges for production project
    policies:
    - p, proj:production:admin, applications, *, production/*, allow
    - p, proj:production:admin, repositories, *, production/*, allow
---
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - name: user-service
        path: services/user-service
      - name: order-service
        path: services/order-service
      - name: product-service
        path: services/product-service
  template:
    metadata:
      name: '{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/microservices.git
        targetRevision: HEAD
        path: '{{path}}/k8s/overlays/production'
      destination:
        server: https://kubernetes.default.svc
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
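
The `retry` block in the `my-app` Application above sets `limit: 5`, `duration: 5s`, `factor: 2`, and `maxDuration: 3m`. A minimal sketch of the resulting wait times follows, assuming the first retry waits `duration`, each later retry multiplies the wait by `factor`, and the wait is capped at `maxDuration`. `backoff_schedule` is a hypothetical helper, and the exact timing inside ArgoCD may differ.

```shell
# backoff_schedule DURATION_SECONDS FACTOR MAX_SECONDS LIMIT
# Prints the wait before each retry, in seconds, separated by spaces.
backoff_schedule() {
  delay=$1
  factor=$2
  max_delay=$3
  limit=$4
  attempt=0
  schedule=""
  while [ "$attempt" -lt "$limit" ]; do
    # Cap the wait at maxDuration.
    if [ "$delay" -gt "$max_delay" ]; then
      delay=$max_delay
    fi
    schedule="$schedule${schedule:+ }$delay"
    delay=$((delay * factor))
    attempt=$((attempt + 1))
  done
  echo "$schedule"
}

# Values from the Application above: 5s, factor 2, 3m (180s), 5 retries.
backoff_schedule 5 2 180 5   # prints: 5 10 20 40 80
```

With these settings the cap is never reached; it would only apply from a seventh retry onward.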

Flux Deployment

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: flux-system
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  secretRef:
    name: flux-system
  url: ssh://git@github.com/myorg/my-app
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 5m0s
  path: ./k8s/overlays/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: my-app
  healthChecks:
  - apiVersion: apps/v1
    kind: Deployment
    name: user-service
    namespace: production
  - apiVersion: apps/v1
    kind: Deployment
    name: order-service
    namespace: production
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  image: registry.example.com/my-app
  interval: 1m0s
  secretRef:
    name: registry-credentials
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-app
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: my-app
  policy:
    semver:
      range: '>=1.0.0'
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 1m0s
  sourceRef:
    kind: GitRepository
    name: my-app
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: flux@example.com
        name: flux
      messageTemplate: '{{range .Updated.Images}}{{println .}}{{end}}'
    push:
      branch: main
  update:
    path: ./k8s
    strategy: Setters
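
The ImagePolicy above selects the newest tag that satisfies the semver range `>=1.0.0`. As a rough sketch of that selection, the helper below assumes plain `MAJOR.MINOR.PATCH` tags only; pre-release tags and a leading `v` are ignored, which is a simplification and not Flux's actual parser. `latest_semver` is a hypothetical name.

```shell
# latest_semver MIN_VERSION TAG...
# Prints the highest MAJOR.MINOR.PATCH tag that is >= MIN_VERSION,
# or nothing when no tag qualifies.
latest_semver() {
  min_version=$1
  shift
  printf '%s\n' "$@" |
    grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' |
    sort -t. -k1,1n -k2,2n -k3,3n |
    awk -v min="$min_version" '
      # Returns 1 when version a is greater than or equal to version b.
      function ge(a, b,   x, y, i) {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
          if (x[i] + 0 > y[i] + 0) return 1
          if (x[i] + 0 < y[i] + 0) return 0
        }
        return 1
      }
      # Input is sorted ascending, so the last qualifying line is the newest.
      ge($0, min) { best = $0 }
      END { if (best != "") print best }'
}

# Made-up tag list; "latest" is skipped because it is not a semver tag.
latest_semver 1.0.0 0.9.0 1.2.0 1.10.1 latest 1.9.3   # prints: 1.10.1
```

The numeric sort matters: a plain string sort would rank `1.9.3` above `1.10.1`.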

Continuous Integration

GitHub Actions Configuration

yaml
name: CI Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: registry.example.com
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v3

    - name: Set up JDK 17
      uses: actions/setup-java@v3
      with:
        java-version: '17'
        distribution: 'temurin'

    - name: Cache Maven packages
      uses: actions/cache@v3
      with:
        path: ~/.m2
        key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
        restore-keys: |
          ${{ runner.os }}-m2

    - name: Build with Maven
      run: mvn clean package -DskipTests

    - name: Run tests
      run: mvn test

    - name: Generate coverage report
      run: mvn jacoco:report

    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        file: ./target/site/jacoco/jacoco.xml
        flags: unittests
        name: codecov-umbrella

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2

    - name: Log in to Container Registry
      uses: docker/login-action@v2
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ secrets.REGISTRY_USERNAME }}
        password: ${{ secrets.REGISTRY_PASSWORD }}

    - name: Extract metadata for Docker
      id: meta
      uses: docker/metadata-action@v4
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
        tags: |
          type=ref,event=branch
          type=ref,event=pr
          type=semver,pattern={{version}}
          type=semver,pattern={{major}}.{{minor}}
          type=sha

    - name: Build and push Docker image
      uses: docker/build-push-action@v4
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

    - name: Run Trivy vulnerability scanner
      uses: aquasecurity/trivy-action@master
      with:
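        # type=sha produces a short sha-<7 chars> tag, so scan a tag that was actually pushed
        image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.meta.outputs.version }}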
        format: 'sarif'
        output: 'trivy-results.sarif'

    - name: Upload Trivy scan results
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: 'trivy-results.sarif'

  test:
    runs-on: ubuntu-latest
    needs: build
    steps:
    - name: Checkout code
      uses: actions/checkout@v3

    - name: Run integration tests
      run: |
        docker compose -f docker-compose.test.yml up -d
        sleep 30
        mvn verify -P integration-test
        docker compose -f docker-compose.test.yml down

    - name: Publish test results
      uses: EnricoMi/publish-unit-test-result-action@v2
      if: always()
      with:
        files: |
          target/surefire-reports/*.xml
          target/failsafe-reports/*.xml

  sonarqube:
    runs-on: ubuntu-latest
    needs: build
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
      with:
        fetch-depth: 0

    - name: Set up JDK 17
      uses: actions/setup-java@v3
      with:
        java-version: '17'
        distribution: 'temurin'

    - name: Cache SonarCloud packages
      uses: actions/cache@v3
      with:
        path: ~/.sonar/cache
        key: ${{ runner.os }}-sonar
        restore-keys: ${{ runner.os }}-sonar

    - name: Cache Maven packages
      uses: actions/cache@v3
      with:
        path: ~/.m2
        key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
        restore-keys: |
          ${{ runner.os }}-m2

    - name: Build and analyze
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
      run: mvn -B verify org.sonarsource.scanner.maven:sonar-maven-plugin:sonar -Dsonar.projectKey=myorg_my-app
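
The `Extract metadata for Docker` step above derives image tags from the Git ref and the commit. Below is a simplified sketch of two of its rules, `type=ref,event=branch` and `type=sha`, assuming the action's default short-SHA format (`sha-` plus the first seven characters) and that `/` in branch names becomes `-`. `image_tags` is a hypothetical helper, not part of the action.

```shell
# image_tags GIT_REF COMMIT_SHA
# Prints one tag per line, mimicking two docker/metadata-action rules.
image_tags() {
  git_ref=$1
  commit_sha=$2
  case $git_ref in
    refs/heads/*)
      # type=ref,event=branch: branch name with "/" replaced by "-"
      printf '%s\n' "${git_ref#refs/heads/}" | tr '/' '-'
      ;;
  esac
  # type=sha: "sha-" prefix plus the first seven characters of the commit
  printf 'sha-%s\n' "$(printf '%s' "$commit_sha" | cut -c1-7)"
}

# Made-up ref and commit for illustration.
image_tags refs/heads/release/1.0 0123456789abcdef0123456789abcdef01234567
```

Under these rules a full-length commit SHA is not among the generated tags, so later steps need to reference one of the tags that was actually produced.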

GitLab CI Configuration

yaml
stages:
  - build
  - test
  - security
  - deploy

variables:
  MAVEN_CLI_OPTS: "-s .m2/settings.xml --batch-mode"
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

cache:
  paths:
    - .m2/repository/

build:
  stage: build
  image: maven:3.8.6-openjdk-17
  script:
    - mvn $MAVEN_CLI_OPTS compile
  artifacts:
    paths:
      - target/
    expire_in: 1 hour

test:
  stage: test
  image: maven:3.8.6-openjdk-17
  script:
    - mvn $MAVEN_CLI_OPTS test
    - mvn $MAVEN_CLI_OPTS jacoco:report
  artifacts:
    reports:
      junit:
        - target/surefire-reports/TEST-*.xml
      coverage_report:
        coverage_format: jacoco
        path: target/site/jacoco/jacoco.xml
    paths:
      - target/site/jacoco/
    expire_in: 1 week

code_quality:
  stage: test
  image: maven:3.8.6-openjdk-17
  script:
    - mvn $MAVEN_CLI_OPTS verify sonar:sonar -Dsonar.projectKey=$CI_PROJECT_NAME -Dsonar.host.url=$SONAR_URL -Dsonar.login=$SONAR_TOKEN
  allow_failure: true

security_scan:
  stage: security
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy:latest image --exit-code 1 --severity HIGH,CRITICAL $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  allow_failure: true

docker_build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - |
      if [ "$CI_COMMIT_BRANCH" = "main" ]; then
        docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:latest
        docker push $CI_REGISTRY_IMAGE:latest
      fi

deploy_staging:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  script:
    - kubectl config set-cluster k8s --server="$KUBE_URL" --insecure-skip-tls-verify=true
    - kubectl config set-credentials admin --token="$KUBE_TOKEN"
    - kubectl config set-context default --cluster=k8s --user=admin
    - kubectl config use-context default
    - kubectl set image deployment/my-app my-app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n staging
    - kubectl rollout status deployment/my-app -n staging
  environment:
    name: staging
    url: https://staging.example.com
  only:
    - develop

deploy_production:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  script:
    - kubectl config set-cluster k8s --server="$KUBE_URL" --insecure-skip-tls-verify=true
    - kubectl config set-credentials admin --token="$KUBE_TOKEN"
    - kubectl config set-context default --cluster=k8s --user=admin
    - kubectl config use-context default
    - kubectl set image deployment/my-app my-app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/my-app -n production
  environment:
    name: production
    url: https://www.example.com
  when: manual
  only:
    - main

Jenkins Pipeline Configuration

groovy
pipeline {
    agent {
        kubernetes {
            yaml '''
                apiVersion: v1
                kind: Pod
                metadata:
                  labels:
                    app: jenkins-agent
                spec:
                  containers:
                  - name: maven
                    image: maven:3.8.6-openjdk-17
                    command:
                    - cat
                    tty: true
                    volumeMounts:
                    - name: m2-cache
                      mountPath: /root/.m2
                  - name: docker
                    image: docker:latest
                    command:
                    - cat
                    tty: true
                    volumeMounts:
                    - name: docker-sock
                      mountPath: /var/run/docker.sock
                  - name: kubectl
                    image: bitnami/kubectl:latest
                    command:
                    - cat
                    tty: true
                  volumes:
                  - name: m2-cache
                    emptyDir: {}
                  - name: docker-sock
                    hostPath:
                      path: /var/run/docker.sock
            '''
        }
    }

    environment {
        REGISTRY = 'registry.example.com'
        IMAGE_NAME = "${REGISTRY}/${JOB_NAME}"
        DOCKER_CREDENTIALS = credentials('docker-registry')
    }

    stages {
        stage('Checkout') {
            steps {
                checkout scm
                sh 'git rev-parse HEAD > commit-id'
                script {
                    env.COMMIT_ID = readFile('commit-id').trim()
                }
            }
        }

        stage('Build') {
            steps {
                container('maven') {
                    sh 'mvn clean package -DskipTests'
                }
            }
        }

        stage('Test') {
            steps {
                container('maven') {
                    sh 'mvn test'
                }
            }
            post {
                always {
                    junit 'target/surefire-reports/*.xml'
                    jacoco execPattern: 'target/jacoco.exec'
                }
            }
        }

        stage('Code Quality') {
            steps {
                container('maven') {
                    withSonarQubeEnv('SonarQube') {
                        sh 'mvn sonar:sonar'
                    }
                }
            }
        }

        stage('Security Scan') {
            steps {
                container('docker') {
                    sh """
                        docker build -t ${IMAGE_NAME}:${COMMIT_ID} .
                        docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
                            aquasec/trivy:latest image --exit-code 1 --severity HIGH,CRITICAL \
                            ${IMAGE_NAME}:${COMMIT_ID}
                    """
                }
            }
        }

        stage('Build and Push Image') {
            steps {
                container('docker') {
                    // Single quotes keep the credentials out of Groovy string interpolation; the shell expands them instead
                    sh '''
                        echo "$DOCKER_CREDENTIALS_PSW" | docker login -u "$DOCKER_CREDENTIALS_USR" --password-stdin "$REGISTRY"
                        docker push "$IMAGE_NAME:$COMMIT_ID"

                        if [ "$GIT_BRANCH" = "origin/main" ]; then
                            docker tag "$IMAGE_NAME:$COMMIT_ID" "$IMAGE_NAME:latest"
                            docker push "$IMAGE_NAME:latest"
                        fi
                    '''
                }
            }
        }

        stage('Deploy to Staging') {
            when {
                branch 'develop'
            }
            steps {
                container('kubectl') {
                    sh """
                        kubectl set image deployment/my-app my-app=${IMAGE_NAME}:${COMMIT_ID} -n staging
                        kubectl rollout status deployment/my-app -n staging
                    """
                }
            }
        }

        stage('Deploy to Production') {
            when {
                branch 'main'
            }
            steps {
                input 'Deploy to Production?'
                container('kubectl') {
                    sh """
                        kubectl set image deployment/my-app my-app=${IMAGE_NAME}:${COMMIT_ID} -n production
                        kubectl rollout status deployment/my-app -n production
                    """
                }
            }
        }
    }

    post {
        always {
            cleanWs()
        }
        success {
            slackSend(color: 'good', message: "Build successful: ${env.JOB_NAME} #${env.BUILD_NUMBER}")
        }
        failure {
            slackSend(color: 'danger', message: "Build failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}")
        }
    }
}

Continuous Deployment

Deployment Strategy Configuration

yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app-rollout
  namespace: production
spec:
  replicas: 5
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 10m}
      - setWeight: 40
      - pause: {duration: 10m}
      - setWeight: 60
      - pause: {duration: 10m}
      - setWeight: 80
      - pause: {duration: 10m}
      analysis:
        templates:
        - templateName: success-rate
        startingStep: 2
        args:
        - name: service-name
          value: my-app
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: production
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 5m
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc.cluster.local:9090
        query: |
          sum(rate(http_requests_total{status=~"2..",service="{{args.service-name}}"}[5m])) /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-canary
            port:
              number: 80
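
The `success-rate` AnalysisTemplate above divides the rate of 2xx responses by the rate of all responses and requires the result to be at least 0.95. The same check can be sketched with fixed request counts; `analysis_passes` and its arguments are hypothetical and stand in for the Prometheus query result.

```shell
# analysis_passes OK_REQUESTS TOTAL_REQUESTS MIN_RATE
# Exits 0 when ok/total >= min_rate, mirroring `result[0] >= 0.95`.
# With zero total requests there is no ratio to compare, so the check fails.
analysis_passes() {
  awk -v ok="$1" -v total="$2" -v min="$3" \
    'BEGIN { exit !(total > 0 && ok / total >= min) }'
}

# Example with made-up counts: 960 of 1000 requests returned 2xx.
if analysis_passes 960 1000 0.95; then
  echo "metric passed: the canary may continue"
else
  echo "metric failed: the rollout would be aborted"
fi
```

Because the template sets `interval: 5m`, a check of this kind is repeated while the canary steps progress rather than evaluated once.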

Blue-Green Deployment Configuration

yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app-bluegreen
  namespace: production
spec:
  replicas: 5
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
  strategy:
    blueGreen:
      activeService: my-app-active
      previewService: my-app-preview
      autoPromotionEnabled: false
      scaleDownDelaySeconds: 30
      prePromotionAnalysis:
        templates:
        - templateName: pre-promotion
      postPromotionAnalysis:
        templates:
        - templateName: post-promotion
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-active
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-preview
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080

kubectl Commands

ArgoCD Management

bash
# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Get the initial ArgoCD admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

# Access the ArgoCD UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Log in to ArgoCD
argocd login localhost:8080

# List applications
argocd app list

# Sync an application
argocd app sync my-app

# Show application status
argocd app get my-app

# Show the diff between Git and the live state
argocd app diff my-app

# Show the application's deployment history
argocd app history my-app

# Roll back to a history ID listed by the command above
argocd app rollback my-app <revision>

# Delete the application
argocd app delete my-app

Flux Management

bash
# Install the Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash

# Install Flux into the cluster
flux install

# Create a Git source
flux create source git my-app \
  --url=https://github.com/myorg/my-app \
  --branch=main \
  --interval=1m

# Create a Kustomization
flux create kustomization my-app \
  --source=my-app \
  --path="./k8s/overlays/production" \
  --prune=true \
  --interval=5m

# List Flux resources
flux get sources git
flux get kustomizations

# Trigger a reconciliation manually
flux reconcile source git my-app
flux reconcile kustomization my-app

# Suspend automatic reconciliation
flux suspend kustomization my-app

# Resume automatic reconciliation
flux resume kustomization my-app

# View logs
flux logs

Deployment Management

bash
# Show Deployment status
kubectl get deployments -n production

# Show Rollout status
kubectl get rollouts -n production

# Show Rollout details
kubectl describe rollout my-app-rollout -n production

# Show the Deployment's rollout history
kubectl rollout history deployment/my-app -n production

# Roll back the Deployment to the previous revision
kubectl rollout undo deployment/my-app -n production

# Watch Pod status
kubectl get pods -n production -w

# Show events sorted by time
kubectl get events -n production --sort-by='.lastTimestamp'

# Show resource usage
kubectl top pods -n production

# Follow application logs
kubectl logs -f deployment/my-app -n production

Image Management

bash
# Update the image
kubectl set image deployment/my-app my-app=registry.example.com/my-app:v2.0.0 -n production

# Show the current image version
kubectl get deployment my-app -n production -o jsonpath='{.spec.template.spec.containers[0].image}'

# Show image history (rollout revisions)
kubectl rollout history deployment/my-app -n production

# Show the image pull policy
kubectl get deployment my-app -n production -o jsonpath='{.spec.template.spec.containers[0].imagePullPolicy}'

# List registry credential secrets
kubectl get secrets -n production | grep registry

# Create a registry credential secret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=<username> \
  --docker-password=<password> \
  --namespace=production

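Practical Examples

Example 1: Complete CI/CD Pipeline

Scenario

Build a complete CI/CD pipeline that automates every step from code commit to production deployment.

Pipeline Configuration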

yaml
# .github/workflows/ci-cd.yml
name: Complete CI/CD Pipeline

on:
  push:
    branches: [ main, develop, 'release/*' ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: registry.example.com
  IMAGE_NAME: ${{ github.repository }}

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Run Super-Linter
      uses: github/super-linter@v4
      env:
        DEFAULT_BRANCH: main
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  build:
    needs: lint
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up JDK 17
      uses: actions/setup-java@v3
      with:
        java-version: '17'
        distribution: 'temurin'
    
    - name: Build with Maven
      run: mvn clean package -DskipTests
    
    - name: Upload build artifacts
      uses: actions/upload-artifact@v4
      with:
        name: build-artifacts
        path: target/

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up JDK 17
      uses: actions/setup-java@v3
      with:
        java-version: '17'
        distribution: 'temurin'
    
    - name: Run unit tests
      run: mvn test
    
    - name: Run integration tests
      run: mvn verify -P integration-test
    
    - name: Generate coverage report
      run: mvn jacoco:report
    
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        file: ./target/site/jacoco/jacoco.xml

  security:
    needs: build
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Run Trivy vulnerability scanner
      uses: aquasecurity/trivy-action@master
      with:
        scan-type: 'fs'
        scan-ref: '.'
        format: 'sarif'
        output: 'trivy-results.sarif'
    
    - name: Upload Trivy scan results
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: 'trivy-results.sarif'

  docker:
    needs: [test, security]
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
    
    - name: Log in to Container Registry
      uses: docker/login-action@v2
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ secrets.REGISTRY_USERNAME }}
        password: ${{ secrets.REGISTRY_PASSWORD }}
    
    - name: Extract metadata for Docker
      id: meta
      uses: docker/metadata-action@v4
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
        tags: |
          type=ref,event=branch
          type=semver,pattern={{version}}
          type=sha
    
    - name: Build and push Docker image
      uses: docker/build-push-action@v4
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

  deploy-staging:
    needs: docker
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    steps:
    - uses: actions/checkout@v3
    
    - name: Deploy to staging
      uses: steebchen/kubectl@v2.0.0
      with:
        config: ${{ secrets.KUBE_CONFIG }}
        command: apply -k k8s/overlays/staging
    
    - name: Wait for deployment
      uses: steebchen/kubectl@v2.0.0
      with:
        config: ${{ secrets.KUBE_CONFIG }}
        command: rollout status deployment/my-app -n staging --timeout=300s
    
    - name: Run smoke tests
      run: |
        ./scripts/smoke-test.sh staging

  deploy-production:
    needs: docker
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
    - uses: actions/checkout@v3
    
    - name: Deploy to production
      uses: steebchen/kubectl@v2.0.0
      with:
        config: ${{ secrets.KUBE_CONFIG }}
        command: apply -k k8s/overlays/production
    
    - name: Wait for deployment
      uses: steebchen/kubectl@v2.0.0
      with:
        config: ${{ secrets.KUBE_CONFIG }}
        command: rollout status deployment/my-app -n production --timeout=300s
    
    - name: Run smoke tests
      run: |
        ./scripts/smoke-test.sh production
    
    - name: Notify deployment
      uses: 8398a7/action-slack@v3
      with:
        status: ${{ job.status }}
        text: 'Deployed to production'
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

Deployment Commands

bash
# Trigger the CI/CD pipeline
git push origin main

# Check GitHub Actions status
gh run list
gh run view

# Check ArgoCD application status
argocd app get my-app

# Sync the ArgoCD application manually
argocd app sync my-app

# Check deployment status
kubectl get all -n production

# Follow deployment logs
kubectl logs -f deployment/my-app -n production

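Example 2: GitOps Multi-Environment Deployment

Scenario

Use GitOps to manage deployments across multiple environments (development, staging, and production).

Directory Structure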

my-app/
├── k8s/
│   ├── base/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── kustomization.yaml
│   └── overlays/
│       ├── development/
│       │   ├── kustomization.yaml
│       │   ├── patches/
│       │   └── config/
│       ├── staging/
│       │   ├── kustomization.yaml
│       │   ├── patches/
│       │   └── config/
│       └── production/
│           ├── kustomization.yaml
│           ├── patches/
│           └── config/
└── .github/
    └── workflows/
        └── ci-cd.yml

ArgoCD ApplicationSet

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app-environments
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - env: development
        namespace: development
        branch: develop
      - env: staging
        namespace: staging
        branch: release/1.0  # example value; targetRevision does not expand branch wildcards such as release/*
      - env: production
        namespace: production
        branch: main
  template:
    metadata:
      name: 'my-app-{{env}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/my-app.git
        targetRevision: '{{branch}}'
        path: k8s/overlays/{{env}}
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{namespace}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - CreateNamespace=true

Kustomize Configuration

yaml
# k8s/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- deployment.yaml
- service.yaml

commonLabels:
  app: my-app

---
# k8s/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: production

resources:
- ../../base

patches:
- path: patches/deployment-replicas.yaml
- path: patches/resource-limits.yaml

# No "behavior: merge" here: the base defines no generator with these names to merge into
configMapGenerator:
- name: app-config
  files:
  - config/application.yml

secretGenerator:
- name: app-secret
  type: Opaque
  files:
  - config/db-password

images:
- name: registry.example.com/my-app
  newTag: v1.0.0

commonLabels:
  environment: production

---
# k8s/overlays/production/patches/deployment-replicas.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5

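Example 3: Automated Testing Integration

Scenario

Integrate automated testing into the CI/CD pipeline, including unit, integration, and end-to-end tests.

Test Configuration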

yaml
# .github/workflows/test.yml
name: Automated Testing

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  unit-test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up JDK 17
      uses: actions/setup-java@v3
      with:
        java-version: '17'
        distribution: 'temurin'
    
    - name: Run unit tests
      run: mvn test
    
    - name: Generate coverage report
      run: mvn jacoco:report
    
    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        file: ./target/site/jacoco/jacoco.xml
        # Coverage targets (for example 80%) are configured in codecov.yml, not as an action input

  integration-test:
    runs-on: ubuntu-latest
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: root
          MYSQL_DATABASE: test_db
        ports:
        - 3306:3306
        options: --health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=3
      
      redis:
        image: redis:latest
        ports:
        - 6379:6379
        options: --health-cmd="redis-cli ping" --health-interval=10s --health-timeout=5s --health-retries=3
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up JDK 17
      uses: actions/setup-java@v3
      with:
        java-version: '17'
        distribution: 'temurin'
    
    - name: Run integration tests
      run: mvn verify -P integration-test
      env:
        DB_HOST: localhost
        DB_PORT: 3306
        REDIS_HOST: localhost
        REDIS_PORT: 6379

  e2e-test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Deploy to test environment
      run: |
        kubectl apply -k k8s/overlays/test
        kubectl wait --for=condition=ready pod -l app=my-app -n test --timeout=300s
    
    - name: Run E2E tests
      run: |
        npm install
        npm run test:e2e
      env:
        TEST_URL: https://test.example.com
    
    - name: Cleanup test environment
      if: always()
      run: kubectl delete namespace test

  performance-test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Run performance tests
      uses: grafana/k6-action@v0.3.0
      with:
        filename: tests/performance/load-test.js
        flags: --summary-export=summary.json
      env:
        K6_CLOUD_TOKEN: ${{ secrets.K6_CLOUD_TOKEN }}
    
    - name: Upload performance results
      uses: actions/upload-artifact@v4
      with:
        name: performance-results
        path: summary.json

Test Script

javascript
// tests/performance/load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(99)<1500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  let res = http.get('https://test.example.com/api/health');
  check(res, {
    'status is 200': (r) => r.status == 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1);
}

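Troubleshooting guide

Common issue 1: ArgoCD sync fails

Symptoms

  • The application status shows OutOfSync
  • Sync operations fail

Diagnostic steps

bash
# Show application status
argocd app get my-app

# Refresh from Git, then show application details
argocd app get my-app --refresh

# Preview the sync without changing the cluster
argocd app sync my-app --dry-run

# View application controller logs (a StatefulSet in default installs of recent versions;
# use deployment/argocd-application-controller on older ones)
kubectl logs -n argocd statefulset/argocd-application-controller

# View events for the application's resources
kubectl get events -n production --field-selector involvedObject.name=my-app

# Show differences between Git and the live state
argocd app diff my-app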

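Solutions

bash
# Force sync
argocd app sync my-app --force

# Roll back to the previous version (automated sync must be disabled first)
argocd app rollback my-app

# Delete and recreate the application
# Caution: delete cascades to the application's resources by default; add --cascade=false to keep them
argocd app delete my-app
argocd app create my-app --file app.yaml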

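Common issue 2: Image pull fails

Symptoms

  • Pod status is ImagePullBackOff
  • ErrImagePull errors

Diagnostic steps

bash
# Inspect the Pod; the Events section shows the pull error
kubectl describe pod <pod-name> -n production

# List image pull secrets (matched by their type, kubernetes.io/dockerconfigjson)
kubectl get secrets -n production | grep dockerconfigjson

# Show secret details
kubectl describe secret regcred -n production

# Test the pull from a machine that can reach the registry
docker pull registry.example.com/my-app:v1.0.0

# View recent events (the container produces no logs until its image is pulled)
kubectl get events -n production --sort-by=.lastTimestamp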

Solutions

yaml
# Create the image pull secret
apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: production
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded-docker-config>

---
# Reference the secret in the Deployment
spec:
  template:
    spec:
      imagePullSecrets:
      - name: regcred
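
The `<base64-encoded-docker-config>` placeholder above can be generated from registry credentials. The following is a minimal sketch in plain shell; the function name `make_dockerconfigjson` and the example arguments are hypothetical, and it assumes the values contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# make_dockerconfigjson SERVER USERNAME PASSWORD
# Prints the base64 value for the .dockerconfigjson field of a registry Secret.
# Assumption: none of the arguments contain quotes or backslashes.
make_dockerconfigjson() {
  local server="$1" user="$2" pass="$3"
  local auth
  # The "auth" field is base64("username:password").
  auth=$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')
  # Build the Docker config JSON, then base64-encode it for the Secret's data field.
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}' \
    "$server" "$user" "$pass" "$auth" | base64 | tr -d '\n'
}
```

For example, `make_dockerconfigjson registry.example.com ci-user "$REGISTRY_PASSWORD"` prints a value that can be pasted into the Secret. `kubectl create secret docker-registry regcred --docker-server=... --docker-username=... --docker-password=... -n production` builds an equivalent Secret without hand-encoding.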

Common issue 3: Deployment times out

Symptoms

  • The deployment stays in progress indefinitely
  • The Rollout is stuck

Diagnostic steps

bash
# Show Rollout status
kubectl get rollout my-app-rollout -n production

# Show Rollout details
kubectl describe rollout my-app-rollout -n production

# List the Pods
kubectl get pods -n production -l app=my-app

# Show Pod events
kubectl describe pod <pod-name> -n production

# Show resource usage (requires metrics-server)
kubectl top pods -n production

# Show node resources
kubectl describe nodes

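Solutions

yaml
# Increase the deployment timeout
spec:
  progressDeadlineSeconds: 1200  # Rollout-level progress deadline in seconds (illustrative value)
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 10m}
      analysis:
        templates:
        - templateName: success-rate
        args:
        # Only takes effect if the success-rate AnalysisTemplate declares a "timeout" argument
        - name: timeout
          value: "600"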

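Common issue 4: Test failures

Symptoms

  • Tests fail in the CI pipeline
  • Test coverage is below the required threshold

Diagnostic steps

bash
# View test logs
cat target/surefire-reports/*.txt

# Open the test report (open is a macOS command; use xdg-open on Linux)
open target/site/surefire-report.html

# Open the coverage report
open target/site/jacoco/index.html

# Run a specific test
mvn test -Dtest=MyTest

# Debug a test (the forked JVM waits for a debugger on port 5005)
mvn test -Dtest=MyTest -Dmaven.surefire.debug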
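
When there are many report files, reading them one by one is slow. The sketch below totals the failures and errors across the plain-text Surefire reports. The function name `count_surefire_failures` is hypothetical, and the parsing assumes the usual summary line (`Tests run: N, Failures: N, Errors: N, Skipped: N, ...`).

```shell
#!/usr/bin/env bash
# count_surefire_failures REPORT_DIR
# Sums the "Failures" and "Errors" counts from every *.txt report in REPORT_DIR.
# Assumed summary line format:
#   Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: ...
count_surefire_failures() {
  local dir="$1"
  cat "$dir"/*.txt 2>/dev/null | awk -F'[:,]' '
    /^Tests run:/ { failed += $4 + $6 }   # $4 = failures, $6 = errors
    END { print failed + 0 }              # prints 0 when nothing matched
  '
}
```

Running `count_surefire_failures target/surefire-reports` after a build prints a single number, which is convenient for a quick check or a CI log line.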

Solutions

xml
<!-- Increase the test timeout and the test JVM heap size -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>-Xmx1024m</argLine>
    <forkedProcessTimeoutInSeconds>600</forkedProcessTimeoutInSeconds>
  </configuration>
</plugin>

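Common issue 5: GitOps configuration conflicts

Symptoms

  • The Git configuration and the cluster state do not match
  • Manual changes are overwritten automatically

Diagnostic steps

bash
# Show what the last commit changed in the production configuration
git diff HEAD~1 k8s/overlays/production/

# Show the cluster state
kubectl get all -n production -o yaml

# Compare the live Deployment with the Git manifest
# (expect extra lines for status and other server-populated fields)
diff <(kubectl get deployment my-app -n production -o yaml) k8s/overlays/production/deployment.yaml

# Show ArgoCD sync history
argocd app history my-app

# Show logs from the application's Pods
argocd app logs my-app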
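
For scripted drift checks, `kubectl diff` is often easier to act on than a raw `diff` of YAML files: it compares the manifests with what the API server would store, and it reports the result through its exit status (0 for no differences, 1 for differences, greater than 1 for an error). The sketch below turns that status into a label; the function name `classify_diff_exit` and the label strings are hypothetical.

```shell
#!/usr/bin/env bash
# classify_diff_exit EXIT_CODE
# Maps the exit status of `kubectl diff` to a label:
#   0 = no differences, 1 = differences found, anything else = the diff itself failed.
classify_diff_exit() {
  case "$1" in
    0) echo "in-sync" ;;
    1) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Example usage (not executed here):
#   rc=0
#   kubectl diff -f k8s/overlays/production/ > /dev/null || rc=$?
#   classify_diff_exit "$rc"
```

If the overlay directories are Kustomize overlays, `kubectl diff -k` would be the matching form.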

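Solutions

bash
# Disable automated sync
argocd app set my-app --sync-policy none

# Sync manually
argocd app sync my-app

# Re-enable automated sync (add --auto-prune and --self-heal to restore pruning and self-healing)
argocd app set my-app --sync-policy automated

# Follow GitOps best practices:
# make every change through a Git commit and avoid modifying cluster resources by hand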

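Best practices

1. GitOps best practices

Single source of truth

yaml
# Store all configuration in the Git repository
# Let the Git configuration drive cluster state
# Avoid modifying cluster resources by hand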

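Branching strategy

yaml
# main branch -> production environment
# develop branch -> development environment
# release/* branches -> test environment
# feature/* branches -> feature development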
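
In a pipeline script, the branch mapping above usually becomes a small lookup. The following is a minimal sketch; the function name `env_for_branch` is hypothetical, and returning `none` for feature branches (build and test only, no deployment) is an assumption rather than something the mapping states.

```shell
#!/usr/bin/env bash
# env_for_branch BRANCH
# Prints the target environment for a Git branch name.
env_for_branch() {
  case "$1" in
    main)      echo "production" ;;
    develop)   echo "development" ;;
    release/*) echo "test" ;;
    *)         echo "none" ;;   # assumption: feature/* and other branches are not deployed
  esac
}
```

In GitHub Actions, for instance, `env_for_branch "$GITHUB_REF_NAME"` would select the overlay directory to deploy.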

Change approval

yaml
# Production changes require PR approval
# At least 2 reviewers
# Automated tests pass
# Security scans pass

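2. CI/CD best practices

Pipeline design

yaml
# Run in stages
stages:
  - lint
  - build
  - test
  - security
  - deploy

# Fail fast to give quick feedback
# Run independent jobs in parallel
# Cache dependencies to speed up builds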

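Security integration

yaml
# Code scanning
- Static code analysis
- Dependency vulnerability scanning
- Secret leak detection

# Image scanning
- Base image vulnerabilities
- Application dependency vulnerabilities
- Configuration security checks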

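3. Deployment strategy best practices

Canary release

yaml
# Increase traffic gradually
- 20% traffic -> monitor for 10 minutes
- 40% traffic -> monitor for 10 minutes
- 60% traffic -> monitor for 10 minutes
- 80% traffic -> monitor for 10 minutes
- 100% traffic

# Automatic rollback
- Roll back automatically when the error rate exceeds its threshold
- Roll back automatically when response time exceeds its threshold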
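
The control flow behind these steps can be sketched in a few lines of shell. In practice a controller such as Argo Rollouts performs the traffic shifting and analysis; the function below only illustrates the promote-or-roll-back loop. The name `run_canary` and the idea of passing a check command are hypothetical, and the `echo` lines stand in for setting the weight and waiting out the monitoring window.

```shell
#!/usr/bin/env bash
# run_canary CHECK_CMD WEIGHT...
# Walks through the traffic weights in order. After each step it runs
# CHECK_CMD with the current weight; a non-zero status stops the rollout.
run_canary() {
  local check_cmd="$1"
  shift
  local weight
  for weight in "$@"; do
    echo "set weight ${weight}%"
    # CHECK_CMD is a placeholder for a metrics query (error rate, response time).
    if ! "$check_cmd" "$weight"; then
      echo "rollback at ${weight}%"
      return 1
    fi
  done
  echo "promoted"
}
```

With a check function `metrics_ok` defined elsewhere, `run_canary metrics_ok 20 40 60 80 100` follows the schedule listed above and returns a non-zero status as soon as a check fails.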

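Blue-green deployment

yaml
# Prepare the new version
- Deploy the green environment
- Run smoke tests
- Run performance tests

# Switch traffic
- Switch all traffic at once
- Monitor key metrics
- Keep the ability to roll back quickly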

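4. Monitoring and alerting best practices

Key metrics

yaml
# Application metrics
- Request success rate
- Response time
- Error rate

# System metrics
- CPU usage
- Memory usage
- Network I/O

# Business metrics
- Order volume
- User activity
- Conversion rate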

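Alert configuration

yaml
# Alert rules
- Error rate > 5% triggers an alert
- Response time > 2s triggers an alert
- CPU usage > 80% triggers an alert

# Alert channels
- Slack notifications
- Email notifications
- SMS notifications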
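
The three rules above amount to simple threshold comparisons. A monitoring system would normally evaluate them, but the logic can be sketched in shell for a quick check in a script. The function name `fired_alerts` and the alert labels are hypothetical; the thresholds are the ones listed above.

```shell
#!/usr/bin/env bash
# fired_alerts ERROR_RATE_PERCENT RESPONSE_SECONDS CPU_PERCENT
# Prints one line per rule whose threshold is exceeded; prints nothing otherwise.
fired_alerts() {
  # awk handles the decimal comparisons that plain shell arithmetic cannot.
  awk -v err="$1" -v rt="$2" -v cpu="$3" 'BEGIN {
    if (err + 0 > 5)  print "error-rate"
    if (rt + 0 > 2)   print "response-time"
    if (cpu + 0 > 80) print "cpu"
  }'
}
```

For example, `fired_alerts 6.5 1.2 91` prints `error-rate` and `cpu` on separate lines, and the output could then be forwarded to one of the channels listed above.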

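5. Testing best practices

Test pyramid

yaml
# Unit tests
- Fast to run
- High coverage
- Isolated

# Integration tests
- Service-to-service integration
- Database integration
- External API integration

# End-to-end tests
- User scenario tests
- Critical path tests
- Performance tests

Test automation

yaml
# Run tests automatically
- Run unit tests on every commit
- Run integration tests on every PR
- Run smoke tests on every deployment

# Test reports
- Test coverage report
- Performance test report
- Security test report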

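6. Environment management best practices

Environment isolation

yaml
# Development environment
- Fast iteration
- Relaxed restrictions
- Frequent deployments

# Test environment
- Close to production
- Automated testing
- Performance testing

# Production environment
- Strict restrictions
- Approval process
- Monitoring and alerting

Configuration management

yaml
# Environment variables
- Use ConfigMaps
- Use Secrets
- Keep environments isolated

# Versioned configuration
- Manage configuration in Git
- Require approval for configuration changes
- Keep the ability to roll back configuration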

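7. Team collaboration best practices

Code review

yaml
# PR review
- At least 2 reviewers
- Automated checks pass
- Test coverage meets the target

# Review checklist
- Code quality
- Security issues
- Performance issues
- Documentation completeness

Documentation management

yaml
# Architecture documentation
- System architecture diagrams
- Deployment process
- Incident handling

# API documentation
- OpenAPI specification
- Usage examples
- Changelog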

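Summary

CI/CD pipelines are a core practice of modern software development. This chapter covered:

  1. GitOps practice: deploying and configuring ArgoCD and Flux
  2. Continuous integration: GitHub Actions, GitLab CI, Jenkins Pipeline
  3. Continuous deployment: canary releases, blue-green deployments, and other strategies
  4. Practical examples: a complete CI/CD pipeline, multi-environment deployment, automated testing
  5. Troubleshooting: diagnosing and resolving common problems
  6. Best practices: CI/CD experience and recommendations for production environments

After working through this chapter, you should be able to build a complete CI/CD pipeline that automates the whole path from code commit to production deployment.

Next steps

References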