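Job and CronJob

Overview

A Job is a Kubernetes controller that runs a task to completion; a CronJob creates Jobs on a recurring schedule. They suit batch processing, data backups, report generation, and similar workloads.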

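Core Job Concepts

1. Job types

  • One-off Job: runs a single Pod until it completes successfully
  • Fixed completion count Job: runs until the number of Pods given by completions have succeeded
  • Work queue Job: runs several Pods in parallel that pull items from a shared work queue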

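2. Job characteristics

  • Pods are kept after the task completes, so logs and status stay available until the Job is deleted
  • Failed tasks can be retried (backoffLimit)
  • Pods can run in parallel (parallelism)
  • A time limit can be set (activeDeadlineSeconds)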
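
As a minimal sketch of how these characteristics map onto fields of the Job spec: the name `features-demo` and all values below are illustrative assumptions, not recommendations. Because finished Pods are retained by default, the sketch also sets `ttlSecondsAfterFinished`, which asks the control plane to delete the Job and its Pods some time after the Job finishes.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: features-demo            # hypothetical name
spec:
  backoffLimit: 3                # retry up to 3 times before marking the Job failed
  parallelism: 2                 # run at most 2 Pods at once
  completions: 4                 # require 4 successful Pods
  activeDeadlineSeconds: 300     # fail the Job if it is still running after 5 minutes
  ttlSecondsAfterFinished: 3600  # delete the Job and its Pods 1 hour after it finishes
  template:
    spec:
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "echo working; sleep 5"]
      restartPolicy: Never
```

Leaving `ttlSecondsAfterFinished` unset keeps the finished Job and its Pods until they are deleted explicitly.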

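3. Pod lifecycle

  • Pod succeeds: it counts toward completion; the Job is marked complete once the required number of Pods have succeeded
  • Pod fails: the Job retries, either by creating a new Pod (restartPolicy: Never) or by restarting the container in place (restartPolicy: OnFailure)
  • Retry limit (backoffLimit) exceeded: the Job is marked failed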
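
Recent Kubernetes releases also allow finer control over which Pod failures count as retries, through the `podFailurePolicy` field. The following is a sketch only: the Job name, the container name `main`, and the exit code 42 (standing in for "an error that retrying cannot fix") are assumptions made for illustration, and the field may not be available on older clusters.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: failure-policy-demo      # hypothetical name
spec:
  backoffLimit: 4
  podFailurePolicy:
    rules:
    - action: FailJob            # fail the whole Job immediately, without further retries
      onExitCodes:
        containerName: main
        operator: In
        values: [42]             # assumed exit code for a non-retriable error
    - action: Ignore             # do not count this failure against backoffLimit
      onPodConditions:
      - type: DisruptionTarget   # the Pod was evicted or preempted
  template:
    spec:
      restartPolicy: Never       # required when podFailurePolicy is set
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "exit 42"]
```

With this manifest the first Pod exits with code 42, so the Job should be marked failed straight away instead of retrying four times.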

Job Configuration in Detail

Basic Job configuration

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

Job with a fixed completion count

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 5    # 5 Pods must complete successfully
  parallelism: 2    # run at most 2 Pods at a time
  backoffLimit: 10  # maximum number of retries
  activeDeadlineSeconds: 600  # maximum run time for the whole Job, in seconds
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["echo", "Task completed"]
      restartPolicy: OnFailure

Work queue Job

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: work-queue-job
spec:
  parallelism: 3  # 3 Pods in parallel; completions is left unset for a work queue
  template:
    spec:
      containers:
      - name: worker
        image: myworker:latest
        command: ["python", "worker.py"]
      restartPolicy: OnFailure

CronJob Configuration in Detail

Basic CronJob configuration

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"  # every minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "Hello from CronJob"]
          restartPolicy: OnFailure

Full CronJob configuration

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
  namespace: production
  labels:
    app: backup
spec:
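  schedule: "0 2 * * *"  # every day at 02:00
  concurrencyPolicy: Forbid  # do not start a new run while the previous one is still running
  successfulJobsHistoryLimit: 3  # keep the 3 most recent successful Jobs
  failedJobsHistoryLimit: 1      # keep the most recent failed Job
  startingDeadlineSeconds: 600   # a run that cannot start within 600 s of its scheduled time is skipped
  suspend: false                 # set to true to pause scheduling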
  jobTemplate:
    metadata:
      labels:
        app: backup
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 3600  # run for at most 1 hour
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:v1.0
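            command: ["/bin/sh", "-c"]
            args:
            - |
              set -e  # stop at the first failing command so a failed dump fails the Job
              echo "Starting backup at $(date)"
              # If the server requires a password, supply it via PGPASSWORD or a .pgpass file (not shown here)
              pg_dump -h "$DB_HOST" -U "$DB_USER" "$DB_NAME" > "/backup/backup_$(date +%Y%m%d).sql"
              echo "Backup completed"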
            env:
            - name: DB_HOST
              value: "postgres-service"
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: username
            - name: DB_NAME
              value: "mydb"
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure

Cron Expressions

Standard format

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
* * * * *

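Common expressions

bash
# Every minute
* * * * *

# Every hour, on the hour
0 * * * *

# Every day at 02:00
0 2 * * *

# Every Monday at 03:00
0 3 * * 1

# On the 1st of every month at 04:00
0 4 1 * *

# Every 15 minutes
*/15 * * * *

# Weekdays at 09:00
0 9 * * 1-5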
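
These expressions say nothing about the time zone. By default a CronJob schedule is interpreted in the time zone of the kube-controller-manager, so "09:00" may not mean local time. On clusters that support the `timeZone` field, the zone can be pinned explicitly. The sketch below assumes a hypothetical CronJob named `tz-demo` and uses `Asia/Shanghai` purely as an example value.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: tz-demo                  # hypothetical name
spec:
  schedule: "0 9 * * 1-5"        # weekdays at 09:00 ...
  timeZone: "Asia/Shanghai"      # ... in this IANA time zone (example value)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["date"]
          restartPolicy: OnFailure
```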

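Concurrency Policies

1. Allow (default)

yaml
concurrencyPolicy: Allow  # runs may overlap

2. Forbid

yaml
concurrencyPolicy: Forbid  # no overlap; a new run is skipped if the previous one is still running

3. Replace

yaml
concurrencyPolicy: Replace  # the running Job is cancelled and replaced by the new one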
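
To show where the field sits in a complete manifest, here is a minimal sketch in which each run deliberately takes longer than the schedule interval, so the policy actually comes into play. The name `sync-demo` and the `sleep 400` workload are assumptions chosen for illustration.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sync-demo                # hypothetical name
spec:
  schedule: "*/5 * * * *"        # every 5 minutes
  concurrencyPolicy: Replace     # try Allow or Forbid here to compare behaviour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sync
            image: busybox
            # Runs for about 400 s, longer than the 300 s schedule interval
            command: ["sh", "-c", "sleep 400"]
          restartPolicy: Never
```

With `Replace`, each scheduled time should cancel the still-running Job and start a fresh one; with `Forbid`, the new run is skipped; with `Allow`, two Jobs overlap for part of each interval.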

Commands

Job commands

bash
# Create a Job
kubectl apply -f job.yaml
kubectl create job my-job --image=busybox -- echo "Hello"

# List and inspect Jobs
kubectl get jobs
kubectl describe job <job-name>

# View Job logs
kubectl logs job/<job-name>

# Delete a Job (this also deletes its Pods)
kubectl delete job <job-name>

CronJob commands

bash
# Create a CronJob
kubectl apply -f cronjob.yaml
kubectl create cronjob my-cron --image=busybox --schedule="*/1 * * * *" -- echo "Hello"

# List and inspect CronJobs
kubectl get cronjobs
kubectl describe cronjob <cronjob-name>

# Trigger a CronJob manually
kubectl create job --from=cronjob/<cronjob-name> manual-job

# Suspend a CronJob
kubectl patch cronjob <cronjob-name> -p '{"spec":{"suspend":true}}'

# Resume a CronJob
kubectl patch cronjob <cronjob-name> -p '{"spec":{"suspend":false}}'

# Delete a CronJob
kubectl delete cronjob <cronjob-name>

Worked Examples

Example 1: Database backup Job

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 1800
  template:
    spec:
      containers:
      - name: backup
        image: postgres:13
        command:
        - /bin/bash
        - -c
        - |
          set -o pipefail  # without this, a pg_dump failure is hidden by gzip's exit status
          pg_dump -h postgres-service -U postgres mydb | gzip > /backup/db_$(date +%Y%m%d_%H%M%S).sql.gz
        env:
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: backup
          mountPath: /backup
      volumes:
      - name: backup
        persistentVolumeClaim:
          claimName: backup-pvc
      restartPolicy: OnFailure

Example 2: Data migration Job

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 0
  template:
    spec:
      containers:
      - name: migration
        image: migration-tool:v1.0
        command: ["python", "migrate.py"]
        env:
        - name: SOURCE_DB
          value: "mysql-old"
        - name: TARGET_DB
          value: "mysql-new"
        - name: BATCH_SIZE
          value: "1000"
      restartPolicy: Never

Example 3: Batch processing Job

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-process
spec:
  completions: 10
  parallelism: 3
  template:
    spec:
      containers:
      - name: processor
        image: batch-processor:v1.0
        command: ["python", "process.py"]
        env:
        - name: TASK_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name  # the Pod's name: unique per Pod, but not a stable task index
      restartPolicy: OnFailure
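
When each Pod needs a stable, predictable task number rather than its Pod name, one option is an Indexed Job. The sketch below is an alternative under stated assumptions, not a drop-in replacement for the example above: the name `indexed-batch` and the `echo` workload are hypothetical. With `completionMode: Indexed`, Kubernetes assigns each Pod an index from 0 to completions - 1 and exposes it in the `JOB_COMPLETION_INDEX` environment variable.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-batch            # hypothetical name
spec:
  completions: 10                # indexes 0 through 9
  parallelism: 3
  completionMode: Indexed        # one successful Pod is required per index
  template:
    spec:
      containers:
      - name: processor
        image: busybox
        # JOB_COMPLETION_INDEX is injected by Kubernetes for Indexed Jobs
        command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
      restartPolicy: Never
```

A worker can use the index to pick its slice of the input, for example one file or one ID range per index.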

Example 4: Scheduled cleanup CronJob

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-job
spec:
  schedule: "0 3 * * *"  # every day at 03:00
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl delete pods --field-selector=status.phase=Failed -n production
              kubectl delete pods --field-selector=status.phase=Succeeded -n production
              kubectl delete jobs --field-selector=status.successful=1 -n production
          serviceAccountName: cleanup-sa
          restartPolicy: OnFailure
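
The CronJob above only works if `cleanup-sa` is allowed to list and delete Pods and Jobs in the `production` namespace. A minimal RBAC sketch follows. It assumes the CronJob itself runs in the `default` namespace (its manifest sets none), and the names `cleanup-role` and `cleanup-binding` are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cleanup-sa
  namespace: default             # assumed: the namespace the CronJob runs in
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cleanup-role             # hypothetical name
  namespace: production          # the namespace being cleaned up
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "delete"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cleanup-binding          # hypothetical name
  namespace: production
subjects:
- kind: ServiceAccount
  name: cleanup-sa
  namespace: default             # must match the ServiceAccount's namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cleanup-role
```

Using a namespaced Role rather than a ClusterRole keeps the cleanup job's permissions limited to `production`.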

Example 5: Report generation CronJob

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-generator
spec:
  schedule: "0 8 * * 1-5"  # weekdays at 08:00
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600
      template:
        spec:
          containers:
          - name: report
            image: report-generator:v1.0
            command: ["python", "generate_report.py"]
            env:
            - name: REPORT_TYPE
              value: "daily"
            - name: OUTPUT_PATH
              value: "/reports"
            volumeMounts:
            - name: reports
              mountPath: /reports
          volumes:
          - name: reports
            persistentVolumeClaim:
              claimName: reports-pvc
          restartPolicy: OnFailure

Example 6: Health check CronJob

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: health-check
spec:
  schedule: "*/5 * * * *"  # every 5 minutes
  concurrencyPolicy: Allow
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: health-check
            image: curlimages/curl:latest
            command:
            - /bin/sh
            - -c
            - |
              curl -f http://api-service/health || exit 1
              curl -f http://web-service/health || exit 1
          restartPolicy: OnFailure

Troubleshooting

Common problems

1. Job fails

bash
# Check the Job's status and events
kubectl describe job <job-name>

# List the Job's Pods
kubectl get pods -l job-name=<job-name>

# View Pod logs
kubectl logs <pod-name>

# View Pod events
kubectl describe pod <pod-name>

2. CronJob does not run

bash
# Check the CronJob's status and events
kubectl describe cronjob <cronjob-name>

# Check whether it is suspended
kubectl get cronjob <cronjob-name> -o yaml | grep suspend

# Check when it was last scheduled
kubectl get cronjob <cronjob-name> -o yaml | grep lastScheduleTime

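3. Job is stuck

bash
# Inspect the Job's full status
kubectl get job <job-name> -o yaml

# Check whether a resource quota is blocking Pod creation
kubectl describe resourcequota

# Delete the stuck Job (this also deletes its Pods)
kubectl delete job <job-name>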

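4. Concurrency problems

bash
# List the Jobs this CronJob currently has running
kubectl get cronjob <cronjob-name> -o jsonpath='{.status.active[*].name}'

# Check the concurrency policy
kubectl get cronjob <cronjob-name> -o yaml | grep concurrencyPolicy

# Clean up a Job manually
kubectl delete job <job-name>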

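Best Practices

1. Resource management

  • Set sensible resource requests and limits
  • Use activeDeadlineSeconds to cap run time
  • Monitor resource usage
  • Avoid resource contention between Jobs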
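
As a sketch of the first two points, the Job below sets requests and limits on its container and caps total run time. The name `resource-demo` and all numbers are placeholders, assumed only for illustration; real values should come from measuring the workload.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: resource-demo            # hypothetical name
spec:
  activeDeadlineSeconds: 900     # fail the Job if it runs longer than 15 minutes
  template:
    spec:
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "echo done"]
        resources:
          requests:              # what the scheduler reserves for the Pod
            cpu: 100m
            memory: 128Mi
          limits:                # hard ceiling enforced at run time
            cpu: 500m
            memory: 256Mi
      restartPolicy: Never
```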

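2. Error handling

  • Set an appropriate backoffLimit
  • Make operations idempotent so that retries are safe
  • Log in detail
  • Configure alert notifications

3. Concurrency control

  • Choose the concurrency policy to match the task
  • Avoid overlapping runs
  • Limit parallelism
  • Monitor concurrent runs

4. Data management

  • Use persistent storage
  • Back up data
  • Clean up temporary data
  • Monitor storage usage

5. Monitoring and alerting

  • Monitor Job status
  • Monitor run time
  • Alert on failures
  • Keep a record of run history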

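Summary

Job and CronJob are the core Kubernetes controllers for batch and scheduled work. Knowing how to use them is essential for automating tasks such as data backups, report generation, and scheduled cleanup.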
