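Job and CronJob
Overview
A Job is a Kubernetes controller that runs one or more Pods until a specified number of them terminate successfully; a CronJob creates Jobs on a repeating schedule. They suit batch processing, data backups, report generation, and similar workloads.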
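Job core concepts
1. Job types
- One-off Job: runs a single Pod until it completes successfully
- Fixed completion count Job: runs until the number of successful Pods set in completions is reached
- Work-queue Job: leaves completions unset and runs parallelism Pods that pull items from a shared queue
2. Job characteristics
- Pods are not deleted when the task completes, so logs and status stay available until the Job is deleted (or cleaned up via ttlSecondsAfterFinished)
- Failed tasks are retried, up to backoffLimit times
- Pods can run in parallel (parallelism)
- A time limit can be set (activeDeadlineSeconds)
3. Pod lifecycle
- Pod succeeds: it counts toward the Job's completions, and the Job is marked complete once enough Pods have succeeded
- Pod fails: the Job retries, by creating a new Pod with restartPolicy: Never or by restarting the container in place with restartPolicy: OnFailure
- Retry limit reached: the Job is marked failed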
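The characteristics and lifecycle rules above map onto a handful of fields in the Job spec. The following is a minimal sketch rather than a recommended configuration: the Job name `lifecycle-demo`, the busybox command, and all numeric values are illustrative assumptions.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: lifecycle-demo            # hypothetical name
spec:
  backoffLimit: 3                 # retries before the Job is marked failed
  activeDeadlineSeconds: 300      # time limit for the whole Job, in seconds
  ttlSecondsAfterFinished: 3600   # delete the finished Job and its Pods after 1 hour
  template:
    spec:
      restartPolicy: Never        # each retry creates a new Pod
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo working && sleep 5"]
```

Without ttlSecondsAfterFinished, the finished Job and its Pods remain in the cluster until they are deleted explicitly.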
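Job configuration in detail
Basic Job configuration
yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34
        # Compute pi to 2000 digits and print it
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4   # retry at most 4 times before the Job is marked failed
Fixed completion count Job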
yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 5               # 5 Pods must complete successfully
  parallelism: 2               # run at most 2 Pods at a time
  backoffLimit: 10             # maximum number of retries
  activeDeadlineSeconds: 600   # maximum run time of the whole Job, in seconds
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["echo", "Task completed"]
      restartPolicy: OnFailure
Work-queue Job
yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: work-queue-job
spec:
  parallelism: 3   # 3 parallel Pods; completions is left unset for the work-queue pattern
  template:
    spec:
      containers:
      - name: worker
        image: myworker:latest
        command: ["python", "worker.py"]
      restartPolicy: OnFailure
CronJob configuration in detail
Basic CronJob configuration
yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"   # every minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "Hello from CronJob"]
          restartPolicy: OnFailure
Full CronJob configuration
yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
  namespace: production
  labels:
    app: backup
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  concurrencyPolicy: Forbid        # do not run Jobs concurrently
  successfulJobsHistoryLimit: 3    # keep 3 successful Jobs
  failedJobsHistoryLimit: 1        # keep 1 failed Job
  startingDeadlineSeconds: 600     # a run that cannot start within 600 s of its scheduled time is skipped
  suspend: false                   # set to true to pause scheduling
  jobTemplate:
    metadata:
      labels:
        app: backup
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 3600  # run for at most 1 hour
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:v1.0
            command: ["/bin/sh", "-c"]
            args:
            - |
              set -e   # fail the Job if any step fails, instead of reporting the final echo's status
              echo "Starting backup at $(date)"
              # pg_dump also needs a password, e.g. PGPASSWORD from a Secret (not shown here)
              pg_dump -h "$DB_HOST" -U "$DB_USER" "$DB_NAME" > "/backup/backup_$(date +%Y%m%d).sql"
              echo "Backup completed"
            env:
            - name: DB_HOST
              value: "postgres-service"
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: username
            - name: DB_NAME
              value: "mydb"
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
Cron expressions
Standard format
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of the week (0 - 6, Sunday to Saturday)
│ │ │ │ │
* * * * *
Common expressions
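bash
# Every minute
* * * * *
# Every hour, on the hour
0 * * * *
# Every day at 02:00
0 2 * * *
# Every Monday at 03:00
0 3 * * 1
# On the 1st of every month at 04:00
0 4 1 * *
# Every 15 minutes
*/15 * * * *
# Weekdays at 09:00
0 9 * * 1-5
Concurrency policy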
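1. Allow (default)
yaml
concurrencyPolicy: Allow     # concurrent Jobs are allowed
2. Forbid
yaml
concurrencyPolicy: Forbid    # no concurrency: a new run is skipped while the previous one is still running
3. Replace
yaml
concurrencyPolicy: Replace   # the running Job is cancelled and replaced by the new one
Operational commands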
Job operations
bash
# Create a Job
kubectl apply -f job.yaml
kubectl create job my-job --image=busybox -- echo "Hello"
# View Jobs
kubectl get jobs
kubectl describe job <job-name>
# View Job logs
kubectl logs job/<job-name>
# Delete a Job (its Pods are deleted with it)
kubectl delete job <job-name>
CronJob operations
bash
# Create a CronJob
kubectl apply -f cronjob.yaml
kubectl create cronjob my-cron --image=busybox --schedule="*/1 * * * *" -- echo "Hello"
# View CronJobs
kubectl get cronjobs
kubectl describe cronjob <cronjob-name>
# Trigger a CronJob manually
kubectl create job --from=cronjob/<cronjob-name> manual-job
# Suspend a CronJob
kubectl patch cronjob <cronjob-name> -p '{"spec":{"suspend":true}}'
# Resume a CronJob
kubectl patch cronjob <cronjob-name> -p '{"spec":{"suspend":false}}'
# Delete a CronJob
kubectl delete cronjob <cronjob-name>
Practical examples
Example 1: Database backup Job
yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 1800   # give up after 30 minutes
  template:
    spec:
      containers:
      - name: backup
        image: postgres:13
        command:
        - /bin/bash
        - -c
        - |
          # pipefail makes the step fail when pg_dump fails, not only when gzip fails
          set -euo pipefail
          pg_dump -h postgres-service -U postgres mydb | gzip > "/backup/db_$(date +%Y%m%d_%H%M%S).sql.gz"
        env:
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: backup
          mountPath: /backup
      volumes:
      - name: backup
        persistentVolumeClaim:
          claimName: backup-pvc
      restartPolicy: OnFailure
Example 2: Data migration Job
yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 0   # do not retry on failure
  template:
    spec:
      containers:
      - name: migration
        image: migration-tool:v1.0
        command: ["python", "migrate.py"]
        env:
        - name: SOURCE_DB
          value: "mysql-old"
        - name: TARGET_DB
          value: "mysql-new"
        - name: BATCH_SIZE
          value: "1000"
      restartPolicy: Never
Example 3: Batch processing Job
yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-process
spec:
  completions: 10   # 10 successful Pods in total
  parallelism: 3    # at most 3 Pods at a time
  template:
    spec:
      containers:
      - name: processor
        image: batch-processor:v1.0
        command: ["python", "process.py"]
        env:
        - name: TASK_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name   # the Pod's own name, unique per Pod
      restartPolicy: OnFailure
Example 4: Scheduled cleanup CronJob
yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-job
spec:
  schedule: "0 3 * * *"   # every day at 03:00
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl delete pods --field-selector=status.phase=Failed -n production
              kubectl delete pods --field-selector=status.phase=Succeeded -n production
              kubectl delete jobs --field-selector=status.successful=1 -n production
          # cleanup-sa needs RBAC permission to list and delete Pods and Jobs in the production namespace
          serviceAccountName: cleanup-sa
          restartPolicy: OnFailure
Example 5: Report generation CronJob
yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-generator
spec:
  schedule: "0 8 * * 1-5"   # weekdays at 08:00
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600   # run for at most 1 hour
      template:
        spec:
          containers:
          - name: report
            image: report-generator:v1.0
            command: ["python", "generate_report.py"]
            env:
            - name: REPORT_TYPE
              value: "daily"
            - name: OUTPUT_PATH
              value: "/reports"
            volumeMounts:
            - name: reports
              mountPath: /reports
          volumes:
          - name: reports
            persistentVolumeClaim:
              claimName: reports-pvc
          restartPolicy: OnFailure
Example 6: Health check CronJob
yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: health-check
spec:
  schedule: "*/5 * * * *"   # every 5 minutes
  concurrencyPolicy: Allow
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: health-check
            image: curlimages/curl:latest
            command:
            - /bin/sh
            - -c
            - |
              # -f makes curl return a non-zero exit code on HTTP errors
              curl -f http://api-service/health || exit 1
              curl -f http://web-service/health || exit 1
          restartPolicy: OnFailure
Troubleshooting
Common problems
1. Job fails
bash
# Check the Job's status
kubectl describe job <job-name>
# Check the status of its Pods
kubectl get pods -l job-name=<job-name>
# View Pod logs
kubectl logs <pod-name>
# View Pod events
kubectl describe pod <pod-name>
2. CronJob does not run
bash
# Check the CronJob's status
kubectl describe cronjob <cronjob-name>
# Check whether it is suspended
kubectl get cronjob <cronjob-name> -o yaml | grep suspend
# Check when it was last scheduled
kubectl get cronjob <cronjob-name> -o yaml | grep lastScheduleTime
3. Job is stuck
bash
# Inspect the Job's status
kubectl get job <job-name> -o yaml
# Delete the stuck Job
kubectl delete job <job-name>
# Check resource quotas
kubectl describe resourcequota
4. Concurrency problems
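bash
# List the Jobs currently running for a CronJob
# (Kubernetes does not add a cronjob-name label to these Jobs, so read the CronJob's status instead)
kubectl get cronjob <cronjob-name> -o jsonpath='{.status.active[*].name}'
# Check the concurrency policy
kubectl get cronjob <cronjob-name> -o yaml | grep concurrencyPolicy
# Clean up a Job manually
kubectl delete job <job-name>
Best practices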
1. Resource management
- Set reasonable resource requests and limits
- Use activeDeadlineSeconds to cap run time
- Monitor resource usage
- Avoid resource contention
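As an illustration of the first two points, a Job can declare requests and limits on its container and a deadline on the Job itself. This is a sketch under assumptions: the name `bounded-job` and the CPU, memory, and deadline values are placeholders to be sized for the real workload.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bounded-job               # hypothetical name
spec:
  activeDeadlineSeconds: 900      # terminate the Job if it runs longer than 15 minutes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo processing"]
        resources:
          requests:               # what the scheduler reserves for the Pod
            cpu: 250m
            memory: 128Mi
          limits:                 # hard ceiling enforced at run time
            cpu: 500m
            memory: 256Mi
```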
2. Error handling
- Set an appropriate backoffLimit
- Make operations idempotent
- Log in detail
- Configure alert notifications
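Because a failed Pod may be retried up to backoffLimit times, the task should be safe to run more than once. One possible way to get there is sketched below: skip the work if the output already exists, and write through a temporary file so a crash never leaves a partial result. The Job name, file paths, and the PVC `export-pvc` are hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: idempotent-export         # hypothetical name
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: export
        image: busybox
        command:
        - /bin/sh
        - -c
        - |
          set -e
          OUT="/data/export-$(date +%Y%m%d).txt"
          # A retry after a successful write has nothing left to do
          if [ -f "$OUT" ]; then
            echo "Output $OUT already exists, nothing to do"
            exit 0
          fi
          # Write to a temporary file, then rename it into place
          echo "generated at $(date)" > "$OUT.tmp"
          mv "$OUT.tmp" "$OUT"
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: export-pvc   # hypothetical PVC
```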
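3. Concurrency control
- Choose the concurrency policy according to the nature of the task
- Avoid overlapping runs
- Limit the degree of parallelism
- Monitor concurrent executions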
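These points can be combined in a single CronJob, sketched here with assumed values: Forbid prevents overlapping runs, parallelism bounds the Pods inside each run, and a deadline shorter than the schedule interval keeps a slow run from blocking the next one. The name `non-overlapping-sync` and all numbers are illustrative.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: non-overlapping-sync      # hypothetical name
spec:
  schedule: "*/10 * * * *"        # every 10 minutes
  concurrencyPolicy: Forbid       # skip a run while the previous Job is still active
  startingDeadlineSeconds: 120    # do not start a run more than 2 minutes late
  jobTemplate:
    spec:
      completions: 4
      parallelism: 2              # at most 2 Pods at a time within one run
      activeDeadlineSeconds: 540  # stop before the next scheduled run
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: sync
            image: busybox
            command: ["sh", "-c", "echo syncing"]
```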
4. Data management
- Use persistent storage
- Back up data
- Clean up temporary data
- Monitor storage usage
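One way to separate durable results from temporary data is to mount a PersistentVolumeClaim for output and an emptyDir for scratch space, whose contents are tied to the Pod rather than kept like the claim. The sketch below assumes a hypothetical Job name and a hypothetical PVC called `output-pvc`; the size limit is a placeholder.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: export-with-scratch       # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: export
        image: busybox
        command:
        - /bin/sh
        - -c
        - |
          set -e
          # Intermediate files go to scratch space, the final result to persistent storage
          echo "intermediate data" > /scratch/work.tmp
          cp /scratch/work.tmp /output/result.txt
        volumeMounts:
        - name: scratch
          mountPath: /scratch
        - name: output
          mountPath: /output
      volumes:
      - name: scratch
        emptyDir:
          sizeLimit: 1Gi          # cap on temporary data
      - name: output
        persistentVolumeClaim:
          claimName: output-pvc   # hypothetical PVC
```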
5. Monitoring and alerting
- Monitor Job execution status
- Monitor execution time
- Set up alerts on failure
- Keep an execution history
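Failure alerts depend on the monitoring stack in use, which this document does not specify. Purely as an illustration, and assuming Prometheus scrapes kube-state-metrics (which exposes the `kube_job_status_failed` metric), an alerting rule file might look like the sketch below. The group name, alert name, duration, and severity label are all assumptions.

```yaml
groups:
- name: kubernetes-jobs           # hypothetical rule group
  rules:
  - alert: KubernetesJobFailed    # hypothetical alert name
    # Fires when a Job has reported failed Pods for 5 minutes
    expr: kube_job_status_failed > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} has failed Pods"
```

For execution history without extra tooling, the CronJob fields successfulJobsHistoryLimit and failedJobsHistoryLimit control how many finished Jobs are kept for inspection.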
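Summary
Job and CronJob are the core Kubernetes controllers for batch and scheduled work. Understanding them is essential for automating tasks such as data backups, report generation, and scheduled cleanup.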