# Log Management
## Overview

Log management is a core part of operating Kubernetes: with effective log collection, storage, and analysis you can localize problems quickly, monitor system health, and tune application performance. This chapter walks through the Kubernetes logging stack, covering best practices for log collection, log analysis, and log storage.
## Kubernetes Logging Architecture

### Log Types
#### 1. Container Logs

The stdout and stderr streams of containers, captured and managed by the container runtime.
#### 2. Application Logs

Log files written by the application itself, usually to the container's own filesystem.
#### 3. System Logs

Logs from Kubernetes components such as the kubelet and kube-proxy.
#### 4. Audit Logs

Kubernetes API server audit logs, which record API operations.
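
Each of these log types is reached through a different door. As a quick orientation, the commands below show one way to peek at each type; exact paths and unit names vary by distribution, container runtime, and API server configuration.

```bash
# Container logs (stdout/stderr) via the API server:
kubectl logs <pod-name> -n <namespace>

# System component logs on a systemd-based node (unit names may differ):
journalctl -u kubelet --since "1 hour ago"

# Static-pod control-plane components usually log as containers:
kubectl logs -n kube-system kube-apiserver-<node-name>

# Audit logs land wherever --audit-log-path points on the API server,
# e.g. /var/log/kubernetes/audit.log (cluster-specific).
sudo tail -f /var/log/kubernetes/audit.log
```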
### Logging Architecture Diagram
```
┌─────────────────────────────────────────────────┐
│             Application Containers              │
│     stdout    │     stderr    │    log files    │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│        Log Collection Agents (DaemonSet)        │
│    Fluentd    │   Fluent Bit  │    Filebeat     │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│              Log Storage Backends               │
│     Loki      │ Elasticsearch │      Kafka      │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│               Log Analysis Tools                │
│ Grafana + Loki │    Kibana    │   Loki Query    │
└─────────────────────────────────────────────────┘
```

## Log Collection Solutions
### Option 1: Fluentd + Elasticsearch
#### 1. Deploy Elasticsearch
```yaml
# elasticsearch.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: logging
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
  labels:
    app: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.8.0
        ports:
        - containerPort: 9200
          name: rest
        - containerPort: 9300
          name: inter-node
        env:
        - name: cluster.name
          value: k8s-logs
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.type
          value: single-node
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        - name: xpack.security.enabled
          value: "false"
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 2Gi
      volumes:
      # emptyDir is fine for a demo; use a PersistentVolumeClaim in production.
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: logging
spec:
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: rest
  - port: 9300
    name: inter-node
```
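
Before wiring up Fluentd, it is worth confirming that Elasticsearch is actually reachable inside the cluster. A minimal smoke test, assuming the manifest above and the `curlimages/curl` image:

```bash
# Wait for the StatefulSet to roll out, then hit the health endpoint
# from a throwaway pod inside the cluster.
kubectl rollout status statefulset/elasticsearch -n logging --timeout=5m
kubectl run es-check -n logging --rm -i --restart=Never \
  --image=curlimages/curl -- \
  curl -s http://elasticsearch:9200/_cluster/health?pretty
```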
#### 2. Deploy Fluentd

```yaml
# fluentd.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
- apiGroups: [""]
  resources:
  - pods
  - namespaces
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: logging
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      # The json parser assumes the Docker JSON log driver; containerd and
      # CRI-O nodes write CRI-format lines and need a CRI parser instead.
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
      kubernetes_url "https://#{ENV['KUBERNETES_SERVICE_HOST']}:#{ENV['KUBERNETES_SERVICE_PORT']}"
    </filter>

    <match kubernetes.**>
      @type elasticsearch
      @id out_es
      @log_level info
      include_tag_key true
      host elasticsearch
      port 9200
      logstash_format true
      logstash_prefix k8s-logs
      <buffer>
        @type file
        path /var/log/fluentd/buffer/kubernetes
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever true
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
    </match>
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "http"
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        # /var/log must be writable: the pos file and the file buffer
        # configured above both live under it.
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config
          mountPath: /fluentd/etc/fluent.conf
          subPath: fluent.conf
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: fluentd-config
```

#### 3. Deploy Kibana
```yaml
# kibana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:8.8.0
        ports:
        - containerPort: 5601
        env:
        # Kibana 7.x/8.x reads ELASTICSEARCH_HOSTS
        # (ELASTICSEARCH_URL is the legacy 6.x variable).
        - name: ELASTICSEARCH_HOSTS
          value: http://elasticsearch:9200
        - name: SERVER_NAME
          value: kibana
        - name: SERVER_HOST
          value: "0.0.0.0"
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
spec:
  type: NodePort
  ports:
  - port: 5601
    targetPort: 5601
    nodePort: 30601
  selector:
    app: kibana
```

#### Deployment Commands
```bash
# Create the logging stack.
kubectl apply -f elasticsearch.yaml
kubectl apply -f fluentd.yaml
kubectl apply -f kibana.yaml

# Verify that everything is running.
kubectl get pods -n logging
kubectl get svc -n logging

# Access Kibana locally (alternative to the NodePort).
kubectl port-forward -n logging svc/kibana 5601:5601
```

### Option 2: Promtail + Loki
#### 1. Deploy Loki
```yaml
# loki.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  namespace: logging
data:
  loki.yaml: |
    auth_enabled: false

    ingester:
      chunk_idle_period: 3m
      chunk_block_size: 262144
      chunk_retain_period: 1m
      max_transfer_retries: 0
      lifecycler:
        ring:
          kvstore:
            store: inmemory
        # replication_factor belongs to the ring, as a sibling of kvstore.
          replication_factor: 1

    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h

    schema_config:
      configs:
      - from: 2020-10-24
        store: boltdb-shipper
        object_store: filesystem
        schema: v11
        index:
          prefix: index_
          period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /tmp/loki/boltdb-shipper-active
        cache_location: /tmp/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: filesystem
      filesystem:
        directory: /tmp/loki/chunks

    compactor:
      working_directory: /tmp/loki/boltdb-shipper-compactor
      shared_store: filesystem

    server:
      http_listen_port: 3100

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
      - name: loki
        image: grafana/loki:2.8.0
        args:
        - -config.file=/etc/loki/loki.yaml
        ports:
        - containerPort: 3100
          name: http
        volumeMounts:
        - name: config
          mountPath: /etc/loki
        - name: storage
          mountPath: /tmp/loki
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
      volumes:
      - name: config
        configMap:
          name: loki-config
      - name: storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki
  namespace: logging
spec:
  ports:
  - port: 3100
    targetPort: http
    name: http
  selector:
    app: loki
```
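
Loki exposes a `/ready` endpoint that reports whether the ingester is up, which makes a quick smoke test easy under the same assumptions as before:

```bash
kubectl rollout status deployment/loki -n logging --timeout=5m
kubectl run loki-check -n logging --rm -i --restart=Never \
  --image=curlimages/curl -- \
  curl -s http://loki:3100/ready
```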
#### 2. Deploy Promtail

```yaml
# promtail.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: logging
data:
  promtail.yaml: |
    server:
      http_listen_port: 9080
      grpc_listen_port: 0

    positions:
      filename: /tmp/positions.yaml

    clients:
    - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      pipeline_stages:
      # Assumes Docker JSON logs; on containerd/CRI-O use `- cri: {}`.
      - docker: {}
      - match:
          selector: '{app="nginx"}'
          stages:
          - regex:
              expression: '^(?P<remote_addr>[\d\.]+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+) (?P<body_bytes_sent>\d+)'
          - labels:
              remote_addr:
              status:
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      # Without __path__ Promtail has no files to tail; map each pod to its
      # log directory on the host (the standard Promtail convention).
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        target_label: __path__
        replacement: /var/log/pods/*$1/*.log
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail
  namespace: logging
spec:
  selector:
    matchLabels:
      app: promtail
  template:
    metadata:
      labels:
        app: promtail
    spec:
      containers:
      - name: promtail
        image: grafana/promtail:2.8.0
        args:
        - -config.file=/etc/promtail/promtail.yaml
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: config
          mountPath: /etc/promtail
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
      volumes:
      - name: config
        configMap:
          name: promtail-config
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
```

#### Deployment Commands
```bash
# Create Loki and Promtail.
kubectl apply -f loki.yaml
kubectl apply -f promtail.yaml

# Verify the rollout.
kubectl get pods -n logging
kubectl get svc -n logging

# Access the Loki API locally.
kubectl port-forward -n logging svc/loki 3100:3100
```

## Log Analysis in Practice
### Example 1: Application Log Collection Setup
```yaml
# app-with-logging.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-logging-demo
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-logging
  template:
    metadata:
      labels:
        app: nginx-logging
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
        - name: log-volume
          mountPath: /var/log/nginx
      # Sidecar that exports access-log metrics. Image and flags are
      # illustrative; check your exporter's docs for its exact interface.
      - name: log-exporter
        image: prom/nginxlog-exporter:latest
        args:
        - -listen-address=:9113
        - -nginx-log-path=/var/log/nginx/access.log
        volumeMounts:
        - name: log-volume
          mountPath: /var/log/nginx
          readOnly: true
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config
      - name: log-volume
        emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: default
data:
  nginx.conf: |
    user nginx;
    worker_processes 1;
    error_log /var/log/nginx/error.log warn;
    pid /var/run/nginx.pid;

    events {
      worker_connections 1024;
    }

    http {
      include /etc/nginx/mime.types;
      default_type application/octet-stream;
      log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
      access_log /var/log/nginx/access.log main;
      sendfile on;
      keepalive_timeout 65;

      server {
        listen 80;
        server_name localhost;
        location / {
          root /usr/share/nginx/html;
          index index.html index.htm;
        }
      }
    }
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-logging
  namespace: default
spec:
  selector:
    app: nginx-logging
  ports:
  - port: 80
    targetPort: 80
    name: http
  - port: 9113
    targetPort: 9113
    name: metrics
```
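
To see the sidecar pattern working, generate a little traffic and tail the shared access log. A sketch, assuming the manifest above:

```bash
# Forward a local port to the service and send a few requests.
kubectl port-forward svc/nginx-logging 8080:80 &
for i in 1 2 3; do curl -s -o /dev/null http://localhost:8080/; done

# The access log goes to the shared emptyDir, not to stdout,
# so read it from inside the nginx container.
kubectl exec deploy/nginx-logging-demo -c nginx -- \
  tail -n 5 /var/log/nginx/access.log
```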
### Example 2: Log Aggregation Analysis

```yaml
# log-aggregation.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-aggregator
  namespace: logging
data:
  aggregate.py: |
    import json
    import sys
    from collections import Counter
    from datetime import datetime

    def parse_log_line(line):
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            return None

    def aggregate_logs(log_file):
        error_counter = Counter()
        status_counter = Counter()
        time_series = {}
        with open(log_file, 'r') as f:
            for line in f:
                log = parse_log_line(line)
                if not log:
                    continue
                if 'status' in log:
                    status_counter[log['status']] += 1
                if 'level' in log:
                    error_counter[log['level']] += 1
                if 'timestamp' in log:
                    # Python 3.9's fromisoformat() cannot parse a trailing 'Z'.
                    ts = log['timestamp'].replace('Z', '+00:00')
                    hour = datetime.fromisoformat(ts).hour
                    time_series[hour] = time_series.get(hour, 0) + 1
        print("Status Code Distribution:")
        for status, count in status_counter.most_common():
            print(f"  {status}: {count}")
        print("\nLog Level Distribution:")
        for level, count in error_counter.most_common():
            print(f"  {level}: {count}")
        print("\nRequests per Hour:")
        for hour in sorted(time_series.keys()):
            print(f"  {hour}:00 - {time_series[hour]} requests")

    if __name__ == '__main__':
        aggregate_logs(sys.argv[1])
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: log-aggregator
  namespace: logging
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: aggregator
            image: python:3.9-slim
            command:
            - python
            - /scripts/aggregate.py
            - /var/log/nginx/access.log
            volumeMounts:
            - name: script
              mountPath: /scripts
            - name: logs
              mountPath: /var/log/nginx
              readOnly: true
          volumes:
          - name: script
            configMap:
              name: log-aggregator
          # An emptyDir is only a placeholder here: it is always empty in a
          # fresh CronJob pod. In practice, mount a shared PVC or point the
          # script at wherever the logs actually live.
          - name: logs
            emptyDir: {}
          restartPolicy: OnFailure
```

### Example 3: Log Alert Rules
```yaml
# log-alerts.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-alert-rules
  namespace: logging
data:
  alerts.yaml: |
    groups:
    - name: log-alerts
      rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({app="nginx"} |= "error" [5m]))
          /
          sum(rate({app="nginx"} [5m])) * 100 > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }}%"
      - alert: LogVolumeHigh
        expr: |
          sum(rate({namespace="default"} [5m])) > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High log volume detected"
          description: "Log volume is {{ $value }} logs/s"
      - alert: ApplicationCrash
        expr: |
          count_over_time({app="nginx"} |= "panic" [5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application crash detected"
          description: "Found panic in logs"
```
## kubectl Log Commands

### Basic Log Viewing
```bash
kubectl logs <pod-name>                      # current logs
kubectl logs <pod-name> -n <namespace>       # in a specific namespace
kubectl logs <pod-name> -c <container-name>  # a specific container
kubectl logs <pod-name> --previous           # the previous (crashed) instance
kubectl logs <pod-name> --tail=100           # last 100 lines
kubectl logs <pod-name> --since=1h           # last hour only
kubectl logs <pod-name> --timestamps         # prefix each line with a timestamp
kubectl logs -f <pod-name>                   # follow (stream) new lines
```

### Multi-Container Logs
```bash
kubectl logs <pod-name> --all-containers     # every container in the pod
kubectl logs <pod-name> -c container1        # one container at a time
                                             # (-c takes a single name)
kubectl logs -l app=nginx --all-containers --max-log-requests=5
```

### Filtering Logs
```bash
kubectl logs <pod-name> | grep ERROR                          # errors only
kubectl logs <pod-name> | grep -E "ERROR|WARN"                # errors and warnings
kubectl logs <pod-name> | awk '/ERROR/ {print}'               # awk equivalent
kubectl logs <pod-name> | sed -n '/2024-01-15/,/2024-01-16/p' # a date range
```

### Exporting Logs
```bash
kubectl logs <pod-name> > pod.log
kubectl logs <pod-name> --since=24h > pod-24h.log
kubectl logs <pod-name> -n <namespace> > namespace-pod.log
```

### Ad-hoc Log Analysis
```bash
kubectl logs <pod-name> | wc -l                                          # total lines
kubectl logs <pod-name> | grep ERROR | wc -l                             # error count
kubectl logs <pod-name> | awk '{print $1}' | sort | uniq -c | sort -nr   # top first fields
kubectl logs <pod-name> | grep -oP '"status":\K\d+' | sort | uniq -c     # status histogram
```

## Troubleshooting Guide
### Problem 1: Log Collection Fails
#### Symptoms

```bash
kubectl logs -n logging -l app=fluentd
# sample output:
# [error]: [out_es] failed to flush the buffer
```

#### Diagnosis
```bash
kubectl get pods -n logging -l app=fluentd     # are the agents running?
kubectl logs -n logging <fluentd-pod>          # agent-side errors
kubectl describe pod -n logging <fluentd-pod>  # events, OOM kills, mounts
kubectl exec -n logging <fluentd-pod> -- ls -la /var/log/containers
kubectl exec -n logging <fluentd-pod> -- cat /var/log/fluentd-containers.log.pos
```

#### Resolution
- Check the Fluentd configuration file
- Verify connectivity to Elasticsearch (see the probe below)
- Check permissions on the host log paths
- Review the pod's resource limits
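
The first two items can be checked in seconds. A connectivity probe from inside a Fluentd pod, as a sketch — the daemonset image ships with Ruby but may lack `curl`, so fall back to a throwaway curl pod if needed:

```bash
# Reach Elasticsearch from the Fluentd pod's network namespace.
kubectl exec -n logging <fluentd-pod> -- \
  ruby -e 'require "net/http"; puts Net::HTTP.get(URI("http://elasticsearch.logging.svc:9200/_cluster/health"))'

# Or, if DNS itself is in doubt, test from a fresh pod:
kubectl run es-probe -n logging --rm -i --restart=Never \
  --image=curlimages/curl -- curl -sv http://elasticsearch:9200
```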
### Problem 2: Missing Logs

#### Symptoms

- Some logs are never collected
- Log lines arrive out of order
#### Diagnosis
```bash
kubectl exec -n logging <fluentd-pod> -- cat /var/log/fluentd-containers.log.pos   # tail positions
kubectl exec -n logging <fluentd-pod> -- ls -la /var/log/containers                # files visible?
kubectl logs <app-pod> | wc -l                                                     # expected volume
kubectl exec -n logging <fluentd-pod> -- ls -la /var/log/fluentd/buffer/kubernetes # buffered chunks
```

#### Resolution
- Increase the Fluentd buffer size
- Tune the flush interval
- Check free disk space (see the checks below)
- Validate the log format against the parser
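
Buffer pressure and full disks are the usual culprits, and both are visible from inside the agent pod. A quick look, assuming the file-buffer path from the ConfigMap above:

```bash
# How much data is waiting in the file buffer?
kubectl exec -n logging <fluentd-pod> -- du -sh /var/log/fluentd/buffer

# Is the node's /var/log filesystem close to full?
kubectl exec -n logging <fluentd-pod> -- df -h /var/log
```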
### Problem 3: Elasticsearch Performance Issues

#### Symptoms

```bash
kubectl logs -n logging <elasticsearch-pod>
# sample output:
# [o.e.m.j.JvmGcMonitorService] [node-1] [gc][old] ... allocation failure
```

#### Diagnosis
```bash
kubectl top pods -n logging
kubectl exec -n logging <elasticsearch-pod> -- curl -s "localhost:9200/_cat/indices?v"
kubectl exec -n logging <elasticsearch-pod> -- curl -s "localhost:9200/_cluster/health?pretty"
kubectl describe pod -n logging <elasticsearch-pod>
```

#### Resolution
```yaml
# Give the JVM and the pod more headroom.
env:
- name: ES_JAVA_OPTS
  value: "-Xms2g -Xmx2g"
resources:
  requests:
    cpu: 1000m
    memory: 4Gi
  limits:
    cpu: 2000m
    memory: 8Gi
```

### Problem 4: Slow Log Queries
#### Symptoms

- Kibana queries time out
- Loki queries are slow
#### Diagnosis
```bash
kubectl top pods -n logging
kubectl exec -n logging <loki-pod> -- df -h
kubectl logs -n logging <loki-pod> | grep slow
kubectl exec -n logging <loki-pod> -- curl -s http://localhost:3100/metrics
```

#### Resolution
- Optimize the index configuration
- Add caching
- Narrow the query time range
- Filter by labels before applying line filters
### Problem 5: Out of Storage

#### Symptoms

```bash
kubectl logs -n logging <elasticsearch-pod>
# sample output:
# [error] no space left on device
```

#### Diagnosis
```bash
kubectl get pvc -n logging
kubectl describe pvc -n logging
kubectl exec -n logging <elasticsearch-pod> -- df -h
kubectl exec -n logging <elasticsearch-pod> -- curl -s "localhost:9200/_cat/indices?v&h=index,store.size,docs.count"
```

#### Resolution
- Configure index lifecycle management (ILM)
- Delete old indices (see the sketch below)
- Expand the storage capacity
- Configure log rotation
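
The fastest relief is usually deleting the oldest indices by hand while ILM takes over the long-term policy. A sketch, run from inside the Elasticsearch pod; index names follow the `k8s-logs-*` prefix configured in Fluentd:

```bash
# List indices sorted by creation date to pick the oldest.
kubectl exec -n logging <elasticsearch-pod> -- \
  curl -s "localhost:9200/_cat/indices/k8s-logs-*?v&s=creation.date"

# Delete a specific old index (wildcard deletes may be blocked by
# action.destructive_requires_name on newer clusters).
kubectl exec -n logging <elasticsearch-pod> -- \
  curl -s -X DELETE "localhost:9200/k8s-logs-2024.01.01"
```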
## Best Practices
### 1. Log Format
#### Structured Logging
```json
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "message": "Request processed successfully",
  "service": "user-service",
  "trace_id": "abc123",
  "user_id": "user001",
  "duration_ms": 150,
  "status": 200
}
```
#### Log Level Conventions

- `DEBUG`: detailed diagnostic information
- `INFO`: routine operational messages
- `WARN`: conditions that deserve attention
- `ERROR`: errors affecting a single operation
- `FATAL`: unrecoverable errors

### 2. Log Collection Best Practices
#### Resource Sizing
```yaml
resources:
  requests:
    cpu: 100m
    memory: 200Mi
  limits:
    cpu: 500m
    memory: 500Mi
```

#### Buffer Configuration
```
<buffer>
  @type file
  path /var/log/fluentd/buffer
  flush_mode interval
  flush_interval 5s
  retry_type exponential_backoff
  retry_max_interval 30
  chunk_limit_size 2M
  queue_limit_length 8
</buffer>
```

### 3. Log Storage Best Practices
#### Index Lifecycle Management (ILM)
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ilm-policy
  namespace: logging
data:
  ilm-policy.json: |
    {
      "policy": {
        "phases": {
          "hot": {
            "min_age": "0ms",
            "actions": {
              "rollover": {
                "max_size": "50GB",
                "max_age": "1d"
              }
            }
          },
          "warm": {
            "min_age": "7d",
            "actions": {
              "forcemerge": {
                "max_num_segments": 1
              },
              "shrink": {
                "number_of_shards": 1
              }
            }
          },
          "cold": {
            "min_age": "30d",
            "actions": {
              "freeze": {}
            }
          },
          "delete": {
            "min_age": "90d",
            "actions": {
              "delete": {}
            }
          }
        }
      }
    }
```
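
Elasticsearch does not pick the policy up from a ConfigMap by itself; it has to be registered through the `_ilm/policy` API and attached to indices via an index template. A hedged sketch — `k8s-logs-policy` and the template name are illustrative, and the JSON file must first exist inside the pod (e.g. via `kubectl cp`):

```bash
# Register the lifecycle policy.
kubectl exec -n logging <elasticsearch-pod> -- curl -s -X PUT \
  "localhost:9200/_ilm/policy/k8s-logs-policy" \
  -H 'Content-Type: application/json' \
  -d @/tmp/ilm-policy.json

# Attach it to new k8s-logs-* indices via an index template.
kubectl exec -n logging <elasticsearch-pod> -- curl -s -X PUT \
  "localhost:9200/_index_template/k8s-logs" \
  -H 'Content-Type: application/json' \
  -d '{"index_patterns":["k8s-logs-*"],"template":{"settings":{"index.lifecycle.name":"k8s-logs-policy"}}}'
```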
#### Data Retention Policy

- Hot data: 7 days (SSD storage)
- Warm data: 30 days (HDD storage)
- Cold data: 90 days (archive storage)
- Delete: anything older than 90 days

### 4. Log Query Best Practices
#### Loki Query Examples
```logql
{app="nginx"} |= "error"
{namespace="default"} |~ "error|warn"
{app="nginx"} | json | status >= 500
sum by (status) (count_over_time({app="nginx"} | json [1h]))
```

#### Elasticsearch Query Example
```json
{
  "query": {
    "bool": {
      "must": [
        {"match": {"level": "ERROR"}},
        {"range": {"timestamp": {"gte": "now-1h"}}}
      ]
    }
  }
}
```
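
The same query can be run directly against the REST API, which is handy for scripting. A sketch against the `k8s-logs-*` indices created by Fluentd; note that `logstash_format` writes the event time into `@timestamp`, so adjust the field name if your mapping differs:

```bash
kubectl exec -n logging <elasticsearch-pod> -- curl -s \
  "localhost:9200/k8s-logs-*/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": [
          {"match": {"level": "ERROR"}},
          {"range": {"@timestamp": {"gte": "now-1h"}}}
        ]
      }
    }
  }'
```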
### 5. Security Best Practices

#### Log Masking
```
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    message ${record["message"].gsub(/password=\S+/, 'password=***')}
  </record>
</filter>
```

#### Access Control
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: log-reader
rules:
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
```
#### Network Policy

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: logging-network-policy
  namespace: logging
spec:
  podSelector:
    matchLabels:
      app: elasticsearch
  policyTypes:
  - Ingress
  ingress:
  - from:
    # Requires the logging namespace to carry the label name=logging.
    - namespaceSelector:
        matchLabels:
          name: logging
    ports:
    - protocol: TCP
      port: 9200
```

## Log Analysis Tools
### Loki Query Syntax
#### Basic Queries
```logql
{app="nginx"}
{namespace="default", app="nginx"}
{app=~"nginx-.*"}
```

#### Line Filters
```logql
{app="nginx"} |= "error"
{app="nginx"} != "debug"
{app="nginx"} |~ "error|warn"
```

#### JSON Parsing
```logql
{app="nginx"} | json
{app="nginx"} | json | level="error"
{app="nginx"} | json | status >= 500
```

#### Aggregations
```logql
count_over_time({app="nginx"}[1h])
rate({app="nginx"}[5m])
sum by (status) (count_over_time({app="nginx"} | json [1h]))
```
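
All of these also run outside Grafana: Loki's HTTP API accepts LogQL directly, which is useful for scripts and quick checks. A sketch against the port-forwarded service from earlier:

```bash
# Instant query: total nginx error lines over the last hour.
curl -sG "http://localhost:3100/loki/api/v1/query" \
  --data-urlencode 'query=sum(count_over_time({app="nginx"} |= "error" [1h]))'

# Range query returning raw log lines (defaults to the recent window).
curl -sG "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={app="nginx"}' \
  --data-urlencode 'limit=20'
```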
### Kibana Visualization

#### Create an Index Pattern

1. Kibana -> Management -> Index Patterns
2. Create index pattern: `k8s-logs-*`
3. Select time field: `@timestamp`

#### Create a Dashboard
1. Kibana -> Dashboard -> Create dashboard
2. Add a visualization
3. Select the index pattern
4. Configure the aggregation
5. Save the dashboard

## Summary
This chapter covered the core concepts and day-to-day practice of Kubernetes log management:

- Logging architecture: the log types Kubernetes produces and how the pieces fit together
- Log collection: two mainstream agents, Fluentd and Promtail
- Log storage: deploying and configuring Elasticsearch and Loki
- Log analysis: querying and analyzing logs with Kibana and LogQL
- Hands-on examples: a complete pipeline from sidecar collection to scheduled aggregation
- Troubleshooting: diagnosing and resolving the most common failures

Log management underpins Kubernetes operations: it is what makes fast problem localization and system tuning possible.
## Next Steps

- Alerting: configure an Alertmanager-based alerting pipeline
- Prometheus monitoring: go deeper into the Prometheus stack
- Grafana visualization: design Grafana dashboards
- Metrics fundamentals: revisit the basics of the Kubernetes monitoring stack