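Service Discovery
Overview
Service discovery is the mechanism Kubernetes uses to let workloads locate and connect to Services automatically. Applications find each other dynamically instead of hard-coding IP addresses or ports. Kubernetes provides two main service discovery mechanisms: DNS and environment variables.
Core features
1. DNS service discovery
- Automatic name resolution
- Access by Service name
- Cross-namespace access
- Served by CoreDNS
2. Environment variables
- Service information is injected automatically
- Includes the Service address and port
- Simple to use
- Injected only when the container starts
3. Service registration and discovery
- Automatic Service registration
- Endpoints are updated dynamically
- Integrated with health checks
- Load balancing support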
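To make the two mechanisms concrete, here is a minimal sketch of how a client script might pick an address: it prefers the injected environment variables and falls back to the DNS name. The helper name `service_addr` is hypothetical, and the variable values are set by hand to simulate what Kubernetes would inject.

```shell
#!/bin/sh
# Hypothetical helper: print "host:port" for a Service, preferring the
# injected environment variables and falling back to the Service DNS name.
service_addr() {
  name=$1          # Service name, e.g. api-service
  default_port=$2  # port used when no environment variable is present
  # Service names contain only lowercase letters, digits, and dashes, so the
  # upper-cased, underscore-separated prefix is a valid shell variable name.
  prefix=$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')
  eval "host=\${${prefix}_SERVICE_HOST:-}"
  eval "port=\${${prefix}_SERVICE_PORT:-}"
  if [ -n "$host" ] && [ -n "$port" ]; then
    printf '%s:%s\n' "$host" "$port"
  else
    printf '%s:%s\n' "$name" "$default_port"
  fi
}

# Simulated injection (assumed values, for illustration only)
API_SERVICE_SERVICE_HOST=10.96.0.1
API_SERVICE_SERVICE_PORT=8080

service_addr api-service 8080    # uses the variables
service_addr user-service 8080   # no variables set: falls back to the DNS name
```

In practice the DNS name is usually the more robust choice, because the variables reflect only the Services that existed when the container started.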
Service discovery architecture
┌─────────────────────────────────────────────────────────┐
│                      Pod (client)                       │
│  ┌──────────────────────────────────────────────────┐   │
│  │  Application                                     │   │
│  │  - via DNS: http://api-service:8080              │   │
│  │  - via env var: $API_SERVICE_SERVICE_HOST        │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
                     │
        ┌────────────┴────────────┐
        │                         │
        ▼                         ▼
┌───────────────┐         ┌───────────────┐
│    CoreDNS    │         │ Env injection │
│ (DNS lookup)  │         │ (at startup)  │
└───────┬───────┘         └───────┬───────┘
        │                         │
        └────────────┬────────────┘
                     │
                     ▼
            ┌────────────────┐
            │    Service     │
            │  (ClusterIP)   │
            └────────┬───────┘
                     │
                     ▼
            ┌────────────────┐
            │   Endpoints    │
            │ (Pod IP list)  │
            └────────┬───────┘
                     │
            ┌────────┼────────┐
            │        │        │
            ▼        ▼        ▼
        ┌──────┐ ┌──────┐ ┌──────┐
        │ Pod1 │ │ Pod2 │ │ Pod3 │
        └──────┘ └──────┘ └──────┘
DNS Service Discovery
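DNS naming convention
Fully qualified name format
<service-name>.<namespace>.svc.<cluster-domain>
Examples
bash
# Default cluster domain
api-service.default.svc.cluster.local
# Service in another namespace
user-service.production.svc.cluster.local
# Short form (same namespace)
api-service
# Cross-namespace access
user-service.production
DNS record types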
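The naming scheme `<service-name>.<namespace>.svc.<cluster-domain>` can be sketched as two small shell helpers. Both function names are hypothetical, and the defaults (`default` namespace, `cluster.local` domain) are assumptions that match the examples in this section rather than values read from a cluster.

```shell
#!/bin/sh
# service_fqdn NAME [NAMESPACE] [CLUSTER_DOMAIN]
# Print the fully qualified DNS name of a Service.
service_fqdn() {
  name=$1
  namespace=${2:-default}
  domain=${3:-cluster.local}
  printf '%s.%s.svc.%s\n' "$name" "$namespace" "$domain"
}

# short_name NAME TARGET_NAMESPACE CLIENT_NAMESPACE
# Print the shortest name a client can use: the bare Service name inside the
# same namespace, otherwise <service-name>.<namespace>.
short_name() {
  name=$1
  target_ns=$2
  client_ns=$3
  if [ "$target_ns" = "$client_ns" ]; then
    printf '%s\n' "$name"
  else
    printf '%s.%s\n' "$name" "$target_ns"
  fi
}

service_fqdn api-service
service_fqdn user-service production
short_name user-service production default
```

The short forms work only because the Pod's resolver appends the search domains from `/etc/resolv.conf`; the fully qualified name does not depend on them.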
1. Service DNS records
bash
# ClusterIP Service
api-service.default.svc.cluster.local. 30 IN A 10.96.0.1
# Headless Service (returns the IPs of all backing Pods)
headless-service.default.svc.cluster.local. 30 IN A 10.244.1.2
headless-service.default.svc.cluster.local. 30 IN A 10.244.2.3
# ExternalName Service
external-service.default.svc.cluster.local. 30 IN CNAME external.example.com.
2. Pod DNS records
bash
# Pod DNS record (served only when the pods option of the kubernetes plugin is enabled)
10-244-1-2.default.pod.cluster.local. 30 IN A 10.244.1.2
CoreDNS configuration
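With the `pods insecure` option used in the Corefile in this section, CoreDNS answers Pod records whose name is the Pod IP with the dots replaced by dashes. The sketch below derives that name; `pod_dns_name` is a hypothetical helper and it assumes an IPv4 address.

```shell
#!/bin/sh
# pod_dns_name POD_IP [NAMESPACE] [CLUSTER_DOMAIN]
# Print the Pod DNS record name for an IPv4 Pod IP (assumption: IPv4 only).
pod_dns_name() {
  ip=$1
  namespace=${2:-default}
  domain=${3:-cluster.local}
  dashed=$(printf '%s' "$ip" | tr '.' '-')
  printf '%s.%s.pod.%s\n' "$dashed" "$namespace" "$domain"
}

pod_dns_name 10.244.1.2 default
```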
CoreDNS ConfigMap
yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
    example.com:53 {
        errors
        cache 30
        forward . 192.168.1.1
    }
Custom DNS configuration
yaml
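# Takes effect only if the CoreDNS Deployment mounts this ConfigMap
# and the Corefile imports its files.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  example.server: |
    example.com:53 {
        errors
        cache 30
        forward . 8.8.8.8 8.8.4.4
    }
  stub.domain: |
    10.in-addr.arpa:53 {
        errors
        cache 30
        forward . 192.168.1.1
    }
CoreDNS Deployment configuration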
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: CoreDNS
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      serviceAccountName: coredns
      tolerations:
      # Newer clusters taint control-plane nodes with
      # node-role.kubernetes.io/control-plane instead of .../master.
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      - key: CriticalAddonsOnly
        operator: Exists
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: k8s-app
                  operator: In
                  values:
                  - kube-dns
              topologyKey: kubernetes.io/hostname
      containers:
      - name: coredns
        # k8s.gcr.io is frozen; registry.k8s.io is the current registry.
        image: registry.k8s.io/coredns/coredns:v1.10.1
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        args:
        - -conf
        - /etc/coredns/Corefile
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          readOnlyRootFilesystem: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
      dnsPolicy: Default
      volumes:
      - name: config-volume
        configMap:
          name: coredns
          items:
          - key: Corefile
            path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153
Environment Variable Service Discovery
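Automatically injected environment variables
Format
bash
# In variable names, <ServiceName> is the Service name upper-cased with dashes
# replaced by underscores, and <Protocol> is upper-cased (for example TCP).
# In values, <Protocol> is lower-cased (for example tcp).
<ServiceName>_SERVICE_HOST=<ClusterIP>
<ServiceName>_SERVICE_PORT=<Port>
<ServiceName>_PORT=<Protocol>://<ClusterIP>:<Port>
<ServiceName>_PORT_<PortNumber>_<Protocol>=<Protocol>://<ClusterIP>:<Port>
<ServiceName>_PORT_<PortNumber>_<Protocol>_PROTO=<Protocol>
<ServiceName>_PORT_<PortNumber>_<Protocol>_PORT=<Port>
<ServiceName>_PORT_<PortNumber>_<Protocol>_ADDR=<ClusterIP>
Example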
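As a minimal sketch of the format above, the hypothetical helper `service_env` prints the variables for a Service with a single, unnamed port. It illustrates the naming rules only and is not the kubelet's implementation; the argument values are assumptions taken from this section's example.

```shell
#!/bin/sh
# service_env NAME CLUSTER_IP PORT [PROTOCOL]
# Print the environment variables for a single-port Service.
service_env() {
  name=$1
  ip=$2
  port=$3
  proto=${4:-tcp}
  # Upper-case the Service name and turn dashes into underscores.
  prefix=$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')
  upper_proto=$(printf '%s' "$proto" | tr 'a-z' 'A-Z')
  lower_proto=$(printf '%s' "$proto" | tr 'A-Z' 'a-z')
  p="${prefix}_PORT_${port}_${upper_proto}"
  printf '%s_SERVICE_HOST=%s\n' "$prefix" "$ip"
  printf '%s_SERVICE_PORT=%s\n' "$prefix" "$port"
  printf '%s_PORT=%s://%s:%s\n' "$prefix" "$lower_proto" "$ip" "$port"
  printf '%s=%s://%s:%s\n' "$p" "$lower_proto" "$ip" "$port"
  printf '%s_PROTO=%s\n' "$p" "$lower_proto"
  printf '%s_PORT=%s\n' "$p" "$port"
  printf '%s_ADDR=%s\n' "$p" "$ip"
}

service_env api-service 10.96.0.1 8080 tcp
```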
bash
# Service definition
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
# Injected environment variables
API_SERVICE_SERVICE_HOST=10.96.0.1
API_SERVICE_SERVICE_PORT=8080
API_SERVICE_PORT=tcp://10.96.0.1:8080
API_SERVICE_PORT_8080_TCP=tcp://10.96.0.1:8080
API_SERVICE_PORT_8080_TCP_PROTO=tcp
API_SERVICE_PORT_8080_TCP_PORT=8080
API_SERVICE_PORT_8080_TCP_ADDR=10.96.0.1
Environment variables for a multi-port Service
bash
# Service definition
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    port: 443
    protocol: TCP
    targetPort: 8443
# Injected environment variables
WEB_SERVICE_SERVICE_HOST=10.96.0.2
# SERVICE_PORT and PORT describe the first port in the list
WEB_SERVICE_SERVICE_PORT=80
# Named ports also get <ServiceName>_SERVICE_PORT_<PortName>
WEB_SERVICE_SERVICE_PORT_HTTP=80
WEB_SERVICE_SERVICE_PORT_HTTPS=443
WEB_SERVICE_PORT=tcp://10.96.0.2:80
WEB_SERVICE_PORT_80_TCP=tcp://10.96.0.2:80
WEB_SERVICE_PORT_80_TCP_PROTO=tcp
WEB_SERVICE_PORT_80_TCP_PORT=80
WEB_SERVICE_PORT_80_TCP_ADDR=10.96.0.2
WEB_SERVICE_PORT_443_TCP=tcp://10.96.0.2:443
WEB_SERVICE_PORT_443_TCP_PROTO=tcp
WEB_SERVICE_PORT_443_TCP_PORT=443
WEB_SERVICE_PORT_443_TCP_ADDR=10.96.0.2
Pod DNS Configuration
Custom DNS configuration
yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-pod
spec:
  containers:
  - name: app
    image: nginx
  dnsPolicy: None
  dnsConfig:
    nameservers:
    - 192.168.1.1
    - 8.8.8.8
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
    - name: timeout
      value: "3"
    - name: attempts
      value: "3"
DNS policy options
yaml
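apiVersion: v1
kind: Pod
metadata:
  name: dns-policy-pod
spec:
  containers:
  - name: app
    image: nginx
  dnsPolicy: ClusterFirst
DNS policy types
- Default: the Pod inherits the DNS configuration of the node it runs on. Despite the name, this is not the policy applied when dnsPolicy is omitted.
- ClusterFirst: queries go to the cluster DNS first; names outside the cluster domain are forwarded to the upstream nameservers. This is the policy applied when dnsPolicy is omitted.
- ClusterFirstWithHostNet: uses the cluster DNS for Pods that run with hostNetwork: true.
- None: ignores the cluster DNS settings; all DNS settings come from dnsConfig.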
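The `searches` and `ndots` settings shown in the Pod DNS configuration decide which names the resolver actually queries. The sketch below lists the candidates in order, assuming glibc-style behavior: a name with fewer dots than `ndots` is tried with each search domain first and as-is last, and a name ending in a dot is never expanded. `resolver_candidates` is a hypothetical helper, and other resolvers (for example musl) may behave differently.

```shell
#!/bin/sh
# resolver_candidates NAME NDOTS SEARCH_DOMAIN...
# Print the names a glibc-style resolver would try, in order.
resolver_candidates() {
  name=$1
  ndots=$2
  shift 2
  # A trailing dot marks the name as absolute: no search list is applied.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac
  # Count the dots in the name.
  rest=$name
  dots=0
  while :; do
    case $rest in
      *.*) rest=${rest#*.}; dots=$((dots + 1)) ;;
      *) break ;;
    esac
  done
  # Enough dots: try the name as-is before the search list.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"
  fi
  for domain in "$@"; do
    printf '%s.%s\n' "$name" "$domain"
  done
  # Too few dots: the name as-is is tried last.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s\n' "$name"
  fi
}

resolver_candidates api-service 5 default.svc.cluster.local svc.cluster.local cluster.local
```

With `ndots:5`, an external name with only a few dots goes through every search domain before it is tried as-is, which is why lowering `ndots` or using fully qualified names with a trailing dot can reduce DNS load.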
DNS configuration for host-network Pods
yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-network-pod
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: app
    image: nginx
Operational Commands
DNS queries and tests
bash
# Show the DNS Service
kubectl get svc -n kube-system kube-dns
# Show the CoreDNS Pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Test DNS resolution (busybox:1.28 is pinned because nslookup in newer busybox images is known to misbehave)
kubectl run -it --rm --restart=Never dns-test --image=busybox:1.28 -- nslookup kubernetes
# Test resolution of a Service name
kubectl run -it --rm --restart=Never dns-test --image=busybox:1.28 -- nslookup api-service
# Test cross-namespace resolution
kubectl run -it --rm --restart=Never dns-test --image=busybox:1.28 -- nslookup api-service.production
# Test with dig
kubectl run -it --rm --restart=Never dns-test --image=nicolaka/netshoot -- dig kubernetes.default.svc.cluster.local
# Show the Pod's DNS configuration
kubectl run -it --rm --restart=Never dns-test --image=busybox:1.28 -- cat /etc/resolv.conf
Viewing environment variables
bash
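# List the Pod's Service-related environment variables
kubectl exec <pod-name> -- env | grep SERVICE
# Show the variables of one Service
kubectl exec <pod-name> -- env | grep API_SERVICE
# Open a shell in the Pod, then run env inside it
kubectl exec -it <pod-name> -- sh
env | grep SERVICE
Managing CoreDNS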
bash
# Show the CoreDNS configuration
kubectl get configmap -n kube-system coredns -o yaml
# Edit the CoreDNS configuration
kubectl edit configmap -n kube-system coredns
# Show the CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns
# Restart CoreDNS
kubectl rollout restart deployment -n kube-system coredns
# Show the CoreDNS metrics (port-forward blocks, so run it in the background or in another terminal)
kubectl port-forward -n kube-system svc/kube-dns 9153:9153 &
curl http://localhost:9153/metrics
Clearing the DNS cache
bash
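# Clear the CoreDNS cache (by recreating the Pods)
kubectl delete pods -n kube-system -l k8s-app=kube-dns
# Repeat the lookup with debug output to confirm that the name resolves again
kubectl exec -it <pod-name> -- nslookup -debug api-service
Practical Examples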
Example 1: Service discovery between microservices
yaml
# User service
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: production
spec:
  selector:
    app: user-service
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: user-service:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_SERVICE
          value: "mysql-service.production.svc.cluster.local"
        - name: CACHE_SERVICE
          value: "redis-service.production.svc.cluster.local"
---
# Order service: discovers the user service through DNS
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
spec:
  selector:
    app: order-service
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: order-service:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: USER_SERVICE_URL
          value: "http://user-service.production.svc.cluster.local:8080"
        - name: PRODUCT_SERVICE_URL
          value: "http://product-service.production.svc.cluster.local:8080"
Example 2: Service discovery with environment variables
yaml
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: default
spec:
  selector:
    app: api-gateway
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: auth-service
  namespace: default
spec:
  selector:
    app: auth-service
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: default
spec:
  selector:
    app: backend-service
  ports:
  - port: 8080
    targetPort: 8080
---
# The Services above must exist before these Pods start,
# otherwise the *_SERVICE_HOST and *_SERVICE_PORT variables are not injected.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: api-gateway:v1.0.0
        ports:
        - containerPort: 8080
        env:
        # Downward API value; renamed from AUTH_SERVICE_HOST because it holds
        # the node name, not the address of auth-service.
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        command:
        - /bin/sh
        - -c
        - |
          echo "Auth Service: $AUTH_SERVICE_SERVICE_HOST:$AUTH_SERVICE_SERVICE_PORT"
          echo "Backend Service: $BACKEND_SERVICE_SERVICE_HOST:$BACKEND_SERVICE_SERVICE_PORT"
          ./start-gateway.sh
Example 3: Headless Service and direct Pod access
yaml
# Headless Service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: database
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
---
# Application that connects to specific MySQL Pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  namespace: database
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1.0.0
        env:
        - name: DB_HOST
          value: "mysql-0.mysql-headless.database.svc.cluster.local"
        - name: DB_REPLICA_HOST
          value: "mysql-1.mysql-headless.database.svc.cluster.local"
Example 4: Custom DNS configuration
yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-dns-config
  namespace: default
data:
  resolv.conf: |
    nameserver 10.96.0.10
    nameserver 8.8.8.8
    search default.svc.cluster.local svc.cluster.local cluster.local
    options ndots:5 timeout:3 attempts:3
---
# Mounting a file over /etc/resolv.conf overrides the file the kubelet
# generates, so treat this variant with care.
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-app
  namespace: default
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: dns-config
      mountPath: /etc/resolv.conf
      subPath: resolv.conf
  volumes:
  - name: dns-config
    configMap:
      name: app-dns-config
---
# Alternatively, use dnsPolicy and dnsConfig, the Pod API fields intended for this
apiVersion: v1
kind: Pod
metadata:
  name: dns-config-app
  namespace: default
spec:
  containers:
  - name: app
    image: nginx
  dnsPolicy: None
  dnsConfig:
    nameservers:
    - 10.96.0.10
    - 8.8.8.8
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
    - name: timeout
      value: "3"
    - name: attempts
      value: "3"
Example 5: Cross-namespace service discovery
yaml
# Shared Service in the shared-services namespace
apiVersion: v1
kind: Service
metadata:
  name: shared-database
  namespace: shared-services
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
---
# Development environment accessing the shared database (fully qualified name)
apiVersion: v1
kind: Pod
metadata:
  name: dev-app
  namespace: development
spec:
  containers:
  - name: app
    image: myapp:v1.0.0
    env:
    - name: DB_HOST
      value: "shared-database.shared-services.svc.cluster.local"
    - name: DB_PORT
      value: "5432"
---
# Test environment accessing the shared database (short <service>.<namespace> form)
apiVersion: v1
kind: Pod
metadata:
  name: test-app
  namespace: testing
spec:
  containers:
  - name: app
    image: myapp:v1.0.0
    env:
    - name: DB_HOST
      value: "shared-database.shared-services"
    - name: DB_PORT
      value: "5432"
Troubleshooting
Common issues
1. DNS resolution fails
bash
# Check the CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
# Check the DNS Service
kubectl get svc -n kube-system kube-dns
kubectl describe svc -n kube-system kube-dns
# Test DNS resolution
kubectl run -it --rm --restart=Never dns-test --image=busybox -- nslookup kubernetes
# Check the Pod's DNS configuration
kubectl exec <pod-name> -- cat /etc/resolv.conf
# Check the CoreDNS configuration
kubectl get configmap -n kube-system coredns -o yaml
# Test external DNS resolution
kubectl run -it --rm --restart=Never dns-test --image=busybox -- nslookup google.com
2. Service cannot be discovered
bash
# Check that the Service exists
kubectl get svc -n <namespace>
kubectl describe svc <service-name> -n <namespace>
# Check the Service's label selector
kubectl get svc <service-name> -n <namespace> -o yaml | grep selector -A 5
# Check the Endpoints
kubectl get endpoints <service-name> -n <namespace>
kubectl describe endpoints <service-name> -n <namespace>
# Check the Pod labels
kubectl get pods -n <namespace> --show-labels
# Verify that the labels match the selector
kubectl get pods -n <namespace> -l <label-key>=<label-value>
# Test the connection to the Service
kubectl run -it --rm --restart=Never test --image=busybox -- wget -O- http://<service-name>:<port>
3. Environment variables are not injected
bash
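# Check the Pod creation time (the Pod must be created after the Service)
kubectl get pod <pod-name> -o jsonpath='{.metadata.creationTimestamp}'
# Check the Service creation time
kubectl get svc <service-name> -o jsonpath='{.metadata.creationTimestamp}'
# List the Pod's environment variables
kubectl exec <pod-name> -- env | grep SERVICE
# Recreate the Pod so it picks up the variables (assumes a controller such as a Deployment recreates it)
kubectl delete pod <pod-name>
# Check that the Service exists in the Pod's namespace
kubectl get svc <service-name>
4. Cross-namespace access fails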
bash
# List the namespaces
kubectl get namespaces
# Check that the Service is in the expected namespace
kubectl get svc -n <namespace>
# Test with the fully qualified name
kubectl run -it --rm --restart=Never test --image=busybox -- nslookup <service-name>.<namespace>.svc.cluster.local
# Check the network policies
kubectl get networkpolicy -n <namespace>
# Test the cross-namespace connection
kubectl run -it --rm --restart=Never test --image=busybox -- wget -O- http://<service-name>.<namespace>:<port>
5. CoreDNS performance problems
bash
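# Show CoreDNS resource usage
kubectl top pods -n kube-system -l k8s-app=kube-dns
# Show the number of CoreDNS replicas
kubectl get deployment -n kube-system coredns
# Scale out CoreDNS
kubectl scale deployment -n kube-system coredns --replicas=3
# Show the CoreDNS metrics (port-forward blocks, so run it in the background or in another terminal)
kubectl port-forward -n kube-system svc/kube-dns 9153:9153 &
curl http://localhost:9153/metrics | grep coredns
# Check the cache setting (read from the ConfigMap, because the CoreDNS image ships without a shell or cat)
kubectl get configmap -n kube-system coredns -o jsonpath='{.data.Corefile}' | grep cache
6. DNS timeouts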
bash
# Check the DNS timeout settings
kubectl exec <pod-name> -- cat /etc/resolv.conf
# Measure DNS resolution time (time is run through a shell)
kubectl run -it --rm --restart=Never test --image=nicolaka/netshoot -- sh -c 'time nslookup kubernetes'
# Adjust the DNS timeout settings
# by adding dnsConfig to the Pod spec:
# options:
# - name: timeout
#   value: "5"
# - name: attempts
#   value: "3"
# Check the CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
Best Practices
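1. DNS usage
- Use the short Service name within the same namespace
- Use the fully qualified name across namespaces
- Avoid hard-coded IP addresses
- Give Services meaningful names
2. Environment variable usage
- Create the Service before the Pods that rely on its variables
- Manage configuration with ConfigMaps
- Do not depend on the order of environment variables
- Document what each environment variable is for
3. CoreDNS configuration
- Run multiple replicas for high availability
- Set a reasonable cache TTL
- Monitor DNS query performance
- Back up the CoreDNS configuration regularly
4. Performance optimization
- Enable DNS caching
- Tune the ndots value
- Use a node-local DNS cache
- Monitor DNS query latency
5. Security
- Restrict the scope of DNS queries
- Protect DNS with network policies
- Monitor for anomalous DNS queries
- Audit the DNS configuration regularly
6. Monitoring and alerting
- Monitor CoreDNS health
- Monitor DNS query latency
- Monitor the DNS error rate
- Set reasonable alert thresholds
7. Failure recovery
- Configure autoscaling for CoreDNS
- Prepare a contingency plan for DNS outages
- Rehearse DNS failure recovery regularly
- Record common problems and their solutions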
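For the error-rate monitoring suggested under monitoring and alerting, the following is a rough sketch that computes the share of SERVFAIL responses from CoreDNS metrics text. The helper name `servfail_ratio` is hypothetical, and the metric name `coredns_dns_responses_total` with an `rcode` label is an assumption that depends on the CoreDNS version in use; check the actual `/metrics` output before relying on it.

```shell
#!/bin/sh
# servfail_ratio: read Prometheus text-format metrics on stdin and print the
# percentage of DNS responses whose rcode is SERVFAIL.
# Assumption: responses are counted in coredns_dns_responses_total{rcode="..."}.
servfail_ratio() {
  awk '
    index($0, "coredns_dns_responses_total{") == 1 {
      value = $NF + 0          # the sample value is the last field
      total += value
      if (match($0, /rcode="[^"]*"/)) {
        # Strip the leading rcode=" (7 characters) and the closing quote.
        rcode = substr($0, RSTART + 7, RLENGTH - 8)
        if (rcode == "SERVFAIL") failed += value
      }
    }
    END {
      if (total > 0) printf "%.2f\n", failed * 100 / total
      else print "0.00"
    }
  '
}

# Usage against a port-forwarded CoreDNS (not run here):
#   curl -s http://localhost:9153/metrics | servfail_ratio
```

The counters are cumulative since the CoreDNS process started, so an alerting rule would normally compare the increase over a time window rather than the raw totals.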
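Summary
Service discovery is foundational infrastructure for microservice architectures on Kubernetes. DNS and environment variables are two complementary discovery mechanisms. Understanding how they work, together with the practices above, is essential for building reliable microservice systems.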