
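Service Discovery

Overview

Service discovery is the mechanism by which workloads in Kubernetes locate and connect to Services automatically. It lets applications find one another dynamically, without hard-coding IP addresses or ports. Kubernetes provides two main service discovery mechanisms: DNS and environment variables.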

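Core Service Discovery Features

1. DNS-based service discovery

  • Automatic name resolution
  • Access by Service name
  • Cross-namespace access
  • Served by CoreDNS

2. Environment variables

  • Service information injected automatically
  • Include the Service address and port
  • Simple to use
  • Injected once, when the container starts (Services created later are not visible)

3. Service registration and discovery

  • Automatic Service registration
  • Endpoints updated dynamically
  • Integrated with health checks
  • Load balancing support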

Service Discovery Architecture

┌─────────────────────────────────────────────────────────┐
│                      Pod (client)                       │
│  ┌───────────────────────────────────────────────────┐  │
│  │  Application                                      │  │
│  │  - via DNS: http://api-service:8080               │  │
│  │  - via env vars: $API_SERVICE_SERVICE_HOST        │  │
│  └───────────────────────────────────────────────────┘  │
└────────────────────┬────────────────────────────────────┘
                     │
        ┌────────────┴────────────┐
        │                         │
        ▼                         ▼
┌───────────────┐         ┌───────────────┐
│    CoreDNS    │         │ Env injection │
│ (DNS lookup)  │         │ (at startup)  │
└───────┬───────┘         └───────┬───────┘
        │                         │
        └────────┬────────────────┘
                 │
                 ▼
        ┌────────────────┐
        │    Service     │
        │  (ClusterIP)   │
        └────────┬───────┘
                 │
                 ▼
        ┌────────────────┐
        │   Endpoints    │
        │ (Pod IP list)  │
        └────────┬───────┘
                 │
        ┌────────┼────────┐
        │        │        │
        ▼        ▼        ▼
    ┌──────┐ ┌──────┐ ┌──────┐
    │ Pod1 │ │ Pod2 │ │ Pod3 │
    └──────┘ └──────┘ └──────┘

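DNS-Based Service Discovery

DNS Naming Conventions

Fully qualified name format

<service-name>.<namespace>.svc.<cluster-domain>

Examples

bash
# Default cluster domain
api-service.default.svc.cluster.local

# Service in another namespace
user-service.production.svc.cluster.local

# Short form (from within the same namespace)
api-service

# Cross-namespace access (<service>.<namespace>)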
user-service.production
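
The naming format above can be assembled mechanically. The following is a minimal sketch; `service_fqdn` is a hypothetical helper name, and the defaults (`default` namespace, `cluster.local` domain) are assumptions that match the examples in this section rather than every cluster.

```bash
# Build the fully qualified DNS name of a Service.
# Arguments: service name, namespace (assumed default: "default"),
# cluster domain (assumed default: "cluster.local").
service_fqdn() {
    svc="$1"
    ns="${2:-default}"
    domain="${3:-cluster.local}"
    printf '%s.%s.svc.%s\n' "$svc" "$ns" "$domain"
}

service_fqdn api-service               # api-service.default.svc.cluster.local
service_fqdn user-service production   # user-service.production.svc.cluster.local
```

Using the full name makes a reference independent of the caller's namespace, at the cost of tying it to the cluster domain.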

DNS Record Types

1. Service DNS records

bash
# ClusterIP Service
api-service.default.svc.cluster.local. 30 IN A 10.96.0.1

# Headless Service (returns the IPs of all backing Pods)
headless-service.default.svc.cluster.local. 30 IN A 10.244.1.2
headless-service.default.svc.cluster.local. 30 IN A 10.244.2.3

# ExternalName Service
external-service.default.svc.cluster.local. 30 IN CNAME external.example.com

2. Pod DNS records

bash
# Pod DNS record (requires the pods option of the CoreDNS kubernetes plugin, e.g. "pods insecure")
10-244-1-2.default.pod.cluster.local. 30 IN A 10.244.1.2
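
The Pod record name is derived from the Pod IP by replacing dots with dashes. A small sketch of that mapping for IPv4 addresses follows; `pod_dns_name` is a hypothetical helper, and the `default` namespace and `cluster.local` domain are assumed defaults.

```bash
# Derive the DNS name of a Pod from its IPv4 address.
# Arguments: pod IP, namespace (assumed default: "default"),
# cluster domain (assumed default: "cluster.local").
pod_dns_name() {
    ip="$1"
    ns="${2:-default}"
    domain="${3:-cluster.local}"
    # 10.244.1.2 becomes 10-244-1-2
    dashed="$(printf '%s' "$ip" | tr '.' '-')"
    printf '%s.%s.pod.%s\n' "$dashed" "$ns" "$domain"
}

pod_dns_name 10.244.1.2   # 10-244-1-2.default.pod.cluster.local
```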

CoreDNS Configuration

CoreDNS ConfigMap

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
    example.com:53 {
        errors
        cache 30
        forward . 192.168.1.1
    }

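Custom DNS configuration

Note: a ConfigMap named coredns-custom takes effect only when the main Corefile imports it, as some distributions set up by default. A stock CoreDNS deployment ignores it, and distributions that do import it may only pick up keys with specific suffixes.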

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  example.server: |
    example.com:53 {
        errors
        cache 30
        forward . 8.8.8.8 8.8.4.4
    }
  stub.domain: |
    10.in-addr.arpa:53 {
        errors
        cache 30
        forward . 192.168.1.1
    }

CoreDNS Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: CoreDNS
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      serviceAccountName: coredns
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      - key: CriticalAddonsOnly
        operator: Exists
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: k8s-app
                  operator: In
                  values:
                  - kube-dns
              topologyKey: kubernetes.io/hostname
      containers:
      - name: coredns
        image: registry.k8s.io/coredns/coredns:v1.10.1
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        args:
        - -conf
        - /etc/coredns/Corefile
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
      dnsPolicy: Default
      volumes:
      - name: config-volume
        configMap:
          name: coredns
          items:
          - key: Corefile
            path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153

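Environment-Variable Service Discovery

Automatically injected environment variables

Format

bash
# <ServiceName> is the Service name in upper case, with dashes replaced by underscores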
<ServiceName>_SERVICE_HOST=<ClusterIP>
<ServiceName>_SERVICE_PORT=<Port>

<ServiceName>_PORT=<Protocol>://<ClusterIP>:<Port>
<ServiceName>_PORT_<PortNumber>_<Protocol>=<Protocol>://<ClusterIP>:<Port>
<ServiceName>_PORT_<PortNumber>_<Protocol>_PROTO=<Protocol>
<ServiceName>_PORT_<PortNumber>_<Protocol>_PORT=<Port>
<ServiceName>_PORT_<PortNumber>_<Protocol>_ADDR=<ClusterIP>

Example

bash
# Service definition
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080

# Automatically injected environment variables
API_SERVICE_SERVICE_HOST=10.96.0.1
API_SERVICE_SERVICE_PORT=8080
API_SERVICE_PORT=tcp://10.96.0.1:8080
API_SERVICE_PORT_8080_TCP=tcp://10.96.0.1:8080
API_SERVICE_PORT_8080_TCP_PROTO=tcp
API_SERVICE_PORT_8080_TCP_PORT=8080
API_SERVICE_PORT_8080_TCP_ADDR=10.96.0.1
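
The `<ServiceName>_PORT` value packs protocol, address, and port into one URL-like string. When only that variable is at hand, it can be split with plain parameter expansion. This is a sketch for IPv4 values such as the one listed above; `parse_service_port_url` is a hypothetical helper name.

```bash
# Split a value such as tcp://10.96.0.1:8080 into "proto host port".
# Assumes an IPv4 address; a bracketed IPv6 address would need extra handling.
parse_service_port_url() {
    url="$1"
    proto="${url%%://*}"      # text before "://"
    hostport="${url#*://}"    # text after "://"
    host="${hostport%:*}"     # text before the last ":"
    port="${hostport##*:}"    # text after the last ":"
    printf '%s %s %s\n' "$proto" "$host" "$port"
}

parse_service_port_url "tcp://10.96.0.1:8080"   # tcp 10.96.0.1 8080
```

In practice the separate `_SERVICE_HOST` and `_SERVICE_PORT` variables are usually easier to consume. The URL form mainly exists for compatibility with Docker-links-style configuration.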

Environment variables for a multi-port Service

bash
# Service definition
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    port: 443
    protocol: TCP
    targetPort: 8443

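# Automatically injected environment variables
WEB_SERVICE_SERVICE_HOST=10.96.0.2
# SERVICE_PORT and PORT refer to the first port in the list
WEB_SERVICE_SERVICE_PORT=80
# Named ports additionally get <ServiceName>_SERVICE_PORT_<PortName>
WEB_SERVICE_SERVICE_PORT_HTTP=80
WEB_SERVICE_SERVICE_PORT_HTTPS=443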
WEB_SERVICE_PORT=tcp://10.96.0.2:80
WEB_SERVICE_PORT_80_TCP=tcp://10.96.0.2:80
WEB_SERVICE_PORT_80_TCP_PROTO=tcp
WEB_SERVICE_PORT_80_TCP_PORT=80
WEB_SERVICE_PORT_80_TCP_ADDR=10.96.0.2
WEB_SERVICE_PORT_443_TCP=tcp://10.96.0.2:443
WEB_SERVICE_PORT_443_TCP_PROTO=tcp
WEB_SERVICE_PORT_443_TCP_PORT=443
WEB_SERVICE_PORT_443_TCP_ADDR=10.96.0.2
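
Both listings follow the same naming rule: the Service name is upper-cased and its dashes become underscores. A minimal sketch of that conversion follows; `service_env_prefix` is a hypothetical helper name, not part of any Kubernetes tooling.

```bash
# Convert a Service name to the prefix used for its environment variables:
# upper-case the name and replace dashes with underscores.
service_env_prefix() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

prefix="$(service_env_prefix web-service)"
echo "${prefix}_SERVICE_HOST"   # WEB_SERVICE_SERVICE_HOST
echo "${prefix}_SERVICE_PORT"   # WEB_SERVICE_SERVICE_PORT
```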

Pod DNS Configuration

Custom DNS settings

yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-pod
spec:
  containers:
  - name: app
    image: nginx
  dnsPolicy: None
  dnsConfig:
    nameservers:
    - 192.168.1.1
    - 8.8.8.8
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
    - name: timeout
      value: "3"
    - name: attempts
      value: "3"
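
The `searches` and `ndots` settings determine which names the resolver actually queries. The sketch below models the common glibc-style behaviour: a name with fewer dots than `ndots` is tried against each search domain first and as an absolute name last, and a name ending in a dot is never expanded. It is a simplified model, not a resolver implementation, and real resolvers (musl, for example) differ in details. `candidate_queries` is a hypothetical helper name.

```bash
# Print the names a resolver would try, in order.
# Arguments: name, ndots, then the search domains.
candidate_queries() {
    name="$1"
    ndots="$2"
    shift 2
    case "$name" in
        *.) printf '%s\n' "$name"; return ;;   # trailing dot: already absolute
    esac
    # Count the dots in the name.
    only_dots="$(printf '%s' "$name" | tr -cd '.')"
    dots="${#only_dots}"
    # Enough dots: try the name as given before the search list.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done
    # Too few dots: the name as given is tried last.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}

candidate_queries api-service 5 \
    default.svc.cluster.local svc.cluster.local cluster.local
```

With `ndots: 5`, an external name such as `google.com` goes through all three search domains before the absolute lookup, which is why lowering `ndots` or writing a trailing dot reduces query volume for external names.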

DNS policy

yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-policy-pod
spec:
  containers:
  - name: app
    image: nginx
  dnsPolicy: ClusterFirst

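DNS policy types

  • Default: the Pod inherits the DNS configuration of the node it runs on (despite the name, this is not the default policy)
  • ClusterFirst: the default policy; queries go to the cluster DNS, which forwards names outside the cluster domain to the upstream nameservers
  • ClusterFirstWithHostNet: use the cluster DNS for Pods that run with hostNetwork: true
  • None: ignore the cluster DNS settings and take everything from dnsConfig

DNS for Pods on the host network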

yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-network-pod
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: app
    image: nginx

Common Commands

DNS queries and tests

bash
# View the DNS Service
kubectl get svc -n kube-system kube-dns

# View the CoreDNS Pods
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Test DNS resolution
kubectl run -it --rm --restart=Never dns-test --image=busybox -- nslookup kubernetes

# Test resolution of a Service name
kubectl run -it --rm --restart=Never dns-test --image=busybox -- nslookup api-service

# Test cross-namespace resolution
kubectl run -it --rm --restart=Never dns-test --image=busybox -- nslookup api-service.production

# Test with dig
kubectl run -it --rm --restart=Never dns-test --image=nicolaka/netshoot -- dig kubernetes.default.svc.cluster.local

# View the DNS configuration of a Pod
kubectl run -it --rm --restart=Never dns-test --image=busybox -- cat /etc/resolv.conf

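Viewing environment variables

bash
# List the Service-related environment variables of a Pod
kubectl exec <pod-name> -- env | grep SERVICE

# Look for the variables of one Service
kubectl exec <pod-name> -- env | grep API_SERVICE

# Open a shell in the Pod and inspect from inside
kubectl exec -it <pod-name> -- sh
env | grep SERVICE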

CoreDNS management

bash
# View the CoreDNS configuration
kubectl get configmap -n kube-system coredns -o yaml

# Edit the CoreDNS configuration
kubectl edit configmap -n kube-system coredns

# View the CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# Restart CoreDNS
kubectl rollout restart deployment -n kube-system coredns

# View CoreDNS metrics (port-forward blocks, so run curl in a second terminal)
kubectl port-forward -n kube-system svc/kube-dns 9153:9153
curl http://localhost:9153/metrics

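Clearing the DNS cache

bash
# Clear the CoreDNS cache by recreating the Pods
kubectl delete pods -n kube-system -l k8s-app=kube-dns

# Repeat the lookup with debug output (this sends a fresh query; it does not flush any cache)
kubectl exec -it <pod-name> -- nslookup -debug api-service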

Worked Examples

Example 1: Service discovery between microservices

yaml
# User service
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: production
spec:
  selector:
    app: user-service
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: user-service:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_SERVICE
          value: "mysql-service.production.svc.cluster.local"
        - name: CACHE_SERVICE
          value: "redis-service.production.svc.cluster.local"
---
# Order service: discovers the user service through DNS
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
spec:
  selector:
    app: order-service
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: order-service:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: USER_SERVICE_URL
          value: "http://user-service.production.svc.cluster.local:8080"
        - name: PRODUCT_SERVICE_URL
          value: "http://product-service.production.svc.cluster.local:8080"

Example 2: Service discovery through environment variables

yaml
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: default
spec:
  selector:
    app: api-gateway
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: auth-service
  namespace: default
spec:
  selector:
    app: auth-service
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: default
spec:
  selector:
    app: backend-service
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: api-gateway:v1.0.0
        ports:
        - containerPort: 8080
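        env:
        # Downward API example. The Service variables used in the command below
        # (AUTH_SERVICE_SERVICE_HOST and so on) are injected by the kubelet
        # and do not need to be declared here.
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName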
        command:
        - /bin/sh
        - -c
        - |
          echo "Auth Service: $AUTH_SERVICE_SERVICE_HOST:$AUTH_SERVICE_SERVICE_PORT"
          echo "Backend Service: $BACKEND_SERVICE_SERVICE_HOST:$BACKEND_SERVICE_SERVICE_PORT"
          ./start-gateway.sh
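
Because the variables exist only for Services created before the Pod started, a start script may want to fall back to the DNS name when they are missing. The following is one possible sketch of such a fallback; `service_address` is a hypothetical helper, and the fallback port is an argument supplied by the caller, not something Kubernetes provides.

```bash
# Print "host:port" for a Service: prefer the injected variables,
# fall back to the Service's DNS name and a caller-supplied port.
# The Service name is assumed to be a valid DNS label (it is used in eval).
service_address() {
    svc="$1"
    default_port="$2"
    prefix="$(printf '%s' "$svc" | tr '[:lower:]' '[:upper:]' | tr '-' '_')"
    # Indirect lookup of <PREFIX>_SERVICE_HOST and <PREFIX>_SERVICE_PORT.
    eval "host=\${${prefix}_SERVICE_HOST:-}"
    eval "port=\${${prefix}_SERVICE_PORT:-}"
    printf '%s:%s\n' "${host:-$svc}" "${port:-$default_port}"
}

echo "Auth Service: $(service_address auth-service 8080)"
echo "Backend Service: $(service_address backend-service 8080)"
```

The DNS fallback keeps the gateway working when a backing Service is created after the gateway Pod, which the environment variables alone cannot handle.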

Example 3: Headless Service and direct Pod access

yaml
# Headless Service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: database
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
---
# Application connecting to specific MySQL Pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  namespace: database
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1.0.0
        env:
        - name: DB_HOST
          value: "mysql-0.mysql-headless.database.svc.cluster.local"
        - name: DB_REPLICA_HOST
          value: "mysql-1.mysql-headless.database.svc.cluster.local"
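
Each StatefulSet Pod gets a stable name of the form `<statefulset>-<ordinal>.<headless-service>.<namespace>.svc.<cluster-domain>`, so the full host list can be generated from the replica count. A sketch assuming the `cluster.local` domain follows; `statefulset_pod_hosts` is a hypothetical helper name.

```bash
# List the stable DNS names of the Pods of a StatefulSet.
# Arguments: StatefulSet name, headless Service name, namespace, replica count.
# The cluster domain is assumed to be cluster.local.
statefulset_pod_hosts() {
    name="$1"
    svc="$2"
    ns="$3"
    replicas="$4"
    i=0
    while [ "$i" -lt "$replicas" ]; do
        printf '%s-%s.%s.%s.svc.cluster.local\n' "$name" "$i" "$svc" "$ns"
        i=$((i + 1))
    done
}

statefulset_pod_hosts mysql mysql-headless database 3
```

For the manifest above this prints the names for `mysql-0`, `mysql-1`, and `mysql-2`, the first two of which are the values used for `DB_HOST` and `DB_REPLICA_HOST`.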

Example 4: Custom DNS configuration

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-dns-config
  namespace: default
data:
  resolv.conf: |
    nameserver 10.96.0.10
    nameserver 8.8.8.8
    search default.svc.cluster.local svc.cluster.local cluster.local
    options ndots:5 timeout:3 attempts:3
---
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-app
  namespace: default
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: dns-config
      mountPath: /etc/resolv.conf
      subPath: resolv.conf
  volumes:
  - name: dns-config
    configMap:
      name: app-dns-config
---
# Alternatively, use dnsPolicy and dnsConfig (generally the safer option, since the kubelet manages /etc/resolv.conf)
apiVersion: v1
kind: Pod
metadata:
  name: dns-config-app
  namespace: default
spec:
  containers:
  - name: app
    image: nginx
  dnsPolicy: None
  dnsConfig:
    nameservers:
    - 10.96.0.10
    - 8.8.8.8
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
    - name: timeout
      value: "3"
    - name: attempts
      value: "3"

Example 5: Cross-namespace service discovery

yaml
# Shared service in the shared-services namespace
apiVersion: v1
kind: Service
metadata:
  name: shared-database
  namespace: shared-services
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
---
# Development environment accessing the shared database (fully qualified name)
apiVersion: v1
kind: Pod
metadata:
  name: dev-app
  namespace: development
spec:
  containers:
  - name: app
    image: myapp:v1.0.0
    env:
    - name: DB_HOST
      value: "shared-database.shared-services.svc.cluster.local"
    - name: DB_PORT
      value: "5432"
---
# Testing environment accessing the shared database (short form <service>.<namespace>)
apiVersion: v1
kind: Pod
metadata:
  name: test-app
  namespace: testing
spec:
  containers:
  - name: app
    image: myapp:v1.0.0
    env:
    - name: DB_HOST
      value: "shared-database.shared-services"
    - name: DB_PORT
      value: "5432"

Troubleshooting

Common problems

1. DNS resolution fails

bash
# Check CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns

# Check the DNS Service
kubectl get svc -n kube-system kube-dns
kubectl describe svc -n kube-system kube-dns

# Test DNS resolution
kubectl run -it --rm --restart=Never dns-test --image=busybox -- nslookup kubernetes

# Check the Pod's DNS configuration
kubectl exec <pod-name> -- cat /etc/resolv.conf

# Check the CoreDNS configuration
kubectl get configmap -n kube-system coredns -o yaml

# Test external DNS resolution
kubectl run -it --rm --restart=Never dns-test --image=busybox -- nslookup google.com

2. Service cannot be discovered

bash
# Check that the Service exists
kubectl get svc -n <namespace>
kubectl describe svc <service-name> -n <namespace>

# Check the Service's label selector
kubectl get svc <service-name> -n <namespace> -o yaml | grep selector -A 5

# Check the Endpoints (an empty list means no ready Pod matches the selector)
kubectl get endpoints <service-name> -n <namespace>
kubectl describe endpoints <service-name> -n <namespace>

# Check the Pod labels
kubectl get pods -n <namespace> --show-labels

# Verify that the labels match
kubectl get pods -n <namespace> -l <label-key>=<label-value>

# Test connectivity to the Service
kubectl run -it --rm --restart=Never test --image=busybox -- wget -O- http://<service-name>:<port>
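
The selector check in the steps above comes down to a subset test: every `key=value` pair in the Service selector must appear among the Pod's labels. The sketch below shows that rule on comma-separated strings; `selector_matches` is a hypothetical helper for illustration and does not call the cluster.

```bash
# Return 0 when every key=value pair of the selector appears in the labels.
# Both arguments are comma-separated lists such as "app=web,tier=backend".
selector_matches() {
    selector="$1"
    labels=",$2,"
    old_ifs="$IFS"
    IFS=','
    for pair in $selector; do
        case "$labels" in
            *",$pair,"*) ;;                       # pair present, keep checking
            *) IFS="$old_ifs"; return 1 ;;        # pair missing: no match
        esac
    done
    IFS="$old_ifs"
    return 0
}

if selector_matches "app=user-service" "app=user-service,tier=backend"; then
    echo "selector matches"
fi
```

Extra labels on the Pod do not prevent a match, but a single differing value (for example a typo in `app`) leaves the Endpoints list empty.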

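3. Environment variables not injected

bash
# Check the Pod's age (the Pod must have been created after the Service)
kubectl get pods -o wide

# Check the Service's age
kubectl get svc

# View the Pod's environment variables
kubectl exec <pod-name> -- env | grep SERVICE

# Recreate the Pod to pick up the variables (assumes a controller such as a Deployment recreates it)
kubectl delete pod <pod-name>

# Check that the Service exists in the Pod's namespace
kubectl get svc <service-name>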

4. Cross-namespace access fails

bash
# List the namespaces
kubectl get namespaces

# Check that the Service is in the expected namespace
kubectl get svc -n <namespace>

# Test with the fully qualified name
kubectl run -it --rm --restart=Never test --image=busybox -- nslookup <service-name>.<namespace>.svc.cluster.local

# Check network policies
kubectl get networkpolicy -n <namespace>

# Test a cross-namespace connection
kubectl run -it --rm --restart=Never test --image=busybox -- wget -O- http://<service-name>.<namespace>:<port>

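5. CoreDNS performance problems

bash
# View CoreDNS resource usage (requires metrics-server)
kubectl top pods -n kube-system -l k8s-app=kube-dns

# View the CoreDNS replica count
kubectl get deployment -n kube-system coredns

# Scale out CoreDNS
kubectl scale deployment -n kube-system coredns --replicas=3

# View CoreDNS metrics (port-forward blocks, so run curl in a second terminal)
kubectl port-forward -n kube-system svc/kube-dns 9153:9153
curl http://localhost:9153/metrics | grep coredns

# Check the cache setting (the CoreDNS image typically ships without a shell or cat, so read the ConfigMap instead)
kubectl get configmap -n kube-system coredns -o jsonpath='{.data.Corefile}' | grep cache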

6. DNS timeouts

bash
# Check the DNS timeout settings
kubectl exec <pod-name> -- cat /etc/resolv.conf

# Measure DNS resolution time
kubectl run -it --rm --restart=Never test --image=nicolaka/netshoot -- time nslookup kubernetes

# Adjust the DNS timeout settings
# by adding the following to the Pod's dnsConfig:
# options:
# - name: timeout
#   value: "5"
# - name: attempts
#   value: "3"

# Check the CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

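Best Practices

1. DNS usage

  • Use the short Service name within the same namespace
  • Use the fully qualified name across namespaces
  • Avoid hard-coding IP addresses
  • Give Services meaningful names

2. Environment variable usage

  • Create the Service before the Pods that consume it
  • Manage configuration with ConfigMaps
  • Do not rely on the order of environment variables
  • Document what each environment variable is for

3. CoreDNS configuration

  • Run multiple replicas for high availability
  • Set a reasonable cache TTL
  • Monitor DNS query performance
  • Back up the CoreDNS configuration regularly

4. Performance

  • Enable DNS caching
  • Tune the ndots value
  • Use a node-local DNS cache
  • Monitor DNS query latency

5. Security

  • Restrict the scope of DNS queries
  • Protect DNS with network policies
  • Monitor for anomalous DNS queries
  • Audit the DNS configuration regularly

6. Monitoring and alerting

  • Monitor CoreDNS health
  • Monitor DNS query latency
  • Monitor the DNS error rate
  • Set sensible alert thresholds

7. Failure recovery

  • Configure autoscaling for CoreDNS
  • Prepare a contingency plan for DNS outages
  • Rehearse DNS failure recovery regularly
  • Record common problems and their fixes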

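Summary

Service discovery is foundational infrastructure for microservice architectures on Kubernetes. DNS and environment variables are two complementary discovery mechanisms. Understanding how they work, along with the practices above, is essential for building reliable microservice systems.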
