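Kubernetes Egress Network Policy Management: Operations Guide

Use the CoreDNS hosts plugin together with Calico NetworkPolicy to implement an egress domain whitelist for Kubernetes. No CNI change and no Service Mesh: a purely open-source approach that supports GitOps management across multiple clusters.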

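1. Overview

1.1 Solution summary

In production Kubernetes clusters, Pods can reach any external network by default. This document describes how to control egress access by domain name by combining the CoreDNS hosts plugin with Calico NetworkPolicy.

The core idea: resolve each permitted domain to a fixed IP (for example 10.0.0.1), then use Calico policies to allow traffic only to that IP range. This requires no CNI change, no Service Mesh, and no paid license.

Each namespace can have its own whitelist. The setup supports GitOps multi-cluster management, CI conflict detection, monitoring and alerting, and a complete rollback procedure.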

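1.2 Why not the alternatives

| Alternative | Why it was not chosen |
|---|---|
| CiliumNetworkPolicy + FQDN | Requires replacing the Calico CNI with Cilium; the change is too disruptive and every cluster would have to be touched |
| Istio egress Gateway | Requires a full Istio installation; sidecar injection is intrusive to applications and resource consumption is high |
| Calico Enterprise | Commercial, paid product |
| Plain iptables/ipvs rules | Rules are hard to manage, do not align with the Kubernetes namespace concept, and are cumbersome to sync across clusters |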

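1.3 Why Calico + CoreDNS hijacking

The existing Calico CNI stays in place and only policy configuration is added. No extra components need to be deployed. The policies are plain YAML resources (Calico's projectcalico.org/v3 API), so they can be managed with Kustomize and synced directly by ArgoCD. There are no license fees.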

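1.4 Technology comparison

| Approach | CNI dependency | Intrusiveness | Complexity | Recommended when |
|---|---|---|---|---|
| CiliumNetworkPolicy + FQDN | Requires Cilium CNI | | | Cilium is already deployed |
| Calico + CoreDNS hijacking | Calico is enough | | | Recommended for production |
| Istio egress Gateway | Istio | | | Istio is already in use |
| Calico Enterprise | Calico (commercial edition) | | | Budget is available |

Chosen approach: Calico + CoreDNS hijacking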


2. Architecture

2.1 Overall architecture

  ┌──────────────────────────────────────────────────────────┐
  │                    Kubernetes Cluster                    │
  │                                                          │
  │  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐     │
  │  │ namespace-a │   │ namespace-b │   │ namespace-c │     │
  │  │  (GitHub)   │   │ (domestic)  │   │   (mixed)   │     │
  │  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘     │
  │         │                 │                 │            │
  │         ▼                 ▼                 ▼            │
  │  ┌────────────────────────────────────────────────┐      │
  │  │ Calico GlobalNetworkPolicy                     │      │
  │  │ Default Deny Egress + IP Whitelist             │      │
  │  └───────────────────────┬────────────────────────┘      │
  │                          │                               │
  │                          ▼                               │
  │  ┌────────────────────────────────────────────────┐      │
  │  │ CoreDNS (hosts plugin)                         │      │
  │  │                                                │      │
  │  │ Authorized domains   → fixed IP (10.0.0.x)     │      │
  │  │ Unauthorized domains → forwarded upstream      │      │
  │  │                        (blocked by policy)     │      │
  │  └───────────────────────┬────────────────────────┘      │
  │                          │                               │
  └──────────────────────────┬───────────────────────────────┘
                             ▼
                    ┌─────────────────┐
                    │ External network│
                    └─────────────────┘

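2.2 How the hijacking works

  1. A Pod requests github.com
  2. The CoreDNS hosts plugin resolves the domain to 10.0.0.1
  3. Calico checks whether the destination IP is on the whitelist
  4. 10.0.0.1 is allowed, so the traffic leaves through NAT. This presumes a NAT gateway or forwarding proxy answers on the hijack IP and relays traffic to the real destination.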

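2.3 Key constraints

  1. The hijack IP range must not overlap the cluster's Pod IP or Service IP ranges
  2. Applications must use the cluster DNS; they must not hard-code DNS servers
  3. In Calico, a lower order value means higher priority (the policy is evaluated earlier)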
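Constraint 3 is easy to read backwards, so here is a deliberately simplified model of ordered evaluation. It is a sketch, not Calico's implementation: rules are flattened to `<order> <action> <cidr>` lines, sorted by ascending order, and the first rule whose CIDR contains the destination decides. The function names (`ip_to_int`, `in_cidr`, `evaluate`) are invented for this illustration, and selectors, ports and protocols are ignored.

```shell
# Simplified model of ordered policy evaluation (illustration only).

# ip_to_int A.B.C.D: print the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# in_cidr IP CIDR: succeed when IP lies inside CIDR
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  fi
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# evaluate DEST_IP: read "<order> <Allow|Deny> <cidr>" lines on stdin and
# print the action of the first match in ascending order. Prints "Deny"
# when nothing matches, mirroring default deny once a policy applies.
evaluate() {
  local dest rules order action cidr
  dest=$1
  rules=$(sort -s -n -k1,1)   # stable sort keeps rule order within one policy
  while read -r order action cidr; do
    [ -n "$cidr" ] || continue
    if in_cidr "$dest" "$cidr"; then
      echo "$action"
      return 0
    fi
  done <<EOF
$rules
EOF
  echo "Deny"
}

# Sample rules: one namespace-level allow (order 100) and two global rules (order 1000).
rules='1000 Allow 10.0.0.0/24
100 Allow 10.0.0.1/32
1000 Deny 0.0.0.0/0'

printf '%s\n' "$rules" | evaluate 10.0.0.1      # decided by the order-100 rule
printf '%s\n' "$rules" | evaluate 10.0.0.2      # no order-100 match, decided at order 1000
printf '%s\n' "$rules" | evaluate 140.82.112.3  # only the catch-all matches
```

With these sample rules the three calls print Allow, Allow and Deny. The second result is the one to notice: an address that the order-100 rule does not mention is still decided by whichever later rule matches it.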

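3. Prerequisites

| Component | Required version | Check command |
|---|---|---|
| Kubernetes | v1.32.9 | kubectl version |
| Calico CNI | v3.25+ | calicoctl version |
| CoreDNS | Bundled with the cluster | kubectl get po -n kube-system -l k8s-app=kube-dns |
| ArgoCD (optional) | v2.5+ | argocd version |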

3.1 Verify Calico status

# Confirm the Calico CNI is running normally
calicoctl node status

# Confirm GlobalNetworkPolicy is available
calicoctl get globalnetworkpolicy

3.2 Confirm CoreDNS is configurable

# View the current CoreDNS ConfigMap
kubectl get cm -n kube-system coredns -o yaml

# Official CoreDNS builds all include the hosts plugin

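4. Configuration details

4.1 Allocate the hijack IP range

Pick an IP range that is not in use anywhere in the cluster as the hijack range. This guide uses 10.0.0.0/24; adjust it to your environment.

# Confirm the range is unused: neither command should print anything
kubectl get pods -A -o wide | grep -F ' 10.0.0.'
kubectl get svc -A | grep -F ' 10.0.0.'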
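Listing live Pods only catches addresses that are in use right now. Comparing the hijack range against the cluster's configured Pod and Service CIDRs is a stricter check. The following is a minimal sketch of that comparison; `cidr_range` and `cidr_overlaps` are hypothetical helpers, and 10.244.0.0/16 and 10.96.0.0/12 are placeholder values, so substitute the ranges your cluster was created with.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_range CIDR: print "<first> <last>" address of the range as integers
cidr_range() {
  local base bits size
  base=$(ip_to_int "${1%/*}")
  bits=${1#*/}
  size=$(( 1 << (32 - bits) ))
  base=$(( base / size * size ))   # align to the network boundary
  echo "$base $(( base + size - 1 ))"
}

# cidr_overlaps A B: succeed when the two ranges share at least one address
cidr_overlaps() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<EOF
$(cidr_range "$1")
EOF
  read -r b_first b_last <<EOF
$(cidr_range "$2")
EOF
  [ "$a_first" -le "$b_last" ] && [ "$b_first" -le "$a_last" ]
}

HIJACK_CIDR="10.0.0.0/24"
# Placeholder Pod and Service CIDRs; replace with your cluster's values.
for cidr in "10.244.0.0/16" "10.96.0.0/12"; do
  if cidr_overlaps "$HIJACK_CIDR" "$cidr"; then
    echo "CONFLICT: $HIJACK_CIDR overlaps $cidr"
  else
    echo "ok: $HIJACK_CIDR does not overlap $cidr"
  fi
done
```

With the placeholder values both lines report `ok`. A hijack range such as 10.96.1.0/24 would be reported as a conflict with the second range.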

4.2 CoreDNS whitelist configuration

# base/coredns-egress-whitelist.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: egress-whitelist
  namespace: kube-system
data:
  hosts: |
    # GitHub (used by namespace-a)
    10.0.0.1 github.com
    10.0.0.1 api.github.com
    10.0.0.1 githubusercontent.com
    10.0.0.1 raw.githubusercontent.com

    # Domestic (China) services (used by namespace-b)
    10.0.0.2 baidu.com
    10.0.0.2 qingcdn.com
    10.0.0.2 api.qingcdn.com
    10.0.0.3 aliyun.com
    10.0.0.3 market.aliyun.com

    # fallthrough is a directive of the hosts block in the Corefile, not an entry in this file
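Because the hosts data is matched by exact name, it helps to see which queries the whitelist would answer and which would pass on to the upstream resolver. The sketch below mimics that lookup against a hosts-format file. `lookup_host` is a hypothetical helper, not part of CoreDNS, and it compares names case-sensitively for simplicity.

```shell
# lookup_host NAME HOSTS_FILE: print the hijack IP for an exact match,
# or "forward" when the name is not listed.
lookup_host() {
  awk -v name="$1" '
    /^[ \t]*(#|$)/ { next }   # skip comments and blank lines
    { for (i = 2; i <= NF; i++) if ($i == name) { print $1; found = 1; exit } }
    END { if (!found) print "forward" }
  ' "$2"
}

hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
# GitHub (used by namespace-a)
10.0.0.1 github.com
10.0.0.1 api.github.com
EOF

lookup_host github.com "$hosts_file"        # prints 10.0.0.1
lookup_host gist.github.com "$hosts_file"   # prints forward (subdomain not listed)
rm -f "$hosts_file"
```

To run the same check against a live cluster, write the output of `kubectl get cm -n kube-system egress-whitelist -o jsonpath='{.data.hosts}'` to a file and pass that file as the second argument.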

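4.3 Modify the CoreDNS Corefile

Mount the ConfigMap above into the CoreDNS Pods. The Corefile must also reference the mounted file in a hosts block with fallthrough, so that names not listed continue to the next plugin: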

# base/coredns-deployment-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: coredns
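          args:
            - -conf
            - /etc/coredns/Corefile
          volumeMounts:
            - name: config
              mountPath: /etc/coredns
              readOnly: true
            # Separate directory so this mount does not nest inside the read-only config volume;
            # the whitelist is then available at /etc/coredns-egress/hosts
            - name: egress-whitelist
              mountPath: /etc/coredns-egress
              readOnly: true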
      volumes:
        - name: egress-whitelist
          configMap:
            name: egress-whitelist
            items:
              - key: hosts
                path: hosts
        - name: config
          configMap:
            name: coredns

4.4 Default Deny policy

# base/default-deny-egress.yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-deny-egress
spec:
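  # Skip system namespaces so CoreDNS can still forward upstream and Calico keeps working
  namespaceSelector: has(projectcalico.org/name) && projectcalico.org/name not in {"kube-system", "calico-system"}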
  order: 1000
  types:
    - Egress
  egress:
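    # Allow DNS over UDP
    - action: Allow
      protocol: UDP
      destination:
        selector: k8s-app == "kube-dns"
        ports:
          - 53
    # Allow DNS over TCP (used when a response does not fit in UDP)
    - action: Allow
      protocol: TCP
      destination:
        selector: k8s-app == "kube-dns"
        ports:
          - 53
    # Allow the Kubernetes API (adjust the selector to the labels on your API server endpoints)
    - action: Allow
      protocol: TCP
      destination:
        selector: k8s-app == "kube-apiserver"
        ports:
          - 443
    # Allow the hijack IP range
    - action: Allow
      destination:
        nets:
          - 10.0.0.0/24
    # Deny all other egress
    - action: Deny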

4.5 Per-namespace egress policies

namespace-a: GitHub only

# overlays/cluster-a/namespace-a-policy.yaml
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: namespace-a-allow-github
  namespace: namespace-a
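spec:
  order: 100
  # A namespaced Calico NetworkPolicy selects Pods with "selector"; all() covers every Pod in this namespace
  selector: all()
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        nets:
          - 10.0.0.1/32
    # Deny the rest of the hijack range; otherwise the order-1000 global allow for 10.0.0.0/24 would admit it
    - action: Deny
      destination:
        nets:
          - 10.0.0.0/24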

namespace-b: domestic services only

# overlays/cluster-b/namespace-b-policy.yaml
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: namespace-b-allow-domestic
  namespace: namespace-b
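spec:
  order: 100
  # A namespaced Calico NetworkPolicy selects Pods with "selector"; all() covers every Pod in this namespace
  selector: all()
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        nets:
          - 10.0.0.2/32
          - 10.0.0.3/32
    # Deny the rest of the hijack range, including the GitHub address 10.0.0.1
    - action: Deny
      destination:
        nets:
          - 10.0.0.0/24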

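4.6 Check for policy conflicts

# Inspect order values and destination nets for conflicting policies
calicoctl get networkpolicy --all-namespaces -o yaml | grep -E "order:|nets:"

# Confirm the Default Deny policy exists
calicoctl get globalnetworkpolicy default-deny-egress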

5. GitOps directory layout

k8s-egress/
├── base/
│   ├── kustomization.yaml
│   ├── namespace.yaml
│   ├── default-deny-egress.yaml
│   ├── coredns-egress-whitelist.yaml
│   └── coredns-deployment-patch.yaml
├── overlays/
│   ├── cluster-a/
│   │   ├── kustomization.yaml
│   │   ├── namespace-a-policy.yaml
│   │   └── namespace-a.yaml
│   └── cluster-b/
│       ├── kustomization.yaml
│       ├── namespace-b-policy.yaml
│       └── namespace-b.yaml
├── ci/
│   ├── duplicate-domain-check.yaml
│   └── dns-hardcode-check.sh
├── scripts/
│   ├── validate-policy.sh
│   └── rollback.sh
├── argocd/
│   ├── k8s-egress-cluster-a.yaml
│   └── k8s-egress-cluster-b.yaml
└── monitoring/
    ├── hubble-alerts.yaml
    └── prometheus-rules.yaml

5.1 Kustomization configuration

# overlays/cluster-a/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Reference the base directory (it has its own kustomization.yaml);
  # Kustomize's default load restrictions reject individual files outside the overlay root
  - ../../base
  - namespace-a-policy.yaml
  - namespace-a.yaml

6. CI conflict detection

6.1 Duplicate domain detection

# ci/duplicate-domain-check.yaml
name: Check Duplicate Egress Domains
on:
  pull_request:
    paths:
      - 'k8s-egress/base/coredns-egress-whitelist.yaml'
      - 'k8s-egress/overlays/**/coredns-*.yaml'

jobs:
  check-duplicates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Extract all domains
        run: |
          grep -hE '^\s+[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\s+' \
            k8s-egress/base/coredns-egress-whitelist.yaml \
            k8s-egress/overlays/**/coredns-*.yaml \
            2>/dev/null \
            | awk '{print $2}' | sort > /tmp/all-domains.txt

          echo "Found $(wc -l < /tmp/all-domains.txt) domains"
          cat /tmp/all-domains.txt

      - name: Check duplicates
        run: |
          duplicates=$(sort /tmp/all-domains.txt | uniq -d)
          if [ -n "$duplicates" ]; then
            echo "Duplicate domains found:"
            echo "$duplicates"
            exit 1
          fi
          echo "No duplicate domains"
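The workflow above flags any name that appears twice. A narrower and more harmful case is the same name mapped to two different hijack IPs, for example once in base and once in an overlay, since the per-namespace policies key on the IP. The sketch below reports only those conflicts. `find_conflicts` is a hypothetical helper, and it assumes entries follow the `IP name [name...]` hosts layout used in this guide.

```shell
# find_conflicts: read hosts-style lines on stdin (YAML indentation is fine)
# and print every domain that is mapped to more than one distinct IP.
find_conflicts() {
  awk '
    { sub(/#.*/, "") }                                   # drop comments
    $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { next }   # keep hosts entries only
    {
      for (i = 2; i <= NF; i++) {
        if (($i in ip) && ip[$i] != $1) conflict[$i] = ip[$i] " vs " $1
        else if (!($i in ip)) ip[$i] = $1
      }
    }
    END { for (d in conflict) print d ": " conflict[d] }
  ' | sort
}

find_conflicts <<'EOF'
    10.0.0.1 github.com
    10.0.0.2 baidu.com
    10.0.0.3 github.com
EOF
```

The sample input prints `github.com: 10.0.0.1 vs 10.0.0.3`. In a CI step, the whitelist files can be concatenated into the function, failing the job when its output is non-empty.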

6.2 Hard-coded DNS detection

#!/bin/bash
# ci/dns-hardcode-check.sh

set -e

echo "Checking for hard-coded DNS servers..."

PATTERNS=(
  "8.8.8.8"
  "114.114.114.114"
  "1.1.1.1"
  "dns.google"
  "223.5.5.5"
)

FOUND=0

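for pattern in "${PATTERNS[@]}"; do
  # -F matches the pattern literally and -w only as a whole word,
  # so "1.1.1.1" does not match inside "11.1.1.10"
  if grep -rFw "$pattern" \
    --include="*.yaml" \
    --include="*.yml" \
    --include="*.json" \
    --include="*.toml" \
    . 2>/dev/null | grep -v "^./ci/"; then
    echo "Hard-coded DNS found: $pattern"
    FOUND=1
  fi
done

if [ $FOUND -eq 1 ]; then
  echo "Remove the hard-coded DNS servers and rely on the CoreDNS hijack configuration"
  exit 1
fi

echo "No hard-coded DNS found"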

7. Monitoring and alerting

7.1 Hubble traffic monitoring

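# monitoring/hubble-deployment.yaml
# Hubble Relay aggregates flows from the Hubble server embedded in each Cilium agent,
# so this Deployment only yields data on clusters where Cilium agents are running.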
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hubble-relay
  namespace: calico-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: hubble-relay
  template:
    metadata:
      labels:
        k8s-app: hubble-relay
    spec:
      containers:
        - name: hubble-relay
          image: quay.io/cilium/hubble-relay:latest
          command:
            - hubble
            - relay
          ports:
            - containerPort: 4245

7.2 Prometheus alert rules

# monitoring/prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: egress-policy-alerts
  namespace: kube-system
spec:
  groups:
    - name: egress-policy
      rules:
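        # Assumes a counter named calico_egress_policy_deny_total is being scraped;
        # check the metric names your Felix metrics endpoint actually exposes.
        - alert: EgressDenyRateHigh
          expr: |
            rate(calico_egress_policy_deny_total[5m]) > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Abnormal increase in denied egress traffic"
            description: |
              Cluster {{ $labels.cluster }}: denied egress rate
              over the past 5 minutes is above 10/s
              Current value: {{ $value }}/s

        - alert: DNShijackUnmatched
          expr: |
            rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m]) > 5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Rising number of NXDOMAIN DNS responses"
            description: "Workloads may be querying domains that are not configured in the whitelist"

        # absent() fires when no matching series exists; count(...) == 0 would return no data at all
        - alert: CoreDNSEgressHostsMissing
          expr: |
            absent(coredns_dns_responses_total{plugin="hosts"})
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "CoreDNS egress hosts configuration missing"
            description: "CoreDNS has not loaded the egress-whitelist hosts configuration"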

7.3 Grafana Dashboard

{
  "dashboard": {
    "title": "Egress Policy Monitoring",
    "panels": [
      {
        "title": "Egress traffic: Allow vs Deny",
        "type": "piechart",
        "targets": [
          {
            "expr": "sum(rate(calico_egress_policy_allow_total[5m]))",
            "legendFormat": "Allow"
          },
          {
            "expr": "sum(rate(calico_egress_policy_deny_total[5m]))",
            "legendFormat": "Deny"
          }
        ]
      },
      {
        "title": "Egress traffic by namespace (top 10)",
        "type": "bargauge",
        "targets": [
          {
            "expr": "topk(10, sum by (namespace) (rate(calico_egress_policy_allow_total[5m])))",
            "legendFormat": "{{namespace}}"
          }
        ]
      },
      {
        "title": "DNS hijack hit rate",
        "type": "timeseries",
        "targets": [
          {
            "expr": "rate(coredns_dns_responses_total{plugin=\"hosts\"}[5m])",
            "legendFormat": "Answered by hosts"
          },
          {
            "expr": "rate(coredns_dns_responses_total{plugin=\"forward\"}[5m])",
            "legendFormat": "Forwarded upstream"
          }
          }
        ]
      }
    ]
  }
}

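8. Implementation steps

8.1 Phased rollout

| Phase | Step | Action | Verification |
|---|---|---|---|
| Phase 1 | 1 | Deploy the CoreDNS ConfigMap | kubectl get cm -n kube-system egress-whitelist |
| | 2 | Modify CoreDNS to mount the hosts file | kubectl rollout restart -n kube-system deployment/coredns |
| | 3 | Verify the CoreDNS hosts entries take effect | kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup github.com |
| Phase 2 | 4 | Deploy the Default Deny policy | calicoctl get GlobalNetworkPolicy default-deny-egress |
| | 5 | Confirm in-cluster DNS and API access still work | kubectl get pods -A |
| Phase 3 | 6 | Selective test (namespace-a only) | Deploy namespace-a-policy.yaml |
| | 7 | Verify GitHub access works | kubectl exec -n namespace-a test-pod -- curl -I github.com |
| | 8 | Verify unauthorized domains are rejected | kubectl exec -n namespace-a test-pod -- curl -I blocked-domain.com |
| Phase 4 | 9 | Roll out to all namespaces gradually | Deploy the per-namespace policies one at a time |
| | 10 | Configure the ArgoCD Application | Automated GitOps sync |
| Phase 5 | 11 | Deploy the Prometheus alert rules | Verify the alerts in Grafana |
| | 12 | Verify Hubble traffic monitoring | View egress traffic in hubble ui |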

8.2 Quick validation script

#!/bin/bash
# scripts/validate-policy.sh

NAMESPACE="${1:-namespace-a}"
TEST_DOMAIN="${2:-github.com}"

echo "=== Validating egress policy ($NAMESPACE) ==="

echo "[1/3] Checking the CoreDNS hosts configuration..."
kubectl get cm -n kube-system egress-whitelist -o jsonpath='{.data.hosts}' | grep -qF "$TEST_DOMAIN" && echo "hosts entry present" || echo "hosts entry missing"

echo "[2/3] Checking the Calico NetworkPolicy..."
calicoctl get networkpolicy -n "$NAMESPACE" 2>/dev/null | grep -q "allow" && echo "NetworkPolicy present" || echo "NetworkPolicy missing"

echo "[3/3] Testing DNS hijacking..."
# Use the first Pod labeled app=nginx as the test Pod instead of a hard-coded Pod name
POD_NAME=$(kubectl get pod -n "$NAMESPACE" -l app=nginx -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
if [ -n "$POD_NAME" ]; then
  kubectl exec -n "$NAMESPACE" "$POD_NAME" -- nslookup "$TEST_DOMAIN" 2>/dev/null | grep -qF "10.0.0." && echo "DNS hijacking active" || echo "DNS not hijacked"
else
  echo "No test Pod found; skipping DNS validation"
fi

echo "=== Validation complete ==="

8.3 Rollback procedure

#!/bin/bash
# scripts/rollback.sh

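echo "Rolling back egress policies..."

echo "[1/3] Deleting Calico policies..."
# calicoctl delete takes a resource name (or -f FILE), so each per-namespace policy is removed explicitly
calicoctl delete networkpolicy namespace-a-allow-github -n namespace-a 2>/dev/null || true
calicoctl delete networkpolicy namespace-b-allow-domestic -n namespace-b 2>/dev/null || true
calicoctl delete globalnetworkpolicy default-deny-egress 2>/dev/null || true

echo "[2/3] Deleting the CoreDNS ConfigMap..."
# Revert the CoreDNS Deployment volume and the Corefile hosts block first:
# Pods that still mount a deleted ConfigMap cannot start
kubectl delete cm -n kube-system egress-whitelist 2>/dev/null || true

echo "[3/3] Restarting CoreDNS..."
kubectl rollout restart -n kube-system deployment/coredns

echo "Rollback complete"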

8.4 ArgoCD multi-cluster deployment

# argocd/k8s-egress-cluster-a.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: k8s-egress-cluster-a
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/k8s-egress.git
    targetRevision: main
    path: overlays/cluster-a
  destination:
    server: https://cluster-a.k8s.internal:6443
    namespace: kube-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true

# Enable automated sync for every cluster
argocd app set k8s-egress-cluster-a --sync-policy automated
argocd app set k8s-egress-cluster-b --sync-policy automated

# Check sync status
argocd app list -l app.kubernetes.io/managed-by=argocd

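9. FAQ

Q1: Does restarting CoreDNS affect workloads?

A: A rolling update causes brief DNS jitter (usually under 30 seconds). Perform it during off-peak hours, or use kubectl rollout pause to hold the rollout.

Q2: How are wildcard domains (such as *.aliyun.com) handled?

A: The CoreDNS hosts plugin supports exact domain matches only. For wildcard cases, use an application-layer proxy (such as Envoy), or break the wildcard into explicit domains and configure each one.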
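For the second option, the explicit entries can be generated rather than typed by hand. This is a small sketch under the assumption that the needed subdomains are known in advance. `expand_hosts` is a hypothetical helper, and the labels `www` and `oss` are examples only (`market` comes from the whitelist in section 4.2).

```shell
# expand_hosts IP BASE_DOMAIN [LABEL...]: print one hosts line for the base
# domain and one for each listed subdomain label.
expand_hosts() {
  local ip base label
  ip=$1
  base=$2
  shift 2
  printf '%s %s\n' "$ip" "$base"
  for label in "$@"; do
    printf '%s %s.%s\n' "$ip" "$label" "$base"
  done
}

expand_hosts 10.0.0.3 aliyun.com market www oss
```

The call prints four lines, `10.0.0.3 aliyun.com` followed by one line each for `market.aliyun.com`, `www.aliyun.com` and `oss.aliyun.com`, ready to paste into the `hosts` key of the whitelist ConfigMap. Any subdomain missing from the list still resolves upstream and is then blocked by the default deny policy.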

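Q3: What if the domain whitelist differs between clusters?

A: Put shared domains in base/coredns-egress-whitelist.yaml and cluster-specific ones under overlays/cluster-X/.

Q4: How is temporary authorization handled?

A: For short-term grants (< 24h), revoke quickly with an ArgoCD rollback. For long-term grants, merge through the normal PR process, where CI checks for conflicts.

Q5: How is log auditing done?

A: Export Hubble flow logs to Elasticsearch: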

# Hubble flow log export (illustrative; confirm the keys against the Hubble release you deploy)
apiVersion: v1
kind: ConfigMap
metadata:
  name: hubble-relay
  namespace: calico-system
data:
  config.yaml: |
    flow:
      enableCapture: true
      exporters:
        - type: Elasticsearch
          address: elasticsearch.logging:9200

10. References

| Resource | Link |
|---|---|
| Calico NetworkPolicy documentation | https://docs.tigera.io/calico/latest/network-policy/policy-rules/dns-policy |
| CoreDNS hosts plugin | https://coredns.io/plugins/hosts/ |
| Cilium FQDN Policy (alternative) | https://docs.cilium.io/en/stable/policy/language/#dns-based |
| ArgoCD multi-cluster management | https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/ |