If you’ve been following our recent Kubernetes migration blog series, you already know the journey has been full of challenges. From configuring pods to tackling networking issues, it’s been a rollercoaster. We’ve explored several tricky problems in previous posts, and today, we invite you to put on your detective hat and join us as we investigate another Kubernetes mystery.

The Mysterious Case of NXDomain Errors

Imagine this: You’re checking your Kubernetes observability tools, and suddenly, you notice something strange: over a million NXDomain errors! What could be causing this? Let’s break it down together.

What Are NXDomain Errors?

Before we jump in, let’s test your DNS knowledge:

Pop Quiz: What does an NXDomain error indicate?

A) A domain exists but is unreachable.
B) A domain doesn’t exist.
C) A domain is experiencing high latency.

(Take a moment to think! Scroll down for the answer…)

The Answer: If you guessed B) A domain doesn’t exist, you’re right! These errors occur when a DNS query is made for a non-existent domain.

Unraveling the Clues

We took a closer look at the logs and found something unusual: external domains were mysteriously gaining extra suffixes like .cluster.local or .internal.cloudapp.net. Here are two examples:

  • gmail.googleapis.com.cluster.local

  • oauth2.googleapis.com.es52e2p4cafzg4m1it5a.bx.internal.cloudapp.net

Now, let’s put your troubleshooting skills to the test:

What do you think is happening here?
A) These domains are being redirected intentionally.
B) Kubernetes is modifying external domains.
C) A rogue service is interfering with DNS.

(Think about it before scrolling!)

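The Answer: B) Kubernetes is modifying external domains. More precisely, the DNS resolver inside the pod is appending suffixes that Kubernetes configured for it. But why? Let’s find out.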

How Kubernetes Handles DNS Queries

To solve this puzzle, we need to understand how DNS queries are resolved inside a Kubernetes pod. When a pod performs a DNS lookup, the request isn’t always sent as-is. Kubernetes writes a list of search domains and an ndots option into the pod’s /etc/resolv.conf, and the pod’s resolver applies those rules to each query.

Here’s a fun experiment: Try running the following command inside a Kubernetes pod:

cat /etc/resolv.conf

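What do you see? You should find a search entry listing the search domains and an options entry with an ndots value. The ndots value is the minimum number of dots a name must contain before the resolver tries it as-is; names with fewer dots get the search domains appended first.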
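If you’d rather inspect those settings programmatically, here is a minimal Python sketch that pulls the search list and ndots value out of resolv.conf-style text. The sample contents are an assumption about what a pod in the default namespace might show (the nameserver IP in particular is made up), and parse_resolv_conf is a hypothetical helper, not part of any Kubernetes tooling.

```python
# Illustrative contents only: the nameserver IP and namespace are assumptions,
# not values taken from our cluster.
SAMPLE_RESOLV_CONF = """\
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""


def parse_resolv_conf(text):
    """Return (search_domains, ndots) from resolv.conf-style text."""
    search = []
    ndots = 1  # resolver default when no ndots option is present
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments and blank lines
        if not fields:
            continue
        if fields[0] == "search":
            search = fields[1:]
        elif fields[0] == "options":
            for option in fields[1:]:
                if option.startswith("ndots:"):
                    ndots = int(option.split(":", 1)[1])
    return search, ndots


if __name__ == "__main__":
    print(parse_resolv_conf(SAMPLE_RESOLV_CONF))
```

Inside a pod you could pass the real file’s contents instead, for example parse_resolv_conf(open("/etc/resolv.conf").read()).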

Connecting the Dots

Because the ndots value was set to 5 and gmail.googleapis.com contains only two dots, the resolver treated it as an incomplete name and tried the search domains first, turning it into:

  • gmail.googleapis.com.svc.cluster.local

  • gmail.googleapis.com.cluster.local

These domains don’t exist, so each of those lookups comes back with the dreaded NXDomain error before the resolver finally tries the real name!
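To make the rule concrete, here is a small Python sketch of the order in which a resolver would try names for a given search list and ndots value. It is a simplified model of the usual resolv.conf behavior, not the actual resolver code: candidate_queries is a hypothetical helper, and a real resolver stops at the first name that resolves.

```python
def candidate_queries(name, search, ndots):
    """List the names a resolver would try, in order, for a lookup of `name`."""
    # A trailing dot marks the name as fully qualified: no search domains apply.
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    # Enough dots: try the name as-is first, then fall back to the search list.
    if name.count(".") >= ndots:
        return [name] + expanded
    # Too few dots: walk the search list first, and try the name as-is last.
    return expanded + [name]


if __name__ == "__main__":
    search = ["svc.cluster.local", "cluster.local"]
    for query in candidate_queries("gmail.googleapis.com", search, ndots=5):
        print(query)
```

With ndots set to 5, the two-dot name gmail.googleapis.com falls into the last branch, so both search-domain variants are queried before the real name.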

Fixing the Problem

Now that we’ve cracked the case, let’s apply the fix. Here’s how you can customize a pod’s DNS settings so that external names are tried as-is first:

apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
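  dnsPolicy: "None"    # ignore the cluster's DNS defaults; use only dnsConfig below
  dnsConfig:
    nameservers:
      - 1.2.3.4        # placeholder: replace with your DNS server's IP
    searches:
      - ns1.svc.cluster-domain.example   # placeholder search domains
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"     # names with two or more dots are tried as-is first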
      - name: edns0
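As a quick sanity check on the ndots value chosen above, this Python sketch compares which names would be tried as-is first under the old and new settings. It is a simplified model of the ndots rule only; tried_as_is_first is a hypothetical helper, and my-service is a made-up short service name included for contrast.

```python
def tried_as_is_first(name, ndots):
    """True if the resolver would query `name` unchanged before any search domain."""
    # A trailing dot always means "fully qualified"; otherwise count the dots.
    return name.endswith(".") or name.count(".") >= ndots


if __name__ == "__main__":
    # my-service is a hypothetical in-cluster name, not one from our logs.
    names = ["gmail.googleapis.com", "oauth2.googleapis.com", "my-service"]
    for ndots in (5, 2):
        for name in names:
            order = "as-is first" if tried_as_is_first(name, ndots) else "search domains first"
            print(f"ndots={ndots} {name}: {order}")
```

Under this model, the two-dot external names switch to as-is lookups once ndots is 2, while a short in-cluster name still goes through the search list.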

The Outcome: A Smooth DNS Experience

By lowering the ndots value, names such as gmail.googleapis.com are queried as-is first, so the search domains are no longer appended to them by mistake. This eliminates the spurious NXDomain errors and ensures external services resolve correctly.