System design · June 13, 2026 · 14 min read

I Built a DNS Server to Understand the Thing I'd Blindly Trusted for Ten Years

Before DNS: the whole internet ran on a single text file

To understand why DNS exists, you have to go back to when it didn't, and honestly, this story hooked me.

In the early 1980s, ARPANET (the internet's ancestor) had no DNS. So how did computers find each other? With a text file. That's right: a single file called HOSTS.TXT, listing which machine name mapped to which IP, kept by Stanford Research Institute (SRI-NIC) in California.

The way it worked is almost funny in hindsight: if you got a new machine and wanted online, you'd... call or email SRI, SRI would update the file by hand, and once a day every machine on the entire network would re-download that file. The whole "internet directory" was a single document, edited manually and shipped out once daily.

You see the problem instantly, right? With a few hundred machines, fine. But the network was growing exponentially. The file ballooned, the traffic just to download it ballooned, and most importantly: one single entity had to manually review every change in the entire world. Two machines on opposite ends of the earth couldn't share a name, because the directory was shared. It was a bottleneck, and it was tightening.

In 1983, a man named Paul Mockapetris, working at USC under the legendary Jon Postel, was handed exactly one job: come up with something to replace it. He wrote two documents, RFC 882 and RFC 883, and the Domain Name System was born.

The genius wasn't "translate name to number"; HOSTS.TXT already did that. The genius was two ideas: hierarchy and delegation. Instead of one giant directory held by one person, make it a tree: the root only needs to know who runs .com and .org; whoever runs .com only needs to know who runs example.com; and the details inside example.com? Let its actual owner handle that. Each branch manages its own piece, and nobody has to approve anyone else's.

That inversion, from centralized to distributed, is exactly why DNS has survived four decades and now carries trillions of queries a day, while HOSTS.TXT died long ago. And building mini-dns, I realized I was reconstructing that very 1983 philosophy: a server that knows its own piece, and knows how to point the way for the pieces that aren't its own.

How DNS actually works: how many doors does one question pass through?

That hierarchical philosophy, translated into a running mechanism, looks like this. When you type www.example.com, a chain of dominoes falls behind your back in a few dozen milliseconds:

  1. The browser asks your machine. Your machine (the stub resolver) asks a recursive resolver — usually your ISP's, or 8.8.8.8, 1.1.1.1.

  2. If the recursive doesn't know, it doesn't ask any one specific server; it climbs the tree from the root. First it asks a root server (.): "who runs the .com suffix?"

  3. The root replies: "I don't know the IP, but go ask these TLD servers that run .com." → this is a referral, not an answer, but a direction.

  4. The recursive then asks the .com TLD: "who runs example.com?" → the TLD again points it toward the domain's authoritative server.

  5. Finally the recursive asks the authoritative, the one that actually owns the answer: "what's the IP for www.example.com?" → it replies: 192.0.2.1.

Five steps, for one visit. The reason you don't feel that latency on every page load is cache at every layer — the soul of the whole system, which I'll get to below.

What made me go "oh": no single server knows everything. This is precisely the 1983 legacy: each layer only knows "who's the next one to ask." A design that refuses centralization right down in its genes.

mini-dns plays two roles in this drama: it's both authoritative (owning a self-defined zone) and capable of recursive forwarding (push unknown names upstream, then remember them).
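The referral chain above can be sketched as a toy simulation. This is not how mini-dns does it: the "network" is an in-memory map, and the server names (root, tld-com, ns1-example) and the hop limit of 8 are made up for illustration.

```rust
use std::collections::HashMap;

/// What one server can say about a name: the final answer, or "go ask them".
#[derive(Clone, Debug, PartialEq)]
pub enum Reply {
    Answer(String),   // an IP address, as text
    Referral(String), // hypothetical name of the next server to ask
}

/// A toy network: server name -> (zone it knows about -> reply).
pub type Network = HashMap<String, HashMap<String, Reply>>;

/// True when `name` is `zone` itself or sits underneath it.
fn in_zone(name: &str, zone: &str) -> bool {
    name == zone || name.ends_with(&format!(".{}", zone))
}

/// Start at the root and follow referrals until some server answers.
/// Returns the answer and the servers asked, in order.
pub fn resolve(net: &Network, name: &str) -> Option<(String, Vec<String>)> {
    let mut server = "root".to_string();
    let mut asked = Vec::new();
    for _ in 0..8 {
        // Arbitrary hop limit so a bad referral loop cannot spin forever.
        asked.push(server.clone());
        let table = net.get(&server)?;
        // Pick the most specific zone this server knows for the name.
        let reply = table
            .iter()
            .filter(|(zone, _)| in_zone(name, zone))
            .max_by_key(|(zone, _)| zone.len())
            .map(|(_, reply)| reply.clone())?;
        match reply {
            Reply::Answer(ip) => return Some((ip, asked)),
            Reply::Referral(next) => server = next,
        }
    }
    None
}

/// The three layers from the walkthrough, with invented server names.
pub fn demo_network() -> Network {
    let table = |zone: &str, reply: Reply| HashMap::from([(zone.to_string(), reply)]);
    HashMap::from([
        ("root".to_string(), table("com.", Reply::Referral("tld-com".to_string()))),
        ("tld-com".to_string(), table("example.com.", Reply::Referral("ns1-example".to_string()))),
        ("ns1-example".to_string(), table("www.example.com.", Reply::Answer("192.0.2.1".to_string()))),
    ])
}
```

Calling resolve(&demo_network(), "www.example.com.") walks root, then tld-com, then ns1-example. No table ever holds more than "who's next", which is the whole point.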

What does a DNS reply look like inside the packet?

To make a server reply, you have to build every byte yourself. A DNS message has five sections:

  • Header: holds the transaction ID (to match a question to its answer) and a set of crucial flags.

  • Question: what you're asking (domain name, record type, class).

  • Answer / Authority / Additional: three sections holding the answer and supporting info.
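To make the header concrete, here is a minimal sketch of reading its fixed 12 bytes. The bit positions follow the standard wire format; the struct and function names are my own for illustration and not necessarily what mini-dns uses.

```rust
/// The parts of the fixed 12-byte DNS header this sketch cares about.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub id: u16,             // transaction ID, echoed back in the reply
    pub is_response: bool,   // QR bit
    pub authoritative: bool, // AA bit
    pub truncated: bool,     // TC bit
    pub rcode: u8,           // low 4 bits of the flags word
    pub qdcount: u16,        // entries in the Question section
    pub ancount: u16,        // entries in the Answer section
    pub nscount: u16,        // entries in the Authority section
    pub arcount: u16,        // entries in the Additional section
}

/// Parse the header from the start of a message; None if it is too short.
pub fn parse_header(msg: &[u8]) -> Option<Header> {
    if msg.len() < 12 {
        return None;
    }
    // Every header field is a big-endian 16-bit word.
    let word = |i: usize| u16::from_be_bytes([msg[i], msg[i + 1]]);
    let flags = word(2);
    Some(Header {
        id: word(0),
        is_response: (flags & 0x8000) != 0,
        authoritative: (flags & 0x0400) != 0,
        truncated: (flags & 0x0200) != 0,
        rcode: (flags & 0x000F) as u8,
        qdcount: word(4),
        ancount: word(6),
        nscount: word(8),
        arcount: word(10),
    })
}
```

The four counters at the end say how many entries each of the remaining sections holds, which is how a parser knows where one section stops and the next begins.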

The flags taught me the most:

  • AA (Authoritative Answer): when set, it means "this answer comes from the real owner, not from cache." When mini-dns answers a name in its own zone, I set this flag; when it returns forwarded data, I don't. Distinguishing "I know for sure" from "I heard it secondhand": DNS has that built in.

  • RCODE: the result code. NOERROR is fine. NXDOMAIN means "this name does not exist in the world." And here's the subtle trap: NXDOMAIN is not NODATA.

I got that trap wrong the first time:

  • NXDOMAIN = the name doesn't exist at all. Like nothinghere.example.com.

  • NODATA = the name does exist, but not in the record type you asked for. For example, example.com has an A record (IPv4) but you ask it for AAAA (IPv6): the name exists, that type is empty. Here you must return NOERROR with an empty answer section, not NXDOMAIN.

Sounds minor, but mixing these two up is the real cause of bugs like "email suddenly can't be delivered" or "the client retries forever." DNS doesn't just answer "yes or no"; it answers "no, but in which way." And that "which way" matters enormously.
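The distinction fits in a few lines. This is a minimal sketch that assumes the zone is a plain map from name to record types; it deliberately ignores wildcards and names that exist only as parents of other names, and none of these type names come from mini-dns.

```rust
use std::collections::HashMap;

/// The three outcomes an authoritative lookup has to tell apart.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Answer(Vec<String>), // NOERROR, with records in the answer section
    NoData,              // NOERROR, with an empty answer section
    NxDomain,            // the NXDOMAIN rcode: the name itself is absent
}

/// Hypothetical zone shape: name -> (record type -> values).
pub type Zone = HashMap<String, HashMap<String, Vec<String>>>;

pub fn lookup(zone: &Zone, name: &str, rtype: &str) -> Outcome {
    match zone.get(name) {
        // Nothing at all under this name.
        None => Outcome::NxDomain,
        Some(types) => match types.get(rtype) {
            Some(values) if !values.is_empty() => Outcome::Answer(values.clone()),
            // The name exists, just not with this record type.
            _ => Outcome::NoData,
        },
    }
}
```

The outer match decides whether the name exists; only the inner one looks at the type. Collapsing the two into a single "found or not found" check is exactly how I got it wrong the first time.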

The zone file: the naked heart of an authoritative server

The thing that stopped me short: a domain's source data is just a text file. Yes, forty years after HOSTS.TXT, the heart is still text. The difference is that now everyone only holds their own piece.

example.com.     3600 SOA   ns1.example.com. admin.example.com. 1 7200 3600 1209600 3600
example.com.     3600 NS    ns1.example.com.
example.com.     3600 A     192.0.2.1
example.com.     3600 AAAA  2001:db8::1
www.example.com. 3600 CNAME example.com.
example.com.     3600 MX    10 mail.example.com.
example.com.     3600 TXT   "v=spf1 -all"
*.example.com.   3600 A     192.0.2.9

Each record type is a puzzle piece, and handling each one myself is when I truly understood them:

  • SOA (Start of Authority): the zone's "birth certificate" line. Those numbers afterward aren't decoration: serial (the zone version, so secondary servers know there's a change to sync), then refresh, retry, expire, and most importantly the last number is the negative-cache TTL, i.e. "how long an NXDOMAIN error is remembered." A tiny number that decides the whole system's load tolerance.

  • NS: points to who the authoritative nameservers are. This is the thread tying your zone into the global hierarchy, the very tree Mockapetris drew in 1983.

  • A / AAAA: point a name to an IPv4 / IPv6 address.

  • CNAME: an alias. Strict rule: a name that has a CNAME cannot have any other record at that same name. When resolving, if www is a CNAME pointing to example.com, a decent server follows it through (CNAME chaining), packing the destination into the same reply so the client doesn't have to ask another round.

  • MX: where email goes, with a priority number (10). Get this line wrong and the whole company's mail falls into the void.

  • TXT: the catch-all bin: SPF for anti-spoofing email, domain ownership verification, DKIM...

  • * (wildcard): catches every subdomain not declared. Handy but double-edged: one typo and a thousand ghost subdomains point to the wrong place.
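CNAME chaining and the wildcard fallback can be sketched together. The assumptions are loud ones: the zone holds one record per name (which is the CNAME rule in miniature), the wildcard only covers a single missing label, and the hop limit of 8 is an arbitrary guard against loops. Real wildcard matching has more rules than this.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub enum Data {
    A(String),     // an IPv4 address, as text
    Cname(String), // the name this alias points to
}

/// Hypothetical zone shape: exactly one record per name.
pub type Zone = HashMap<String, Data>;

const MAX_CNAME_HOPS: usize = 8; // arbitrary loop guard

/// Exact match first, then "*.<parent>" as a one-label wildcard fallback.
fn find<'a>(zone: &'a Zone, name: &str) -> Option<&'a Data> {
    zone.get(name).or_else(|| {
        let (_, parent) = name.split_once('.')?;
        let wildcard = format!("*.{}", parent);
        zone.get(&wildcard)
    })
}

/// Follow CNAMEs until an address turns up, collecting every step so the
/// whole chain can go into one reply.
pub fn resolve_a(zone: &Zone, name: &str) -> Vec<(String, Data)> {
    let mut chain = Vec::new();
    let mut current = name.to_string();
    for _ in 0..MAX_CNAME_HOPS {
        match find(zone, &current) {
            Some(Data::Cname(target)) => {
                chain.push((current.clone(), Data::Cname(target.clone())));
                current = target.clone();
            }
            Some(Data::A(ip)) => {
                chain.push((current.clone(), Data::A(ip.clone())));
                return chain;
            }
            None => return chain,
        }
    }
    chain // hop limit reached: most likely a CNAME loop
}
```

With the zone above, asking for www.example.com. yields two entries (the CNAME, then the A record it leads to), while an undeclared name like ghost.example.com. falls through to the wildcard's 192.0.2.9.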

Writing the parser, I faced a small decision that says a lot about the craft: a line typed wrong (missing trailing dot, bad syntax), what do you do? Crash the whole file? Or skip that line?

I chose: log the exact bad line number, skip it, let the rest live. That's the line between "one missing dot" and "the whole company loses its website at midnight." Most production incidents don't come from big errors. They come from a tiny thing handled without care.

(Oh, notice the trailing dot in example.com.? That's the FQDN saying "this is an absolute name, already at the root, don't append anything." Without it, many systems append the domain, making example.com.example.com. One dot. A whole evening of debugging.)
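The "log the line, skip it, keep going" policy looks roughly like this. It is a sketch that assumes the simple "name TTL type data" layout shown in the zone above; the struct and function names are hypothetical, and rejecting names without a trailing dot is my choice for the sketch, not necessarily what mini-dns does.

```rust
#[derive(Debug, PartialEq)]
pub struct Record {
    pub name: String,
    pub ttl: u32,
    pub rtype: String,
    pub rdata: String,
}

/// Parse a whole zone text. Bad lines are reported with their 1-based
/// line number and skipped; every good line still loads.
pub fn parse_zone(text: &str) -> (Vec<Record>, Vec<(usize, String)>) {
    let mut records = Vec::new();
    let mut errors = Vec::new();
    for (index, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with(';') {
            continue; // blank line or comment
        }
        match parse_line(line) {
            Ok(record) => records.push(record),
            Err(reason) => errors.push((index + 1, reason)),
        }
    }
    (records, errors)
}

fn parse_line(line: &str) -> Result<Record, String> {
    let mut parts = line.split_whitespace();
    let name = parts.next().ok_or("missing name")?;
    let ttl_text = parts.next().ok_or("missing TTL")?;
    let rtype = parts.next().ok_or("missing record type")?;
    let rdata: Vec<&str> = parts.collect();
    if !name.ends_with('.') {
        return Err(format!("name '{}' has no trailing dot", name));
    }
    let ttl = ttl_text
        .parse::<u32>()
        .map_err(|_| format!("bad TTL '{}'", ttl_text))?;
    if rdata.is_empty() {
        return Err("missing record data".to_string());
    }
    Ok(Record {
        name: name.to_string(),
        ttl,
        rtype: rtype.to_uppercase(),
        rdata: rdata.join(" "),
    })
}
```

The caller gets both lists back, so it can log each (line number, reason) pair and still serve everything that parsed cleanly.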

TTL and cache: the soul, and also the pain, of DNS

The 3600 on each line is the TTL (Time To Live), in seconds: "this answer may be remembered for one hour."

This is the culprit behind the most painful incantation in the industry: "I changed the DNS, why hasn't it taken effect?"

I used to think propagation was the internet being "lazy." Wrong. The internet isn't slow; it's honoring your own promise. When you set TTL 3600, you told the whole world "remembering this for an hour is fine." And every resolver along the way remembered. Now you change your mind ten minutes later and... they're still keeping your old promise, until the countdown hits zero.

And this I only understood after coding the cache myself: TTL isn't one clock, it counts down independently at every layer. Browser cache, OS cache, ISP resolver cache: each holds a copy with its own clock. That's why you see the site up at home while your friend on a different network still reports it down. Nobody's wrong; their clock just hasn't hit zero.
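Here is a minimal sketch of what "its own clock" means in code. Time is passed in as plain seconds so the behaviour is easy to follow; the Cache type is hypothetical, and a real cache would read a monotonic clock instead.

```rust
use std::collections::HashMap;

/// A tiny TTL cache: each entry carries its own expiry time.
pub struct Cache {
    entries: HashMap<String, (String, u64)>, // name -> (value, expires_at)
}

impl Cache {
    pub fn new() -> Self {
        Cache { entries: HashMap::new() }
    }

    /// Store an answer fetched at time `now` with the given TTL (seconds).
    pub fn put(&mut self, name: &str, value: &str, ttl: u64, now: u64) {
        self.entries.insert(name.to_string(), (value.to_string(), now + ttl));
    }

    /// Return the value and its remaining TTL, or None once it has expired.
    pub fn get(&mut self, name: &str, now: u64) -> Option<(String, u64)> {
        let hit = match self.entries.get(name) {
            Some((value, expires_at)) if *expires_at > now => {
                Some((value.clone(), *expires_at - now))
            }
            _ => None,
        };
        if hit.is_none() {
            self.entries.remove(name); // drop stale entries on the way out
        }
        hit
    }
}
```

Picture two resolvers on different networks, each with its own Cache. One stores the record at second 0, the other at second 1800, both with TTL 3600. At second 3600 the first has forgotten it and will fetch the new value, while the second keeps serving the old one for another 1800 seconds.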

The takeaway I drew, applicable well beyond engineering: cache is a trade-off between speed and truth. Remember longer, go faster, but be more likely to be wrong when the truth changes. Every system, and honestly every human, lives inside that trade-off.

A battle-scarred practical tip: before you plan to change an important record, drop its TTL way down a few days ahead. So when you change for real, the world forgets the old value fast. Nobody teaches this; only those who've sat waiting for propagation at 2 a.m. carry it in their bones.

The deepest lesson: you must cache the things that don't exist

This is the "aha" moment I still think about.

Caching correct answers is obvious: ask google.com once, remember it, answer instantly next time. But there's something far subtler: negative caching, remembering even the result "this name does not exist" (NXDOMAIN).

At first it sounded absurd. Why remember a thing that doesn't exist?

Then I understood. Out there are countless bots probing blindly: aaa.example.com, admin123.example.com, xyz.example.com... If every time it hits garbage the server dutifully climbs the tree to ask and gets back "nope," it turns itself into the victim. Remembering "this guy doesn't exist" is the shield, and for how long is exactly what that SOA line above dictates. (See? Everything connects.)
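A sketch of the negative side of the cache. One detail I'm confident of from the negative-caching standard (RFC 2308): the lifetime of a "does not exist" answer is the smaller of the SOA record's own TTL and its last field. Everything else here (the type, the method names, time passed in as plain seconds) is invented for illustration.

```rust
use std::collections::HashMap;

/// How long a "does not exist" answer may be remembered: the smaller of
/// the SOA record's own TTL and the SOA minimum field.
pub fn negative_ttl(soa_ttl: u32, soa_minimum: u32) -> u32 {
    soa_ttl.min(soa_minimum)
}

/// Remembers names that are known not to exist, each with its own expiry.
pub struct NegativeCache {
    expires_at: HashMap<String, u64>,
}

impl NegativeCache {
    pub fn new() -> Self {
        NegativeCache { expires_at: HashMap::new() }
    }

    /// Record at time `now` that `name` came back as nonexistent.
    pub fn remember_missing(&mut self, name: &str, ttl: u32, now: u64) {
        self.expires_at.insert(name.to_string(), now + u64::from(ttl));
    }

    /// True while the "does not exist" verdict is still fresh.
    pub fn known_missing(&self, name: &str, now: u64) -> bool {
        self.expires_at.get(name).map_or(false, |expiry| *expiry > now)
    }
}
```

While known_missing returns true, the server can answer NXDOMAIN straight from memory instead of walking upstream for the same garbage name again.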

This is strangely beautiful and reaches beyond engineering: knowing a thing doesn't exist is knowledge too, and sometimes the most valuable kind. Knowing which road leads nowhere. Knowing which approach doesn't work. A wise system doesn't just remember the right answers; it remembers the dead ends, so it doesn't walk them twice. Mature humans are the same.

I also added single-flight: if a hundred identical questions hit at the same moment while the cache is empty, only one gets to go ask; the other ninety-nine wait and share the result. Without it, a sudden burst of traffic could turn your server into a pump, unwittingly attacking the very upstream it's relying on. Sometimes the best thing a system can do under pressure is exercise restraint: don't ask the same question ninety-nine times.
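Single-flight can be sketched with nothing but a mutex and a condition variable. This is one possible shape, not the mini-dns implementation: the names are hypothetical, finished results are kept forever (a real server would hand them to the TTL cache), and a fetch that panics would leave its waiters stuck.

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};

enum Slot {
    InFlight,      // someone is already asking upstream
    Ready(String), // the shared result
}

pub struct SingleFlight {
    slots: Mutex<HashMap<String, Slot>>,
    changed: Condvar,
}

impl SingleFlight {
    pub fn new() -> Self {
        SingleFlight { slots: Mutex::new(HashMap::new()), changed: Condvar::new() }
    }

    /// Run `fetch` at most once per key; concurrent callers for the same
    /// key block until the first one finishes, then share its result.
    pub fn get_or_fetch<F>(&self, key: &str, fetch: F) -> String
    where
        F: FnOnce() -> String,
    {
        let guard = self.slots.lock().unwrap();
        // Sleep for as long as another caller is fetching this key.
        let mut slots = self
            .changed
            .wait_while(guard, |s| matches!(s.get(key), Some(Slot::InFlight)))
            .unwrap();
        if let Some(Slot::Ready(value)) = slots.get(key) {
            return value.clone();
        }
        // Nobody has asked yet: claim the flight, then release the lock
        // so the slow upstream call does not block unrelated keys.
        slots.insert(key.to_string(), Slot::InFlight);
        drop(slots);

        let value = fetch();

        let mut slots = self.slots.lock().unwrap();
        slots.insert(key.to_string(), Slot::Ready(value.clone()));
        self.changed.notify_all();
        value
    }
}
```

The important move is dropping the lock before calling fetch. The map is only locked for bookkeeping, never for the duration of the upstream query.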

UDP, 512 bytes, and why DNS has a strange "joint" with TCP

Everyone memorizes "DNS runs over UDP port 53." True, but missing half the story.

UDP is fast, light, no fussy handshake: perfect for lightning-quick Q&A. But traditional UDP-DNS is capped at 512 bytes (a number chosen back in 1983, when networks were fragile). What about answers longer than that (many records, or DNSSEC creaking under its signatures)?

Then the server sets the TC (Truncated) flag, saying "this answer's too long, it got cut off, go ask again over TCP." The client hears that and opens a full TCP connection to ask again from scratch. I had to code this exact two-step rhythm, and that's when "UDP with TCP fallback" went from abstract phrase to very concrete lines of code.
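The server half of that rhythm can be sketched like this. It is one simple policy among several: when the reply is too big, send back only the header and question with TC set. The caller is assumed to already know where the question section ends; the function names are hypothetical. The size limit is a parameter because a client can advertise a larger one through the EDNS(0) mechanism described next.

```rust
/// The classic UDP payload ceiling for DNS.
pub const CLASSIC_UDP_LIMIT: usize = 512;

/// The limit to apply for one client: its advertised size if it sent one,
/// never less than the classic 512 bytes.
pub fn udp_limit(advertised: Option<u16>) -> usize {
    match advertised {
        Some(size) => usize::from(size).max(CLASSIC_UDP_LIMIT),
        None => CLASSIC_UDP_LIMIT,
    }
}

/// If `msg` fits within `limit`, return it unchanged. Otherwise keep only
/// the header and question, set the TC bit, and zero the other counters.
/// `question_end` is the byte offset just past the question section.
pub fn fit_to_udp(msg: &[u8], question_end: usize, limit: usize) -> Option<Vec<u8>> {
    if question_end < 12 || question_end > msg.len() {
        return None; // not a plausible message layout
    }
    if msg.len() <= limit {
        return Some(msg.to_vec());
    }
    let mut out = msg[..question_end].to_vec();
    out[2] |= 0x02; // TC lives in the high byte of the flags word
    for byte in &mut out[6..12] {
        *byte = 0; // ANCOUNT, NSCOUNT, ARCOUNT: nothing follows the question
    }
    Some(out)
}
```

A client that sees TC set throws the partial reply away and repeats the same question over TCP, where the 512-byte ceiling does not apply.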

Then EDNS(0) showed up as a polite patch for that ancient 512-byte limit: the client attaches a pseudo-record announcing "I can handle up to 4096 bytes," so the server sends the big packet straight over UDP, skipping the TCP detour. In the same byte-saving spirit, though much older, is name compression: within one packet, a repeated domain name isn't written out in full but points back to where it was already written. Decades ago people had to squeeze every byte like this, and that legacy still runs inside every packet today.
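Reading a compressed name is a nice small exercise. The sketch below follows the standard encoding: a length byte followed by that many label bytes, a zero byte to end the name, and a two-byte pointer (top two bits set) that means "continue at this offset in the message". The function name and the jump limit of 16 are my own choices.

```rust
/// Read a domain name starting at byte offset `pos` of a DNS message,
/// following compression pointers. Returns None on malformed input.
pub fn read_name(msg: &[u8], mut pos: usize) -> Option<String> {
    let mut labels: Vec<String> = Vec::new();
    let mut jumps = 0;
    loop {
        let len = *msg.get(pos)? as usize;
        if len == 0 {
            break; // the root label ends the name
        }
        if (len & 0xC0) == 0xC0 {
            // Pointer: the low 14 bits are an offset from the message start.
            let low = *msg.get(pos + 1)? as usize;
            pos = ((len & 0x3F) << 8) | low;
            jumps += 1;
            if jumps > 16 {
                return None; // arbitrary guard against pointer loops
            }
            continue;
        }
        if len > 63 {
            return None; // reserved label types
        }
        let label = msg.get(pos + 1..pos + 1 + len)?;
        labels.push(String::from_utf8_lossy(label).into_owned());
        pos += 1 + len;
    }
    if labels.is_empty() {
        Some(".".to_string())
    } else {
        Some(labels.join(".") + ".")
    }
}
```

If example.com is written out once at offset 0, then www.example.com later in the same packet costs only the bytes for "www" plus a two-byte pointer back to offset 0.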

The privacy shock: traditional DNS is naked

This part genuinely made me stop.

Classic DNS is sent in plaintext. Unencrypted. This too is a legacy of the naive trust of 1983: when the internet was just a few research institutes who knew each other, nobody thought about eavesdropping. Every time you visit a site, the question "what's the IP for this name?" is shouted down the wire for anyone to hear: your ISP, the guy running the café Wi-Fi, anyone sitting in the right spot.

You browse over HTTPS, pretty green padlock, content sealed tight. But the list of places you visit is left wide open right from the DNS lookup. Like sealing the contents of a letter but writing the recipient's address big on the envelope for the whole post office to read.

Coding in DoT (DNS-over-TLS, wrapped in TLS on its own port) and DoH (DNS-over-HTTPS, hiding the DNS query inside ordinary HTTPS traffic, nearly indistinguishable from web browsing) is when I understood why browsers have quietly switched on DoH by default in recent years. Not for show: it's patching a gap that lay dormant in the internet's foundation for forty years.

The broader lesson: what leaks information about you is usually not the content; it's the metadata. Not what you say, but who you say it to, when, and how often.

What separates a toy from the real thing

Getting a DNS to answer correctly is only half. The hard half is making it not collapse when thrown into the real world, where nobody is kind to it.

Rate limiting per client, because DNS is a favorite snack of amplification attacks: the attacker sends a small packet with the victim's spoofed IP, forcing the server to spray a large packet at that victim; an unlimited server unwittingly becomes the cannon. Hot reload: edit the zone, send a SIGHUP signal, and the server reloads immediately, with no restart and no dropped in-flight requests, because in production, "turn it off and on again" is a luxury. And one socket per CPU core (SO_REUSEPORT) so the kernel load-balances, instead of every packet squeezing through a single door.
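For the rate limiting, one common approach is a token bucket per client address. The text above doesn't say which algorithm mini-dns uses, so treat this as a swapped-in illustration: the type, the parameters, and the idea of passing time in as seconds are all assumptions.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

struct Bucket {
    tokens: f64, // queries this client may still send right now
    last: f64,   // when the bucket was last topped up, in seconds
}

/// Per-client token bucket: `rate` tokens are added per second, and a
/// bucket never holds more than `burst`.
pub struct RateLimiter {
    rate: f64,
    burst: f64,
    buckets: HashMap<IpAddr, Bucket>,
}

impl RateLimiter {
    pub fn new(rate: f64, burst: f64) -> Self {
        RateLimiter { rate, burst, buckets: HashMap::new() }
    }

    /// Decide whether a query from `client` at time `now` may be answered.
    pub fn allow(&mut self, client: IpAddr, now: f64) -> bool {
        let (rate, burst) = (self.rate, self.burst);
        let bucket = self
            .buckets
            .entry(client)
            .or_insert(Bucket { tokens: burst, last: now });
        // Top up for the time that has passed, capped at the burst size.
        bucket.tokens = (bucket.tokens + (now - bucket.last) * rate).min(burst);
        bucket.last = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false // over budget: drop the query or answer minimally
        }
    }
}
```

A production version would also evict idle buckets so the map can't grow without bound, and might key on a network prefix instead of a single address, since spoofed sources are cheap to vary.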

None of these make the DNS "smarter." They just make it survive. The longer I work, the more I believe: the gap between something that runs on your machine and something that holds up in the real world lives almost entirely in the part nobody shows in the demo.

So that night the site went down, what did I learn?

Honestly: mini-dns wasn't born to compete with BIND or the industrial DNS servers that have been battle-tested for decades. It's an exercise in understanding. And it paid off in a way I didn't expect.

Now when a site "won't load," I don't guess in a panic. I ask in order, like a doctor examining a patient rather than a fortune teller: what does dig return? Is the old cache still alive because the TTL hasn't expired? Is the A record the right IP? Is the CNAME looping around? Is it returning NXDOMAIN or NODATA? Is a wrong MX dropping mail into the void? Have the NS records propagated and agreed with each other?

The black box became a system I understand. And that applies to nearly everything in this craft: you don't truly own a technology until you dare to open it up and look inside. Before that, you're just borrowing someone else's trust and praying it doesn't break on your shift.

There's a beautiful thing I keep thinking about: the system Paul Mockapetris sketched in 1983, on a TOPS-20 machine, to solve the pain of a few hundred computers. That design is still carrying trillions of queries a day for the whole planet, with its core barely changed. People patched on security, encryption, IPv6, but the hierarchy-and-delegation heart is intact. A good idea, simple enough and right enough, can outlive empires.

If you're also the "learn by tearing it apart and rebuilding it" type, try cloning mini-dns and building it:

cargo build --release
./target/release/mini-dns

then throw it the first question:

dig @127.0.0.1 -p 8888 example.com A

The moment it returns exactly the IP you typed into the zone seconds earlier, the internet's vast, distributed phone book suddenly gets a little less mysterious. It's no longer magic. It's something you understand.

And for an engineer, turning magic into something you understand is perhaps the quietest but most durable joy, the thing that keeps us in this strange profession, one deploy night after another.


Full code: github.com/uy-td-dev/mini-dns
