How We Built ZoneWatcher's Record Autodiscovery

Autodiscovery sounds simple: connect a DNS provider, and every zone and record on the account shows up in ZoneWatcher. The implementation wasn't. Here's how we built it and what we learned along the way.

The Problem

DNS management at scale has a bootstrapping problem. Before you can monitor domains, you need to know what domains you have. For a small team with three domains, this is trivial. For an MSP managing 200 client domains across four providers, it's a real obstacle.

We could have shipped a "paste your domain list here" text box and called it done. But that misses the domains you've forgotten about, which are often the most vulnerable ones. Forgotten domains mean forgotten DNS records, which mean unmonitored attack surface.

The goal was zero-configuration onboarding: connect your provider, see everything.

Provider API Landscape

Every DNS provider has its own API, its own authentication mechanism, its own data format, and its own quirks. There's no standard. Here's a sample of what we deal with:

Authentication varies widely. Some providers use API keys. Others use OAuth tokens. Some require account-scoped keys, while others support zone-scoped tokens with granular permissions. A few still use username/password authentication for their API.

Zone listing isn't always straightforward. Some APIs return all zones in a single paginated response. Others require you to query each zone individually. Some return zones that are parked, suspended, or pending verification, states that don't make sense to monitor.

Record formats differ. An MX record's priority field might be a separate attribute on one provider and embedded in the value string on another. TXT records might be a single string or an array of 255-byte chunks. CNAME targets may or may not include a trailing dot. SOA serial number formatting varies.
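To make the MX and trailing-dot cases concrete, here is a minimal Python sketch of this kind of normalization. The function names and the two input shapes are hypothetical; they are not taken from any particular provider's API or from ZoneWatcher's code.

```python
from typing import Optional, Tuple


def normalize_hostname(name: str) -> str:
    """Lowercase a hostname and drop a single trailing dot, if present."""
    name = name.strip().lower()
    return name[:-1] if name.endswith(".") else name


def normalize_mx(value: str, priority: Optional[int] = None) -> Tuple[int, str]:
    """Return (priority, target) for either of two assumed input shapes.

    Shape 1 (hypothetical): priority is a separate attribute, value is the target.
    Shape 2 (hypothetical): priority is embedded, e.g. "10 mail.example.com.".
    """
    if priority is None:
        # Embedded form: split the leading integer from the target.
        prio_text, _, target = value.strip().partition(" ")
        priority = int(prio_text)
    else:
        target = value
    return priority, normalize_hostname(target)
```

With this, `normalize_mx("10 Mail.Example.com.")` and `normalize_mx("mail.example.com", priority=10)` produce the same tuple, so both can be compared directly.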

Rate limits are everywhere. Every provider has rate limits, but the limits, enforcement mechanism, and response codes vary. Some return HTTP 429. Some return 200 with an error body. Some silently drop requests.
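A rough sketch of how the first two signals might be folded into a single check follows. The `error` and `code` field names and the `rate_limit` marker are assumptions made for illustration; a real integration would match each provider's documented response shape. Silently dropped requests never produce a response, so they would surface as timeouts and need separate handling.

```python
from typing import Any, Mapping, Optional


def is_rate_limited(status: int, body: Optional[Mapping[str, Any]]) -> bool:
    """Best-effort check for a rate-limit signal in a parsed HTTP response.

    Field names below are hypothetical, not a real provider's schema.
    """
    if status == 429:
        return True
    if status == 200 and body:
        # Some APIs report throttling inside an otherwise successful response.
        error = body.get("error")
        if isinstance(error, Mapping):
            return "rate_limit" in str(error.get("code", "")).lower()
    return False
```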

We built an abstraction layer that normalizes all of this. Each provider integration translates the provider-specific API into a common internal representation: zones as a flat list with metadata, records as a uniform schema.
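One way such a layer could look in Python is sketched below. The `Zone`, `Record`, and `ProviderAdapter` names and their fields are illustrative assumptions, not ZoneWatcher's actual schema; `InMemoryAdapter` is a toy stand-in for a real provider integration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class Zone:
    name: str
    provider: str
    status: str  # e.g. "active", "parked", "suspended"


@dataclass(frozen=True)
class Record:
    name: str
    type: str
    value: str
    ttl: int
    priority: Optional[int] = None  # only meaningful for types such as MX


class ProviderAdapter(Protocol):
    """Interface every provider integration is assumed to implement."""

    def list_zones(self) -> Iterable[Zone]: ...

    def list_records(self, zone: Zone) -> Iterable[Record]: ...


class InMemoryAdapter:
    """Toy adapter backed by a dict, standing in for a real provider API."""

    def __init__(self, provider: str, data: Dict[str, List[Record]]) -> None:
        self._provider = provider
        self._data = data

    def list_zones(self) -> Iterable[Zone]:
        return [Zone(name, self._provider, "active") for name in self._data]

    def list_records(self, zone: Zone) -> Iterable[Record]:
        return list(self._data[zone.name])
```

Code that consumes a `ProviderAdapter` never sees provider-specific payloads, which is what keeps the diffing and alerting logic provider-agnostic.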

The Discovery Process

When you connect a provider, ZoneWatcher runs through this sequence:

1. Enumerate zones. We query the provider's API for all zones associated with the authenticated account, then keep the active ones and filter out parked or suspended zones that can't serve DNS.

2. Fetch all records for each zone. For every active zone, we pull the complete record set. This is the expensive step: for an account with 200 zones, we're making at least 200 API calls, each potentially paginated.

3. Normalize records. Each record is converted to our internal format. We handle the provider-specific quirks here: splitting concatenated TXT records, normalizing trailing dots, extracting priority from MX values, and so on.

4. Store the baseline snapshot. The initial record set becomes the monitoring baseline. Every subsequent check compares against this snapshot.

5. Set up polling. The domain enters the monitoring rotation. ZoneWatcher periodically re-fetches records and diffs them against the last known state.
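Steps 1 through 4 can be sketched as a single function. This is a simplified illustration under stated assumptions: the three callables are hypothetical stand-ins for a provider integration, records are reduced to (name, type, value) tuples, and pagination, error handling, and the polling setup in step 5 are left out.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

RecordKey = Tuple[str, str, str]  # (name, type, value)


def discover(
    list_zones: Callable[[], Iterable[Tuple[str, str]]],  # yields (zone, status)
    list_records: Callable[[str], Iterable[dict]],        # raw provider records
    normalize: Callable[[dict], RecordKey],               # raw -> internal form
) -> Dict[str, FrozenSet[RecordKey]]:
    """Build a baseline snapshot: zone name -> set of normalized records."""
    baseline: Dict[str, FrozenSet[RecordKey]] = {}
    for zone, status in list_zones():          # step 1: enumerate zones
        if status != "active":                 # skip parked/suspended zones
            continue
        raw = list_records(zone)               # step 2: fetch the record set
        baseline[zone] = frozenset(normalize(r) for r in raw)  # steps 3 and 4
    return baseline
```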

For public DNS monitoring (domains without a provider API connection), the process is simpler: we query the domain's authoritative nameservers directly using standard DNS queries. No API is needed, but we have less visibility; we only see what's publicly queryable, not internal or unpublished records.

Handling Provider Limits

The biggest engineering challenge wasn't parsing different API formats; it was being a good API citizen while monitoring thousands of zones across dozens of accounts.

Rate limiting with back-pressure. Each provider integration has its own rate limiter that respects the provider's documented limits. When we hit a rate limit, we back off exponentially and retry. For providers with shared rate limit pools (where all ZoneWatcher customers share a single limit), we coordinate across accounts.
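A minimal sketch of exponential backoff around a single provider call is shown below. The `RateLimited` exception and the parameter defaults are assumptions for illustration, and the random jitter is an addition of this sketch rather than something described above. Coordination across accounts that share a limit is not shown.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when a provider signals a rate limit."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the delay ceiling each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Sleep a random amount up to the ceiling so retries from
            # many workers do not all land at the same moment.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting.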

Efficient polling. We don't re-fetch every record on every check. We use conditional requests (If-None-Match with ETags, or If-Modified-Since) where supported. For providers that support zone change notifications or webhooks, we listen for those instead of polling.
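The conditional-request idea can be sketched as a small cache keyed on URL. The `fetch` callable is a hypothetical transport returning (status, headers, body), and the sketch assumes response header names arrive already normalized to `ETag`. `If-None-Match` and the 304 status are standard HTTP; everything else here is illustrative.

```python
from typing import Callable, Dict, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], Optional[bytes]]


class ConditionalFetcher:
    """Remember the last ETag per URL and skip re-downloading unchanged data."""

    def __init__(self, fetch: Callable[[str, Dict[str, str]], Response]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Tuple[str, bytes]] = {}  # url -> (etag, body)

    def get(self, url: str) -> Tuple[bytes, bool]:
        """Return (body, refetched); refetched is False on a 304 response."""
        headers: Dict[str, str] = {}
        cached = self._cache.get(url)
        if cached:
            headers["If-None-Match"] = cached[0]
        status, resp_headers, body = self._fetch(url, headers)
        if status == 304 and cached:
            return cached[1], False  # unchanged: reuse the stored body
        if status != 200 or body is None:
            raise RuntimeError(f"unexpected response status {status}")
        etag = resp_headers.get("ETag")
        if etag:
            self._cache[url] = (etag, body)
        return body, True
```

When `refetched` is False, the diff step can be skipped entirely for that zone.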

Staggered scheduling. We don't check all zones simultaneously. Checks are distributed over time to smooth out API load. A provider with aggressive rate limits might have its zones checked on a longer interval than one with generous limits.
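One simple way to spread checks over time is to give each zone a stable offset within its polling interval, derived from a hash of the zone name. This is a sketch of that general technique, not necessarily how ZoneWatcher schedules checks; the function names are hypothetical and timestamps are assumed to be whole seconds.

```python
import hashlib


def stagger_offset(zone: str, interval_seconds: int) -> int:
    """Map a zone name to a stable offset in [0, interval_seconds)."""
    digest = hashlib.sha256(zone.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_seconds


def next_check_time(zone: str, interval_seconds: int, now: int) -> int:
    """Return the next timestamp >= now that falls on this zone's slot."""
    offset = stagger_offset(zone, interval_seconds)
    cycle_start = now - (now % interval_seconds)
    candidate = cycle_start + offset
    # If this cycle's slot has already passed, use the next cycle's slot.
    return candidate if candidate >= now else candidate + interval_seconds
```

A provider with tighter rate limits would simply be given a larger `interval_seconds`, which spreads its zones over a longer window.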

What We Got Wrong Initially

Assuming consistent record ordering. Our first diff implementation compared records by position. Provider A returned records in alphabetical order. Provider B returned them in creation order. Same zone, different order meant false positives everywhere. We switched to set-based comparison keyed on (name, type, value) tuples.
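The set-based comparison described above can be sketched in a few lines. The `Diff` type and function name are illustrative; records are assumed to be already normalized into (name, type, value) tuples.

```python
from typing import Iterable, NamedTuple, Set, Tuple

RecordKey = Tuple[str, str, str]  # (name, type, value)


class Diff(NamedTuple):
    added: Set[RecordKey]
    removed: Set[RecordKey]


def diff_records(baseline: Iterable[RecordKey], current: Iterable[RecordKey]) -> Diff:
    """Compare two record sets without regard to the order they arrived in."""
    old, new = set(baseline), set(current)
    return Diff(added=new - old, removed=old - new)
```

Because the value is part of the key, an edited record appears as one removal plus one addition; pairing those up into a "changed" event would be a separate step.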

Ignoring SOA serial number behavior. SOA serial numbers change frequently, sometimes on every API call. Early versions flagged every SOA change as noteworthy. We learned to treat SOA serial changes as expected noise unless other fields in the SOA record also changed.
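A sketch of the serial-aware comparison follows. It assumes the SOA value is available in standard presentation order (primary nameserver, responsible mailbox, serial, refresh, retry, expire, minimum) as a whitespace-separated string; providers that expose SOA fields as separate attributes would need a different parser. The names are hypothetical.

```python
from typing import NamedTuple


class Soa(NamedTuple):
    mname: str
    rname: str
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa(rdata: str) -> Soa:
    """Parse 'mname rname serial refresh retry expire minimum'."""
    parts = rdata.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(parts)}")
    return Soa(parts[0], parts[1], *(int(p) for p in parts[2:]))


def soa_change_is_noteworthy(old: str, new: str) -> bool:
    """True only if something other than the serial differs."""
    a, b = parse_soa(old), parse_soa(new)
    return a._replace(serial=0) != b._replace(serial=0)
```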

Underestimating TXT record fragmentation. DNS limits a single string within a TXT record to 255 bytes. Long values (like DKIM keys) are split across multiple strings. Some providers return the split strings separately, while others concatenate them. Some APIs escape certain characters differently. Getting consistent TXT comparison right took more iterations than we'd like to admit.
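As an illustration, the sketch below reduces three assumed input shapes to one logical string: a list of chunks, a quoted presentation-format string, and a plain unquoted string. Joining the chunks is one possible canonical form chosen for this sketch; splitting everything into 255-byte chunks would work equally well as long as both sides of a comparison use the same form. It handles backslash-escaped characters only, not decimal escapes, and the function name is hypothetical.

```python
import re
from typing import Sequence, Union

_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')  # one "..." segment
_ESCAPE = re.compile(r"\\(.)")                # backslash + any character


def normalize_txt(value: Union[str, Sequence[str]]) -> str:
    """Join the strings of a TXT record into a single comparable value."""
    if isinstance(value, str):
        text = value.strip()
        if text.startswith('"'):
            # Presentation format: one or more quoted segments.
            chunks = [_ESCAPE.sub(r"\1", c) for c in _QUOTED.findall(text)]
        else:
            chunks = [text]  # already a bare value
    else:
        chunks = list(value)  # provider returned the chunks separately
    return "".join(chunks)
```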

What Autodiscovery Enables

Beyond the convenience of not typing in domains manually, autodiscovery provides:

Complete coverage. You can't forget to monitor a domain if it's discovered automatically. When someone on your team creates a new zone, ZoneWatcher picks it up on the next discovery cycle. This helps developers and system administrators avoid gaps in coverage.

Ongoing discovery. Autodiscovery isn't a one-time import. It runs periodically. New zones are detected and added to monitoring automatically. Deleted zones are flagged.

Provider migration support. When you migrate a domain from one provider to another, autodiscovery on the new provider picks it up. Combined with the existing monitoring on the old provider, you get visibility into the transition. This is particularly valuable for managed service providers.

The unglamorous work of normalizing APIs, handling rate limits, and parsing edge cases is what makes the experience feel seamless. You connect a provider, your domains appear, and monitoring starts. The complexity is invisible, which is the point.