ns1.desec.io replication issues

Hi!

For quite a while now, ns1.desec.io and ns2.desec.org have been out of sync for me. I updated IPv4 and IPv6 via dyndns; the correct records show up on ns2.desec.org, but ns1.desec.io does not seem to be updated. Google’s name server at 8.8.8.8 sometimes answers with the correct records, sometimes with the old ones, and sometimes with a mix of the two for IPv4 and IPv6. Here is an example (IPs and domain redacted for privacy; I will provide them by direct message if necessary).

Correct IPs: <redacted>.244.210 and <redacted>:7f03:<redacted>; Outdated IPs: <redacted>.81.163 and <redacted>:7f0e:<redacted>.

root@OpenWrt:~# nslookup <my-domain>.de ns2.desec.org
Server:         ns2.desec.org
Address:        2607:f740:e00a:deec::2#53

Name:   <my-domain>.de
Address: <redacted>.244.210
Name:   <my-domain>.de
Address: <redacted>:7f03:<redacted>

root@OpenWrt:~# nslookup <my-domain>.de ns1.desec.io
Server:         ns1.desec.io
Address:        2607:f740:e633:deec::2#53

Name:   <my-domain>.de
Address: <redacted>.81.163
Name:   <my-domain>.de
Address: <redacted>:7f0e:<redacted>

root@OpenWrt:~# nslookup <my-domain>.de 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   <my-domain>.de
Address: <redacted>.244.210
Name:  <my-domain>.de
Address: <redacted>:7f0e:<redacted>

The web interface shows me the correct IP addresses.

Cheers!
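
The manual comparison in the post above can be scripted. This is a minimal sketch, assuming `dig` is installed; the helper names `query_rrset` and `in_sync` are made up for illustration, and the two server names are the ones from this thread.

```sh
#!/bin/sh
# Return the sorted answers one server gives for a name and record type.
# Arguments: name, type, server.
query_rrset() {
  dig +short "$2" "$1" "@$3" | sort
}

# Succeed if both deSEC nameservers return the same record set.
# Arguments: name, type.
in_sync() (
  ns1=$(query_rrset "$1" "$2" ns1.desec.io)
  ns2=$(query_rrset "$1" "$2" ns2.desec.org)
  [ "$ns1" = "$ns2" ]
)

# Hypothetical usage:
#   in_sync example.org A && in_sync example.org AAAA || echo "out of sync"
```

Note that two empty answers also compare as equal, so a check like this is only meaningful for records that are expected to exist.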

Hi,

the same issue seems to exist for ACME challenges. ns2.desec.org answers correctly and quickly. Unfortunately, ns1.desec.io does not seem to get updated with the correct entry.

someOne@server:~$ dig TXT @ns2.desec.org _acme-challenge.<my-domain>
;; BADCOOKIE, retrying.

; <<>> DiG 9.16.1-Ubuntu <<>> TXT @ns2.desec.org _acme-challenge.<my-domain>
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43452
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1400
; COOKIE: (good)
;; QUESTION SECTION:
;_acme-challenge.<my-domain> IN        TXT

;; ANSWER SECTION:
_acme-challenge.<my-domain> 3600 IN TXT "<the-acme-challenge>"

;; Query time: 0 msec
;; SERVER: 157.53.224.1#53(157.53.224.1)
;; WHEN: So Jan 07 17:31:19 CET 2024
;; MSG SIZE  rcvd: 148

someOne@server:~$ dig TXT @ns1.desec.io _acme-challenge.<my-domain>
;; BADCOOKIE, retrying.

; <<>> DiG 9.16.1-Ubuntu <<>> TXT @ns1.desec.io _acme-challenge.<my-domain>
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 41658
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1400
; COOKIE: (good)
;; QUESTION SECTION:
;_acme-challenge.<my-domain> IN        TXT

;; AUTHORITY SECTION:
<my-domain>.          300     IN      SOA     get.desec.io. get.desec.io. 2023125320 86400 3600 2419200 3600

;; Query time: 4 msec
;; SERVER: 45.54.76.1#53(45.54.76.1)
;; WHEN: So Jan 07 17:31:31 CET 2024
;; MSG SIZE  rcvd: 140
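
The NXDOMAIN reply above carries the zone's SOA record, whose third data field (2023125320) is the serial. Comparing serials between servers is a common way to spot a lagging secondary. The sketch below assumes that the serial changes with every update, which this thread does not confirm for deSEC; `soa_serial` and `print_serials` are hypothetical helper names.

```sh
#!/bin/sh
# Extract the serial (third field) from a "dig +short SOA" answer line.
soa_serial() {
  printf '%s\n' "$1" | awk '{ print $3 }'
}

# Print the serial each server currently reports for a zone.
# Argument: zone name.
print_serials() (
  for ns in ns1.desec.io ns2.desec.org; do
    printf '%s %s\n' "$ns" "$(soa_serial "$(dig +short SOA "$1" "@$ns")")"
  done
)

# Hypothetical usage:
#   print_serials example.org
```

If the two serials differ, the server with the lower one has probably not received the latest zone version yet.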

It might be a regional problem. deSEC routes requests to the closest server, and when I look up my domain on https://mxtoolbox.com, the answer from ns1.desec.io is correct. Queried from Germany, however, ns1.desec.io still serves the wrong records.
Maybe it is only the Frankfurt server which is out of sync.

Currently all is back to normal. Let’s hope it was only a one-time glitch.

Hi all,

this is my first post to this community – sorry that I start my contribution with a problem report.

I am experiencing the same problems described above. An MX record for one of my domains (created 16 hours ago) is available at ns2.desec.org, but unavailable at ns1.desec.io.

I do my tests from the Munich region.

Edit:

Updating the MX record does not solve the issue: the updated record is available at ns2, but ns1 does not show the record at all.

All tests are done with dig, like so:

dig @ns2.desec.org mydomain.example.org MX

Hi,

Thank you for your report. Indeed, we are currently experiencing some replication problems. One aspect of our replication works by using an interface of the nameserver software (PowerDNS) to get the list of domains and their current state, both at each location and on the primary instance. We then compute a diff and initiate synchronization for those domains that are part of the diff.

For some reason, the PowerDNS interface used to collect the list of domains (and state) sometimes freezes, leading to a timeout in the replication system. We haven’t been able to identify the root cause yet. The problem indeed primarily affects our ns1 instance in Frankfurt.

It’s a bit confusing, because other locations which run the same software and version are not affected.

In any case, we made some adjustments, which we hope will improve things. We expect to fully address the problem over the next few days.

Thank you for understanding!

Stay secure,
Peter
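
The diff step described in this post could look roughly like the following. This is an illustration only, not deSEC's actual code: the input format (one `domain serial` pair per line, one file for the primary and one per location) and the name `domains_to_sync` are assumptions.

```sh
#!/bin/sh
# Print every domain whose state on the secondary differs from the primary,
# or which is missing on the secondary entirely.
# Arguments: primary listing file, secondary listing file.
domains_to_sync() {
  awk '
    # First file on the command line is the secondary: remember its serials.
    FILENAME == ARGV[1] { secondary[$1] = $2; next }
    # Second file is the primary: report missing or mismatching domains.
    !($1 in secondary) || secondary[$1] != $2 { print $1 }
  ' "$2" "$1"
}

# Hypothetical usage:
#   domains_to_sync primary.txt fra-1.txt | while read -r d; do echo "resync $d"; done
```

A timeout while collecting either listing would leave this step with nothing to compare, which fits the symptom of a location staying stale until the next successful run.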


Hi Peter,
thanks for the explanation of the problem. This already helps a lot.
Staying secure :slight_smile:
Hans


Currently ns1 is out of sync again. As a workaround I blocked the ns1 IPs in my home network’s recursive resolver (relying solely on ns2). Now at least all my local devices avoid random connection issues whenever ns1 happens to be asked. But of course this is not a satisfactory solution.

Any updates on fixing the underlying issue with the replication?

Hi,
I’ve also been experiencing issues for about 24 hours: my current IPv4 address is shown correctly for my domains in the deSEC domain management, but DNS lookups still return the previous (obsolete) IPv4 address.
Hence my internal services are not reachable from the internet.

Hope you can fix it quite soon.

Thanks a lot.

I’m considering actively using desec.io for several domains, but I’m wondering about the above. Has the root cause ever been found, @peter?

Hi Maarten,

Thanks for your message, and welcome to deSEC. :slight_smile:

I have to admit that we did not identify the root cause, but we also did not chase it much. Instead, we managed to fix it by making adjustments to some config settings, and then spent our energy on migrating half of our instances to Knot DNS, so that the deployment is now diversified between both PowerDNS and Knot DNS.

We have not experienced the above problem since it happened in January.

Stay secure,
Peter


Hello,

this still appears to be an issue. I too tried to migrate some domains to deSEC, but noticed DNS-01 ACME challenges failing. The added TXT record tends to be present on only one of ns1.desec.io or ns2.desec.org, while the other returns NXDOMAIN even after ~1 minute of retries. It can also be reproduced manually: for example, I just added test.ehok.at TXT "test", and at first only got a response when querying ns1.desec.io; only after several attempts did ns2.desec.org start returning the value.

Hi @georg,

Thanks for your message, and welcome to deSEC! :slight_smile:

Yes, unfortunately our replication system currently isn’t always real-time, because sometimes a lot of concurrent updates happen. We’re starting some work to refactor how our primary delivers zone updates, and expect that to fix the issue sometime in the next few weeks.

Stay secure,
Peter


Hi @peter,

thanks for your input! Nice to hear there are improvements on the way. Would also be curious to read about the technical design/implementation as the setup changes (I see there is some custom replication script and you mentioned migrating parts to Knot), but I get it might be more complex. :slight_smile:

Is there an expected upper bound on how long replication takes, so that one could, as a workaround, simply configure the client software with a corresponding timeout or sleep value? Or is the delay “unknown” with the current setup?

Things always should converge within a few minutes max. Ideally, we’d like publication of changes within a few seconds, but … well. Let’s see if we can achieve that.

Stay secure,
Peter


I’m using 3 minutes and don’t have any issues with my ACME challenges. With 2 minutes, I experienced occasional failures. I don’t know if that still applies, though. It’s been quite a while since I switched to the longer delay.
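
Rather than guessing a fixed delay, a DNS-01 hook could poll the authoritative servers until the challenge is visible and give up after a timeout. This is a rough sketch, assuming `dig` is available; `query_txt`, `txt_visible_everywhere`, and `wait_for_txt` are hypothetical helper names, and the 180-second default simply mirrors the 3 minutes mentioned above.

```sh
#!/bin/sh
# Authoritative servers named in this thread.
NAMESERVERS="ns1.desec.io ns2.desec.org"

# Ask one authoritative server for the TXT records of a name.
# Arguments: name, server.
query_txt() {
  dig +short TXT "$1" "@$2"
}

# Succeed only if every server already returns the expected value.
# dig prints TXT values in double quotes, so match the quoted form.
txt_visible_everywhere() (
  for ns in $NAMESERVERS; do
    query_txt "$1" "$ns" | grep -qxF -- "\"$2\"" || exit 1
  done
)

# Poll until the record is visible everywhere or the timeout expires.
# Arguments: name, expected value, timeout (s), poll interval (s).
wait_for_txt() (
  timeout=${3:-180}
  interval=${4:-10}
  waited=0
  until txt_visible_everywhere "$1" "$2"; do
    [ "$waited" -ge "$timeout" ] && exit 1
    sleep "$interval"
    waited=$((waited + interval))
  done
)

# Hypothetical usage:
#   wait_for_txt _acme-challenge.example.org "$TOKEN" || echo "not replicated in time"
```

Since requests to these names are routed to the closest location, as noted earlier in the thread, a check like this only covers the instances nearest to the machine running it.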

I had issues with 2 minutes too; I set it to 4 minutes and haven’t had any issues since.
But one could also query all instances to see whether the update has really arrived:

curl -sSfL https://github.com/desec-io/desec-automation/raw/refs/heads/main/hosts/all.yml \
| yq -r '.all.children.frontends.children|map(.children|map(.hosts|keys))|flatten[]'

Prepare an update (onverwijld is a Dutch word, meaning immediately):

curl -X PATCH https://desec.io/api/v1/domains/onverwijld.nl/rrsets/test/TXT/ \
 -H "Authorization: Token $DESEC_TOKEN" --json "$(jq -n '{records:["\"bob\""]}')" \
| jq
{
  "created": "2026-06-05T12:33:23.315381Z",
  "domain": "onverwijld.nl",
  "subname": "test",
  "name": "test.onverwijld.nl.",
  "records": [
    "\"bob\""
  ],
  "ttl": 3600,
  "type": "TXT",
  "touched": "2026-06-05T12:36:57.643459Z"
}

While monitoring all servers (explicitly over IPv4, since none of the IPv6 addresses under .c.desec.io work for me):

while true; do
  sleep 10
  date +%T
  echo @{{ams,dfw,sao,fra,hkg,jnb,syd}-1.a,{dxb,fra,lax,sin,lga,lhr,scl,tyo}-1.c}.desec.io \
  | xargs -P15 -n1 dig -4 +short TXT test.onverwijld.nl \
  | sort \
  | uniq -c
done
14:36:48
     15 "alice"
14:36:59
     15 "alice"
14:37:09
     15 "alice"
14:37:20
      9 "alice"
      6 "bob"
14:37:30
      9 "alice"
      6 "bob"
14:37:41
      9 "alice"
      6 "bob"
14:37:52
      9 "alice"
      6 "bob"
14:38:02
      9 "alice"
      6 "bob"
14:38:13
      9 "alice"
      6 "bob"
14:38:23
      7 "alice"
      8 "bob"
14:38:34
      7 "alice"
      8 "bob"
14:38:44
      7 "alice"
      8 "bob"
14:38:55
      7 "alice"
      8 "bob"
14:39:06
      7 "alice"
      8 "bob"
14:39:16
      4 "alice"
     11 "bob"
14:39:27
     15 "bob"

So in this test case it took from 12:36:57 UTC (the "touched" timestamp in the API response, which is 14:36:57 in the local time printed by the loop) until somewhere between 14:39:16 and 14:39:27 for all servers to be in sync, which is 139–150 seconds for this n=1 case. A second test took from 13:02:22 UTC until 15:04:09–15:04:19 local time, so 107–117 seconds, within 2 minutes.
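
The arithmetic behind these figures can be checked with a few lines of shell. The sketch treats the API timestamps as UTC and the loop output as two hours ahead of that, which is what the reported 139–150 seconds imply; `hms_to_seconds` and `elapsed` are hypothetical helper names.

```sh
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight.
hms_to_seconds() (
  IFS=: read -r h m s <<EOF
$1
EOF
  # Strip one leading zero so values like 08 are not parsed as octal.
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
)

# Seconds between two times on the same day.
# Arguments: start, end.
elapsed() {
  echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}

elapsed 14:36:57 14:39:16   # first test, lower bound: 139
elapsed 14:36:57 14:39:27   # first test, upper bound: 150
elapsed 15:02:22 15:04:09   # second test, lower bound: 107
elapsed 15:02:22 15:04:19   # second test, upper bound: 117
```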

Note: if draft-ietf-acme-dns-persist-01 (Automated Certificate Management Environment (ACME) Challenge for Persistent DNS TXT Record Validation) is implemented (see the Let's Encrypt post "DNS-PERSIST-01: A New Model for DNS-based Challenge Validation"), then a static record can be used instead of updating _acme-challenge:

_validation-persist.example.com. IN TXT ("authority.example;"
   " accounturi=https://ca.example/acct/123")