Inconsistent propagation between authoritative nameservers during ACME DNS-01 challenge

Hello deSEC Team,

I am experiencing a reproducible issue when obtaining Let’s Encrypt certificates via ACME DNS-01 using a deSEC-managed zone.

Environment

  • DNS provider: deSEC

  • Domain: example.com

  • ACME clients tested:

    • Caddy 2.11.4 (caddy-dns/desec v1.1.0)

    • lego

The issue occurs with both clients, so it does not appear to be specific to Caddy.

Observed behaviour

Both ACME clients successfully create the TXT record via the deSEC API.

However, the two authoritative nameservers appear to become inconsistent for some time afterward.

Immediately after the TXT record is created:

dig @ns1.desec.io TXT _acme-challenge.immich.nas.example.com +short

returns the expected challenge token, whereas

dig @ns2.desec.org TXT _acme-challenge.immich.nas.example.com +short

returns no record (NXDOMAIN).

Because lego verifies propagation against the authoritative nameservers before proceeding, it eventually fails with:

NS ns2.desec.org. returned NXDOMAIN for _acme-challenge.immich.nas.example.com
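The propagation check described above boils down to one rule: query every authoritative nameserver directly, and proceed only once each of them returns the expected token. The sketch below is a simplified model under stated assumptions, not lego's actual implementation. It assumes the answers have already been collected (for example with the `dig @<server>` commands above), with `None` standing for NXDOMAIN. The function name `check_propagation` and the token values are hypothetical, and the error text only imitates the message quoted above.

```python
from typing import Optional

# Hypothetical representation of what each authoritative nameserver returned:
# None means NXDOMAIN, otherwise the list of TXT strings in the answer.
Answers = dict[str, Optional[list[str]]]


def check_propagation(expected_token: str, answers: Answers) -> list[str]:
    """Return one error message per nameserver that does not yet serve the token.

    An empty list means every queried nameserver serves the expected token,
    so validation can proceed. This is a simplified model of such a check.
    """
    errors = []
    for ns, txt in sorted(answers.items()):
        if txt is None:
            errors.append(f"NS {ns} returned NXDOMAIN")
        elif expected_token not in txt:
            errors.append(f"NS {ns} serves {txt!r} instead of the expected token")
    return errors


# First attempt from the report: ns1 already has the token, ns2 does not.
first_attempt = {"ns1.desec.io.": ["new-token"], "ns2.desec.org.": None}
print(check_propagation("new-token", first_attempt))
# → ['NS ns2.desec.org. returned NXDOMAIN']
```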

Second ACME attempt

If I immediately retry the certificate request, the behavior changes:

  • ns1.desec.io already serves the new challenge token.

  • ns2.desec.org now serves the previous challenge token.

Let’s Encrypt then fails with:

During secondary validation: Incorrect TXT record "<old token>" found at _acme-challenge.immich.nas.example.com

This behavior is reproducible.

Conclusion

It appears that TXT record updates become visible on ns1.desec.io significantly earlier than on ns2.desec.org.

During ACME DNS-01 validation, this results in inconsistent answers from the authoritative nameservers:

  1. First attempt:

    • ns1.desec.io → new token

    • ns2.desec.org → NXDOMAIN

  2. Second attempt:

    • ns1.desec.io → new token

    • ns2.desec.org → previous token
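Both attempts are consistent with a single lagging nameserver instance that applies the same sequence of updates as the others, only later. The toy model below illustrates this. It is a simplified sketch, not a description of deSEC's replication mechanism. It assumes the ACME client deletes the challenge record during cleanup before the next attempt, as ACME clients usually do. The helper `served` and the token values are hypothetical placeholders.

```python
from typing import Optional

# Each entry is the TXT content of _acme-challenge after one update;
# None means the record does not exist (NXDOMAIN at the nameserver).
UpdateLog = list[Optional[str]]


def served(log: UpdateLog, applied: int) -> Optional[str]:
    """Content served by an instance that has applied the first `applied` updates."""
    return log[applied - 1] if applied > 0 else None


log: UpdateLog = []

# Attempt 1: the client creates the record; only ns1 applies it in time.
log.append("token-1")
ns1, ns2 = len(log), 0  # the lagging ns2 instance has applied nothing yet
print(served(log, ns1), served(log, ns2))  # → token-1 None

# Assumed cleanup, then attempt 2: the record is deleted and recreated with a
# new token. Meanwhile ns2 has caught up with the first update only.
log.append(None)
log.append("token-2")
ns1, ns2 = len(log), 1
print(served(log, ns1), served(log, ns2))  # → token-2 token-1
```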

Since the issue can be reproduced with both Caddy and lego, it does not appear to be client-specific.

Could you please investigate whether there is a replication or synchronization delay between the authoritative nameservers?

Thank you very much in advance.

Best regards,

Philip

Hi Philip,

Thanks for your message, and welcome to deSEC! :slight_smile:

ns1 and ns2 run the same software, so the behavior you’re observing is most likely caused by network connectivity or a similar issue.

Note that ns1 and ns2 are each served by 7 or 8 servers. It’s possible that this is happening at only one of the 15 locations in total, which happens to be the one you reach as ns2.

There is no technical reason why ns2 should lag one update behind ns1. Rather, the ns2 instance you’re hitting seems to be catching up more slowly than the other instances. If you wait a little longer, they should be back in sync.

We’re aware that our replication timing is suboptimal and are planning to improve it by replacing our replication mechanism (though that won’t happen overnight).

Stay secure,
Peter


Hi Peter,

Thanks for your quick and detailed response and for welcoming me to deSEC. :slight_smile:

That makes sense and also matches what I observed during testing. Shortly after my post, propagation completed and ACME validation succeeded (after failing consistently all day), so I finally obtained the certificate!

My takeaway is that in such situations, it is best to let the ACME client keep retrying automatically rather than triggering new validation attempts: each new attempt publishes a fresh token, which a lagging nameserver then has to catch up with all over again. The authoritative nameservers will converge over time, and validation will then succeed without further intervention.
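Retrying with the same token can be sketched as a wait loop. It keeps re-checking the token that is already published until every nameserver serves it, or until a deadline passes. This is a minimal illustration, not Caddy's or lego's actual logic. The `query` callable is a hypothetical stand-in that returns the TXT strings served by one nameserver, or `None` for NXDOMAIN. `wait_for_token` and its default timeout and interval are illustrative values.

```python
import time
from typing import Callable, Iterable, Optional

# Hypothetical signature: query(nameserver) -> TXT strings, or None for NXDOMAIN.
QueryFn = Callable[[str], Optional[list[str]]]


def wait_for_token(
    token: str,
    nameservers: Iterable[str],
    query: QueryFn,
    timeout: float = 600.0,   # illustrative default, not a client setting
    interval: float = 10.0,   # seconds between rounds of queries
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Re-check the SAME token until all nameservers serve it or time runs out.

    Returns True once every nameserver serves the token, False on timeout.
    No new token is published while waiting, so lagging servers only
    have to catch up with one change.
    """
    servers = list(nameservers)
    deadline = clock() + timeout
    while True:
        # Query every server in each round; `or []` treats NXDOMAIN as "no token".
        if all(token in (query(ns) or []) for ns in servers):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting `sleep` and `clock` is only a convenience, so the loop can be exercised with a simulated clock instead of real waiting.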

Thank you also for the information about the planned improvements to the replication mechanism.

Best regards,

Philip