Global Record Propagation Issues?

I’m getting my X.509 certificates from Let’s Encrypt and use the DNS-01 challenge method for authentication. This has been working nicely for a few years, but I recently switched my ACME client to lego and now have occasional issues with authentication.

Here’s what I’m observing:

  1. lego sends the CSR, receives a challenge and publishes the response via the deSEC API.
  2. lego repeatedly queries the authoritative nameservers (ns1.desec.io and ns2.desec.org) until they respond with a NOERROR code.
  3. lego asks Let’s Encrypt to verify authentication.
  4. Let’s Encrypt fails with the following error:
    acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: NXDOMAIN looking up TXT for _acme-challenge.example.com - check that a DNS record exists for this domain

Apparently, lego gets a valid reply from the nameservers but Let’s Encrypt does not.
I guess this has to do with record propagation in deSEC’s anycast network: Authoritative nameservers near me may have the new record while nameservers near Let’s Encrypt may not.

My questions:

  1. What would be the best way to handle this? lego does not support configuring a delay between querying the authoritative nameservers and triggering the CA to complete authorization.
  2. How long does global record propagation usually take?
  3. Can I somehow check if propagation of a newly added record has completed? I didn’t find anything in the API docs.

Hi black,

The proper way is to wait a bit longer, or retry automatically until it works.

Normally, it is just a few seconds. However, due to this issue, ad-hoc notifications to our global secondaries currently are ineffective, and our fall-back mechanism triggers all updates instead. The fall-back mechanism checks freshness once a minute, and depending on when exactly your update happened, it may be discovered right away, or after approximately one minute. There should be only very few cases which take longer.

This is currently not possible, but it seems like a reasonable feature! We’d appreciate a feature request on our GitHub.

Here’s another (not very elegant) workaround: You could delegate your _acme-challenge subdomain to another DNS provider which doesn’t have an anycast deployment. In such a setup, you will see the same state as Let’s Encrypt, so you can proceed with validation immediately once you observe that the challenge has been published. :wink:
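Coming back to the "retry automatically" option: here is a minimal sketch of what that could look like in a wrapper script. It assumes you supply your own `attempt` callable (a hypothetical name, for example something that runs your ACME client and returns True on success), and it uses a 70-second delay only because that is slightly longer than the once-a-minute freshness check described above. The retry count is an arbitrary choice.

```python
import time
from typing import Callable


def retry_until_valid(
    attempt: Callable[[], bool],
    retries: int = 3,
    delay_seconds: float = 70.0,  # assumption: a bit more than the 1-minute fall-back check
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call attempt() until it returns True and return the number of attempts used.

    Raises RuntimeError if every attempt fails.
    """
    for n in range(1, retries + 1):
        if attempt():
            return n
        # Only wait if another attempt is still coming.
        if n < retries:
            sleep(delay_seconds)
    raise RuntimeError(f"validation still failing after {retries} attempts")
```

The `sleep` parameter is there so the wait can be swapped out in tests; in normal use you would leave it at the default.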

Stay secure,
Peter

Hello Peter,

thank you, I think that helps.
If I can currently rely on one minute, I’ll just set lego’s propagation polling interval to slightly more than one minute. Seems like a reasonable and simple workaround to me.

I’m still running into this. Has there been some sort of update on this?

The default lego timeout appears to be 1 minute, which is short enough that the challenge fails most of the time with deSEC. Not a great experience :confused:

I had to increase it to 2 minutes; I’ll have to see how reliable that is over the coming months. I’ve also opened “dns/desec: increase default `DESEC_PROPAGATION_TIMEOUT` by 60s” (Issue #2072 in go-acme/lego on GitHub) to perhaps get a workaround into lego in the meantime.
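In case it helps anyone else, this is roughly how the longer timeout can be passed to lego from a small wrapper script. Treat it as a sketch: I am assuming that `DESEC_PROPAGATION_TIMEOUT` is read as a number of seconds and that the deSEC token is already present in the environment. The helper names `lego_env` and `run_lego` are made up for this example, and the lego command line shown may need adjusting for your setup.

```python
import os
import subprocess
from typing import Mapping


def lego_env(base: Mapping[str, str], propagation_timeout_s: int = 120) -> dict[str, str]:
    """Return a copy of `base` with the deSEC propagation timeout set.

    Assumption: lego interprets the value as seconds.
    """
    env = dict(base)
    env["DESEC_PROPAGATION_TIMEOUT"] = str(propagation_timeout_s)
    return env


def run_lego(domain: str, email: str) -> int:
    """Run lego with the deSEC DNS provider and return its exit code."""
    cmd = ["lego", "--email", email, "--dns", "desec", "--domains", domain, "run"]
    # The current environment is passed through, so the API token stays available.
    return subprocess.run(cmd, env=lego_env(os.environ)).returncode
```

Calling `run_lego("example.com", "you@example.com")` would then start lego with a 2-minute propagation timeout instead of the default.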