Using deSEC with lego/Traefik suddenly doesn’t work anymore

Freundschaft · January 30, 2022, 4:00pm

Hello there,

I am using deSEC with lego/Traefik in a Kubernetes cluster.
It was working fine until recently, but now domains aren’t able to pass validation. I see the following output (on Pastebin, since the forum recognizes log entries as links and I can’t post them here): 2022/01/30 15:39:08 client.go:551: [DEBUG] GET https://desec.io/api/v1/domains/e - Pastebin.com

It looks like the API calls are working correctly, but I never see the entries created in the deSEC dashboard.

Is anyone else seeing this issue at the moment?

black · January 31, 2022, 7:55am

Disclaimer: I use lego, but not Traefik. I believe only lego is relevant here.

Are you sure that the TXT record is not created? Lego cleans it up after the (failed) verification step, so you’d have to poll quickly to see the record at all.
If the record is created (or you’re not quite sure whether it is), this sounds like an issue I had a while ago.
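Because the record only exists for a short time, one way to check whether it is created at all is to poll the deSEC API while lego runs. Below is a minimal Python sketch, assuming the RRset endpoint `GET /api/v1/domains/{domain}/rrsets/{subname}/TXT/` with an `Authorization: Token <token>` header, as described in the deSEC API documentation. The domain, subname and token are placeholders, and the helper names are hypothetical. deSEC rate-limits API requests, so keep the poll interval modest.

```python
import json
import time
import urllib.error
import urllib.request

API_BASE = "https://desec.io/api/v1"


def rrset_url(domain: str, subname: str, rtype: str = "TXT") -> str:
    """Build the deSEC RRset URL for one (subname, type) pair of a domain."""
    return f"{API_BASE}/domains/{domain}/rrsets/{subname}/{rtype}/"


def fetch_txt(domain: str, subname: str, token: str) -> list[str]:
    """Return the TXT contents stored at deSEC, or [] if the RRset does not exist."""
    request = urllib.request.Request(
        rrset_url(domain, subname),
        headers={"Authorization": f"Token {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)["records"]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # RRset not (or no longer) present
            return []
        raise


def watch(domain: str, subname: str, token: str,
          duration: float = 300.0, every: float = 1.0) -> None:
    """Poll the API for `duration` seconds and print whenever the TXT RRset changes."""
    last = None
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        records = fetch_txt(domain, subname, token)
        if records != last:
            print(time.strftime("%H:%M:%S"), records or "no TXT RRset")
            last = records
        time.sleep(every)
```

For example, `watch("example.com", "_acme-challenge.sub", token)` should print one line when the challenge record appears and another when lego removes it again.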

The gist of it: After setting up the response TXT record, lego polls the authoritative nameserver(s) until they have it. Lego then tells the CA to do the DNS verification. The CA queries the authoritative nameserver(s) for the same information.
In that last step, the CA may talk to other servers than lego did, due to deSEC’s global anycast network.
So, if you’re unlucky, the newly created record propagates faster to the servers near your lego instance than to those near the CA’s verifier.
Apparently, propagation may take up to one minute.

My solution was to make lego wait more than one minute between creating the TXT record and polling for it for the first time. You can probably do this with DESEC_POLLING_INTERVAL (see the lego docs), unless Traefik does something wild here. You may have to increase DESEC_PROPAGATION_TIMEOUT as well.
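The race described above can be illustrated with a toy model. All numbers are made up: the record reaches the anycast node lego queries after `near_delay` seconds and the node the CA queries after `far_delay` seconds. The model assumes, as described above, that lego waits one polling interval before each check, and that the CA verifies right after lego’s first successful check. It is a sketch of the timing argument, not of lego’s actual code.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    near_delay: float  # hypothetical: seconds until lego's nearby anycast node serves the record
    far_delay: float   # hypothetical: seconds until the CA's nearby node serves it


def outcome(s: Scenario, polling_interval: float, propagation_timeout: float) -> str:
    """Toy model: lego checks at 1x, 2x, 3x ... the polling interval until the timeout;
    the CA verifies immediately after lego's first successful check."""
    t = polling_interval
    while t <= propagation_timeout:
        if t >= s.near_delay:  # lego's propagation check passes at its nearby node
            return "valid" if t >= s.far_delay else "CA saw no record"
        t += polling_interval
    return "lego timed out"


# Made-up delays: the record is near lego after 2 s, near the CA after 50 s.
scenario = Scenario(near_delay=2, far_delay=50)
print(outcome(scenario, polling_interval=2, propagation_timeout=60))    # CA saw no record
print(outcome(scenario, polling_interval=75, propagation_timeout=300))  # valid
```

Under this model, a polling interval longer than the slowest propagation makes lego’s first successful check safe for the CA as well, and the timeout only has to leave room for at least one check.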

Hope that helps. Your issue may be something totally different.

nils · January 31, 2022, 11:29am

Hi Freundschaft,

I second black’s opinion. Because of your different locations on the Internet, you and Let’s Encrypt may be talking to different name servers. deSEC’s propagation to the frontend name servers is currently slightly delayed due to a bug in the PowerDNS LMDB backend. We are working with the PowerDNS maintainers towards a solution.

Increasing the waiting time should resolve your problem in the meantime. Sorry for the inconvenience!

Best,
Nils

Freundschaft · January 31, 2022, 4:53pm

Yes, that sounds very much like the issue at hand here.
The API calls appear to go through, so I also assume it’s a DNS propagation issue.

Thanks & Best Regards

g5pw · November 20, 2023, 11:56am

I’m still having the same (similar?) issue. Was this solved? @black, which values did you use for DESEC_PROPAGATION_TIMEOUT and DESEC_POLLING_INTERVAL?

I’ve set

      environment:
        - DESEC_PROPAGATION_TIMEOUT=240s
        - DESEC_POLLING_INTERVAL=120s

which looks pretty large to me, but it’s still not working…

black · November 21, 2023, 4:57pm

I use 75 seconds for the polling interval and 300 seconds for the propagation timeout.
I’m not sure the high timeout value is even necessary. Do you have lego’s output when it fails?

g5pw · November 21, 2023, 8:21pm

The relevant log seems to be

time="2023-11-21T11:42:18Z" level=error msg="Error renewing certificate from LE: {example.com []}" ACME CA="https://acme-v02.api.letsencrypt.org/directory" providerName=letsencrypt.acme error="error: one or more domains had a problem:\n[sub.example.com] time limit exceeded: last error: NS ns1.desec.io. did not return the expected TXT record [fqdn: _acme-challenge.sub.example.com., value: <hash>]: <other hash>\n"

black · November 24, 2023, 9:51pm

That does not seem to be a propagation issue in your case.
From what I understand, Lego roughly does the following steps:

1. Request a challenge from the CA
2. Set up the response DNS record
3. Poll the authoritative DNS server(s) until they serve the response
4. Tell the CA to verify the response

My issue (and Freundschaft’s, I suppose) was that step 3 completed too soon (see the other thread for the error message).
Your error message (time limit exceeded: last error: NS ns1.desec.io. did not return the expected TXT record) suggests that step 3 does not complete at all (before the timeout).
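The four steps and the two failure modes can be sketched as follows. The callables (`create_record`, `ns_serves_record`, `ca_verifies`) are hypothetical stand-ins rather than lego functions, the wait-then-check order is an assumption, and the returned strings only paraphrase the log messages quoted in this thread.

```python
import time
from typing import Callable


def solve_dns01(
    create_record: Callable[[], None],     # hypothetical: step 2, publish the TXT record via the API
    ns_serves_record: Callable[[], bool],  # hypothetical: step 3 check against the authoritative NS
    ca_verifies: Callable[[], bool],       # hypothetical: step 4, the CA's own lookup
    polling_interval: float,
    propagation_timeout: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    # Step 1 (requesting the challenge from the CA) is assumed to have happened already.
    create_record()
    waited = 0.0
    while True:  # step 3: wait, then poll, until the record is served or time runs out
        sleep(polling_interval)
        waited += polling_interval
        if ns_serves_record():
            break
        if waited >= propagation_timeout:
            # g5pw's case: the record never shows up where lego looks
            return "time limit exceeded: NS did not return the expected TXT record"
    # Step 4: the CA checks, possibly via a different anycast node.
    if ca_verifies():
        return "valid"
    # The earlier case: step 3 passed before the CA's node had the record.
    return "CA verification failed"
```

Passing a no-op `sleep` makes it easy to try both failure modes without waiting for real timeouts.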

I can think of two possible causes:

- Step 2 failed (but Lego didn’t notice). Try querying the API and the deSEC name servers while Lego is running to see whether the response is correctly set up and served.
- Step 2 succeeded, but Lego cannot reach the deSEC name servers and therefore fails in step 3. If your firewall blocks DNS traffic that does not go to your preferred (stub) resolver, this may be your issue.
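For the name-server part of the first check, you can ask the deSEC name servers directly and bypass your own resolver, for example with `dig +norecurse @ns1.desec.io _acme-challenge.sub.example.com TXT`. Below is a stdlib-only Python sketch of the same query. It assumes IPv4 and a response that fits in one UDP datagram (truncated responses are reported, not retried over TCP). The record name is a placeholder and the function names are hypothetical. Running it both from the lego host and from another machine may also help with the second cause: if the query times out only on the lego host, outbound DNS is probably blocked.

```python
import socket
import struct

TYPE_TXT = 16
CLASS_IN = 1


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a TXT query for `name` (RD bit unset: we ask the authoritative server itself)."""
    header = struct.pack(">HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", TYPE_TXT, CLASS_IN)


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # compression pointer: two bytes, ends the name
            return offset + 2
        offset += 1
        if length == 0:  # root label terminates the name
            return offset
        offset += length


def parse_txt_answers(msg: bytes, query_id: int = 0x1234) -> list[str]:
    """Extract the TXT strings from the answer section of a DNS response."""
    rid, flags, qdcount, ancount, _, _ = struct.unpack(">HHHHHH", msg[:12])
    if rid != query_id:
        raise ValueError("response ID does not match the query")
    if flags & 0x0200:  # TC bit
        raise ValueError("response truncated; retry over TCP (not implemented)")
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    records = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(">HHIH", msg[offset:offset + 10])
        offset += 10
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if rtype != TYPE_TXT:
            continue
        parts, i = [], 0
        while i < len(rdata):  # TXT RDATA is a sequence of <length><bytes> strings
            n = rdata[i]
            parts.append(rdata[i + 1:i + 1 + n].decode("utf-8", "replace"))
            i += 1 + n
        records.append("".join(parts))
    return records


def query_txt(server: str, name: str, timeout: float = 3.0) -> list[str]:
    """Ask `server` (e.g. ns1.desec.io) directly for the TXT records of `name`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(4096)
    return parse_txt_answers(response)
```

While lego is waiting, `query_txt("ns1.desec.io", "_acme-challenge.sub.example.com")` should return the challenge value once the record is served; repeat with `ns2.desec.org`. A `socket.timeout` instead of an answer suggests that DNS traffic to the deSEC servers is blocked.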