Cloudflare DNS and deSEC

Hi!

I have implemented a DNSSEC monitoring solution that runs a number of checks, then reports and logs the results. Among other resolvers, it queries 1.1.1.1 (Cloudflare DNS). It has been running for a couple of months now, so I have a fair amount of data.

Sporadically I get errors where RRSIGs cannot be verified, even for the domain desec.io! I also get such errors, with an even higher frequency, for another DNS provider when using their DNSSEC solution. Over time the frequency of the errors seems to be increasing.

For example from a report dated 2021.01.12 20:00:58 +0100:

[1]: ERROR: DNSSEC verification failed for 2 TXT records using DNS resolver 1.1.1.1! For desec.io TXT (#2033)
· · ·▷ desec.io.		TXT	"v=spf1 a mx -all"
· · ·▷ desec.io.		TXT	"google-site-verification=kHvNl9DPVIQMSbpPgc-j_hZrNTYFxgEcICtgtJaogXA"
· · ·▷ desec.io.		RRSIG	TXT 8 2 900 20210121000000 20201231000000 32110 desec.io. K41jLast0ud+gc1cicxYmEj7NFjlMA7ayOVuMKu2aaxWaJHdnwBlM2mr OoNsXVdkQAJvqPlIhFXI7uREDQDqXr6EWwktLAE6/Xbhjz3oHYuRticL e/czTnqkD34hxOYtfWQ6cICB979XqKHIwfrt5GzNqxnX1LSGoD/jbteM ZwE=
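
For readers decoding the RRSIG line: after the covered type (TXT) the fields are algorithm (8), labels (2), original TTL (900), signature expiration (20210121000000), signature inception (20201231000000), key tag (32110) and signer name, followed by the signature itself. A minimal sketch for ruling out an expired or not-yet-valid signature, assuming raw dig answer lines (printed with +nottl +noclass, without the report's "▷" prefix) on standard input; the function name rrsig_window_status is made up for this illustration:

```shell
#!/bin/sh
# rrsig_window_status NOW_UTC < dig-answer-lines
# Hypothetical helper (not part of the monitoring code described above).
# NOW_UTC is a UTC timestamp in the RRSIG format YYYYMMDDHHMMSS.
# Reports where NOW_UTC lies relative to the validity window of the
# first RRSIG record found on standard input.
rrsig_window_status() {
    awk -v now="$1" '
        $2 == "RRSIG" {
            t = now + 0            # force numeric comparison
            expiration = $7 + 0    # field 7: signature expiration
            inception  = $8 + 0    # field 8: signature inception
            if (t < inception)       print "not-yet-valid"
            else if (t > expiration) print "expired"
            else                     print "in-window"
            found = 1
            exit
        }
        END { if (!found) print "no-rrsig" }
    '
}
```

Fed the RRSIG line above together with 20210112190058 (the report time converted to UTC), this prints in-window, so an expired signature would not explain the failure.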

The same queries to 8.8.8.8 (Google Public DNS) and to the local validating resolver (unbound(8) on OpenBSD 6.8), made at virtually the same time, do not result in these errors. That leads me to conclude that Cloudflare DNS is broken w.r.t. DNSSEC.

Unfortunately I have not found any way to contact their support. If someone knows a way to contact them, please let me know.

Anyway, I wanted to let people know that at this time I cannot recommend using 1.1.1.1 with DNSSEC, or even in general, as more and more domains are DNSSEC-secured and may exhibit similar problems leading to outages. Otherwise 1.1.1.1 has proven to be very fast and reliable, so a deficiency like this is very unfortunate.

fiwswe

Dear fiwswe,

the signature that you show above is valid and is the one currently served by our nameservers. You can check it with dig +dnssec TXT desec.io @ns1.desec.io and, to make sure it’s valid, run the same command with @8.8.8.8, as Google Public DNS performs DNSSEC validation. (But only until the next signature rotation, which is scheduled for tomorrow, Thursday around noon UTC.)
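
A validating resolver signals a successful validation by setting the AD (authenticated data) flag in its response, which dig prints in the ";; flags:" line of its default output. As a rough sketch of how that check could be scripted (the helper name dig_has_ad is hypothetical, and it only makes sense for answers from a validating resolver such as 8.8.8.8, since an authoritative server like ns1.desec.io does not set AD):

```shell
#!/bin/sh
# dig_has_ad < dig-output
# Hypothetical helper: succeeds (exit status 0) if the ";; flags:" line
# of dig's output lists the "ad" flag, and fails otherwise.
dig_has_ad() {
    awk '
        /^;; flags:/ {
            line = $0
            sub(/^;; flags: */, "", line)   # drop the label
            sub(/;.*/, "", line)            # drop the counters after ";"
            n = split(line, flag, " ")
            for (i = 1; i <= n; i++)
                if (flag[i] == "ad") ok = 1
            exit
        }
        END { exit !ok }
    '
}
```

It could be used as: dig +dnssec TXT desec.io @8.8.8.8 | dig_has_ad && echo validated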

Could you provide more information as to why your test says the signature verification fails?

A wild guess could be that other DNS responses that are needed to validate the answer got lost in transit and validation fails due to that missing information.

Best,
Nils

Hi Nils!

My mechanism uses dig(1) to do the actual DNS queries. To be precise: dig 9.10.8-P1 on OpenBSD 6.8 stable.

It goes through a number of steps which I’ll leave out for brevity, at some point querying several RRsets using different resolvers.

Basically it uses several resolvers to get a validated answer, e.g. (using the same example RRset as in my original post):
/usr/bin/dig +dnssec +noall +answer +comments +nottl +noclass +rrcomments @'1.1.1.1' 'desec.io' 'TXT'

If the targeted resolver fails to yield an answer, i.e. if no TXT records are returned, then a recheck is done with validation disabled (+cd, "checking disabled"):
/usr/bin/dig +dnssec +noall +answer +comments +nottl +noclass +rrcomments +cd @'1.1.1.1' 'desec.io' 'TXT'

If the recheck succeeds then the mechanism concludes that the RRSIG validation must have failed. The output of this last command is included in the failure report, which is why the report shows the 2 TXT records and the RRSIG record.

Your suggestion that lost responses might be to blame is interesting. I will need to check my code to see how it would react in that case. I may need to put in some more debug code to see if that might be the issue. Still, this only happens with 1.1.1.1.

Thanks!

fiwswe

I did some testing using tshark (on the server) and Wireshark (to analyze the captured packets). I restricted the capture to DNS traffic to and from 1.1.1.1 (in capture-filter syntax: port 53 and host 1.1.1.1).

I do see lost UDP packets but that seems to be handled internally by dig(1). (I see a resend of the query after ≈5s when no response is received.)

While I did not capture any more problems with desec.io queries, I did see the issues with another domain. In these cases the 1.1.1.1 resolver responds with a server failure (SERVFAIL, dns.flags.rcode == 2), sometimes even when the query is sent with the +cd option, and for RRsets where I would expect either a validated answer or a validated negative answer (based on the NSEC3 mechanism). The DNS hoster does have some known problems with their DNSSEC implementation, e.g. CAA records are incorrectly signed, which they explained as “unsupported”.
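
Since the response code turns out to matter here, a check could read it directly from dig's output instead of only counting answer records. A minimal sketch, assuming the output includes the header comments (the default, or +comments together with +noall); the helper name dig_status is hypothetical:

```shell
#!/bin/sh
# dig_status < dig-output
# Hypothetical helper: prints the response status from dig's
# ";; ->>HEADER<<-" line (e.g. NOERROR, SERVFAIL, NXDOMAIN), or
# "no-response" if no header is present, as after a timeout.
dig_status() {
    awk '
        /^;; ->>HEADER<<-/ {
            for (i = 1; i < NF; i++)
                if ($i == "status:") {
                    status = $(i + 1)
                    sub(/,$/, "", status)   # strip the trailing comma
                    print status
                    found = 1
                    exit
                }
        }
        END { if (!found) print "no-response" }
    '
}
```

This separates three cases that an empty answer section alone cannot tell apart: SERVFAIL from the resolver, a NOERROR response without records, and no response at all.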

That being said, I would have put the blame on the wonky DNSSEC implementation of the DNS hoster if not for the fact that I occasionally see the same thing with other domains like desec.io.

My best guess right now is some weird caching problem at 1.1.1.1 but that guess is not really based on any provable facts.

fiwswe