Servers Dying (Sept 30, 2021)
On September 30, 2021, the LetsEncrypt root certificate named DST Root CA X3 expired (or rather, it was not to be used after that date). This meant that any certificate whose chain was verified through that root was no longer trusted.
Root certificates are quite an interesting thing. They are SSL certificates signed by no one, but you implicitly trust them since your software and the company behind that software trust them. This isn't something taken lightly, as many processes and regulations exist for the companies that hold this burden.
It even continually evolves, with more guidelines every few years; a current twist is that root certificates will only be valid for a maximum of 25 years. Most root certificates were created around 1999/2000, so we are entering a decade in which a ton of them expire.
This shouldn't be a problem in most cases. If you have an up-to-date machine and hardware, updates received even years prior would have prepared you for a root certificate expiration. You can even peek at these yourself on Windows with the Certificate Manager.
Shown above is the now expired LetsEncrypt DST Root CA X3 certificate. Thankfully, it doesn't take long for me to find the newer, unexpired LetsEncrypt root certificate.
So my machine should be in wonderful shape since any certificate signed in the last 18 months probably used the newer ISRG Root X1 instead of the old. There are entire blogs about this that go into cross signing and how new Certificate Authorities (CA) enter the game. I'd visit Scott Helme's blog if you want to dive further into that aspect of the certificate ecosystem.
But what if your device is really old and doesn't get auto updates anymore? This may be a really old version of Java or an old Blackberry or Android device. You won't have an update to bring new root certificates into play, and websites/services that are secured with newer certificates would no longer be accessible. The list of affected devices/things for LetsEncrypt is fully documented, if you're curious.
Though this blog post is about why some of my servers had a serious issue.
On September 30, 2021, alarms started going off: some hobby sites I run that call APIs on other services for information were failing. Jumping onto the server, I tested my own site and got:
[ec2-user@ip-xxx-xx-xx-xxx ~]$ curl -I https://connortumbleson.com
curl: (60) SSL certificate problem: certificate has expired
More details here: https://curl.haxx.se/docs/sslcerts.html
This scared me because I thought my certificate had expired, but it was the reverse. A root certificate in the machine's own trust store, used for this verification, had expired, and so the machine was rejecting every service that used a related LetsEncrypt certificate.
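To make the "which certificate actually expired" question concrete, here is a minimal Python sketch of the date check involved. The `is_expired` helper is hypothetical, and the `notAfter` string is an assumption for illustration (it is the value I believe DST Root CA X3 carried, in the format OpenSSL prints); check the certificate in your own trust store rather than relying on it.

```python
import ssl
from datetime import datetime, timezone

# Assumed notAfter value for DST Root CA X3 (illustrative only).
DST_ROOT_X3_NOT_AFTER = "Sep 30 14:01:15 2021 GMT"


def is_expired(not_after: str, now: datetime) -> bool:
    """Return True if `now` is past the certificate's notAfter time."""
    # ssl.cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" format
    # into seconds since the epoch (UTC).
    expiry = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return now > expiry


if __name__ == "__main__":
    # Compare against the current time on this machine.
    print(is_expired(DST_ROOT_X3_NOT_AFTER, datetime.now(timezone.utc)))
```

The point of passing `now` explicitly is that the same check can be run for any moment, for example the day before and the afternoon of the expiry.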
I was beyond confused. I was running Amazon Linux 2, which is basically marketed as the #1 AMI to base your server off of when using EC2.
I have auto updates enabled to keep things up to date. I saw the blogs/announcements about the LetsEncrypt changes coming, but I made a mistake and figured I was immune to the problem. I wrongly assumed that running the latest and greatest AWS stuff and keeping things up to date meant I was covered.
I was trying to figure out what was wrong. I dug into the location of my certificates, /usr/share/pki/ca-trust-source, and found both the old (DST Root CA X3) and new (ISRG Root X1) certificates in play.
I kept researching, because I assumed root certificates expired all the time, so why didn't the new one win in priority over the expired one?
I didn't learn that answer until after my "emergency" fix. In the moment, I manually edited the old expired certificate out of my ca-bundle.crt file and regenerated everything.
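For anyone who would rather script that kind of surgery than hand-edit a bundle, here is a rough Python sketch. It is not what I ran at the time: `fingerprint` and `drop_certs` are hypothetical helpers, the sketch assumes the bundle is a plain concatenation of PEM blocks, and it discards any comments between them. It also does not cover the regeneration step.

```python
import hashlib
import re
import ssl

# Matches each PEM certificate block inside a concatenated bundle.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def fingerprint(pem: str) -> str:
    """SHA-256 fingerprint (hex) of one PEM certificate's DER bytes."""
    return hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem)).hexdigest()


def drop_certs(bundle: str, unwanted: set) -> str:
    """Return the bundle's certificates minus those whose fingerprint is listed."""
    kept = [
        pem for pem in PEM_BLOCK.findall(bundle)
        if fingerprint(pem) not in unwanted
    ]
    return "\n".join(kept) + "\n"
```

Matching on a fingerprint rather than a name keeps the removal exact: only the one certificate you identified goes, and a same-named but different certificate stays. Work on a copy of the bundle and keep the original around.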
This was great because everything was working again, and it was off to research what went wrong. I quickly learned that this situation (an expired and a not expired cert in the trust store) should not have been an issue.
However, it took until OpenSSL 1.1.0 to resolve that. Guess what Amazon Linux 2 runs by default?
[root@ip-xxx-xx-xx-xxx ec2-user]# openssl version
OpenSSL 1.0.2k-fips 26 Jan 2017
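If you want to check a fleet of machines for this, the version banner is easy to test in a script. Below is a small Python sketch; `parse_openssl_version` and `has_cross_sign_fix` are hypothetical names, and the 1.1.0 threshold is simply the version mentioned above, not something I verified against every vendor's backports.

```python
import re


def parse_openssl_version(banner: str) -> tuple:
    """Extract (major, minor, patch) from a banner like 'OpenSSL 1.0.2k-fips 26 Jan 2017'."""
    match = re.search(r"OpenSSL (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognised OpenSSL banner: {banner!r}")
    return tuple(int(part) for part in match.groups())


def has_cross_sign_fix(banner: str) -> bool:
    """True when the reported version is at least 1.1.0."""
    return parse_openssl_version(banner) >= (1, 1, 0)


if __name__ == "__main__":
    print(has_cross_sign_fix("OpenSSL 1.0.2k-fips 26 Jan 2017"))
```

Keep in mind the banner only describes the `openssl` binary you asked. Each program uses whichever OpenSSL it was linked against; Python, for example, reports its own in `ssl.OPENSSL_VERSION`.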
This is probably because Amazon Linux 2 (and 1) closely follow the Red Hat 6/7 systems. They take stability and security over everything, no matter what. So while the version they run seems quite old (1.0.2), they backport tons of security-related fixes to that build.
Since your system has ISRG Root X1 in its trust store, OpenSSL should simply ignore the cross-signed version of it (signed by DST Root CA X3) that shows up in certificate chains. However, OpenSSL 1.0.2 doesn't properly ignore the cross-signature. Instead it throws an error! That behavior is fixed in OpenSSL 1.1.0.
jsha @ LetsEncrypt
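To convince myself of what that quote describes, it helps to model it. The Python below is a toy path builder, not OpenSSL's actual algorithm: `Cert`, `verify` and the `trusted_first` switch are names made up for this sketch, the intermediate name "R3" is an assumption, and real verification checks signatures, dates on every certificate, and much more.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    expired: bool = False


def verify(leaf: Cert, sent: List[Cert], store: Dict[str, Cert],
           trusted_first: bool) -> str:
    """Toy chain check. `store` maps a subject name to a trusted root."""
    sent_by_subject = {cert.subject: cert for cert in sent}
    chain = [leaf]
    while True:
        current = chain[-1]
        anchor = store.get(current.issuer)              # issuer in the trust store?
        candidate = sent_by_subject.get(current.issuer)  # issuer sent by the server?
        if anchor is not None and (trusted_first or candidate is None):
            # The path ends at this trusted root, expired or not.
            return "certificate has expired" if anchor.expired else "ok"
        if candidate is None or candidate in chain:
            break  # no issuer found anywhere for the end of the chain
        chain.append(candidate)
    # Issuer missing entirely: retry with shorter chains against the store.
    for cert in reversed(chain[:-1]):
        anchor = store.get(cert.issuer)
        if anchor is not None:
            return "certificate has expired" if anchor.expired else "ok"
    return "unable to get local issuer certificate"


# Illustrative data: the server sends an intermediate plus the cross-signed root.
leaf = Cert("connortumbleson.com", "R3")
sent = [Cert("R3", "ISRG Root X1"), Cert("ISRG Root X1", "DST Root CA X3")]
new_root = Cert("ISRG Root X1", "ISRG Root X1")
old_root = Cert("DST Root CA X3", "DST Root CA X3", expired=True)
store = {root.subject: root for root in (new_root, old_root)}

if __name__ == "__main__":
    print(verify(leaf, sent, store, trusted_first=False))
    print(verify(leaf, sent, store, trusted_first=True))
    print(verify(leaf, sent, {"ISRG Root X1": new_root}, trusted_first=False))
```

In this model, following the server's chain first walks straight into the expired root, while checking the trust store first stops at ISRG Root X1. Removing the expired root from the store also lets the older behaviour succeed, which matches what my manual fix did.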
So because the change in OpenSSL 1.1.0 was only a bug fix rather than a security fix, it was never backported, despite that bug wreaking havoc for millions on September 30, 2021.
Now, I found a few things from Amazon. They released a package called openssl11 that adds a 1.1 version of OpenSSL, but nothing else on the system is linked against it behind the scenes.
The update in the package repository does not replace the default OpenSSL (openssl 1.0.2k) in Amazon Linux 2, and currently no other package in the repositories, such as httpd, nginx, links with openssl11.
So I must not understand enough about how software is linked, since I'm not sure who would install this if I have to relink all my software against that version. I don't think installing OpenSSL 1.1.1 through that source will magically fix my nginx/php/curl issues.
Am I wrong to think that the most marketed Amazon option for EC2 should just work? It took Amazon roughly 9 hours to release a package update to ca-certificates to remove the expired cert. This meant that the OpenSSL bug fix between 1.0 and 1.1 was no longer needed, as the only relevant certificate remaining in the trust store was not expired.
Did they have to wait until after everything broke? Removing a certificate one day prior to its expiration seems fine by me. However, I can see some folks not agreeing with that.
I can see many many certificates expiring in the 2020-2030 slot and I'm not ready for another dose of stress on a weekday.
TL;DR: My Amazon Linux servers broke connectivity on September 30, 2021, and I either had to wait for an Amazon update to ca-certificates or surgically remove an expired certificate from my trust store.