The alarm went off. I opened my blurry eyes and reached for my phone. Click… click… 503?!?!!?
I started looking. I logged into my server from my phone, clicking away to get a status. The database engine was stuck in CrashLoopBackOff.
About that time, I noticed that Miguel had contacted me with a very polite “503?” Whiskey Tango Foxtrot.
As I have talked about, I’m upgrading the infrastructure that GFZ uses. The previous round of downtime resulted in me opening tickets with Linode and escalating to the point where less than a week ago I got an update, “We resolved the issue you reported”. They had known about the issue for over a year. It just wasn’t important enough to fix until their client, me, raised a fuss.
One of the side effects of this upgrade process is that I’ve had to increase the number of nodes and the size of nodes. All of that is going well.
It is unclear to me why the database engine crashed, only that it did.
To that end, I have removed that database engine from production and moved all the data to the larger, more stable database engine. That engine uses the new persistent (Ceph) storage. While it is not “crash proof”, it is less prone to failures because of the way the data is now stored.
In addition, it is much easier to get backups of the data.
I’m going to take the plunge later today and move the assets from the storage it is currently using to the new storage system. This offers numerous benefits, not the least of which is that I can do rolling upgrades of the software.
Yesterday I upgraded ‘WordPress’ on multiple sites. With the new infrastructure being used by some of those sites, there was zero downtime. K8S started a new pod with the new software. When it was stable, it terminated one of the old pods. It then started a second pod with the new software. When it was stable, it terminated the last old pod. Zero downtime.
For GFZ, which is still on the older infrastructure, the old pod was terminated and then the new pod was started. Once it was stable, service resumed.
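For the curious, here is a toy sketch of the difference between those two upgrade styles. It is not Kubernetes code and does not talk to a cluster; it only counts pods. The function names and the replica counts (two pods for the sites on the new infrastructure, one for GFZ) are my assumptions for illustration.

```python
def rolling_update(replicas=2):
    """Toy model of a rolling upgrade: start a new pod, then retire an old one.

    Returns the lowest number of serving pods seen during the upgrade.
    """
    old, new = replicas, 0
    min_serving = old + new
    while old > 0:
        new += 1  # start a pod with the new software and wait until it is stable
        old -= 1  # only then terminate one of the old pods
        min_serving = min(min_serving, old + new)
    return min_serving


def recreate_update(replicas=1):
    """Toy model of the older behavior: terminate the old pod(s), then start new ones.

    Returns the lowest number of serving pods seen during the upgrade.
    """
    old, new = replicas, 0
    old = 0                  # old pods are terminated first
    min_serving = old + new  # nothing is serving until the new pod is stable
    new = replicas           # new pods come up and service resumes
    return min(min_serving, old + new)


if __name__ == "__main__":
    print("rolling, lowest serving pods:", rolling_update(2))
    print("recreate, lowest serving pods:", recreate_update(1))
```

In this model the rolling version never drops below its starting pod count, while the terminate-then-start version hits zero serving pods for a moment. That moment is the downtime.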
Regardless, I’m hoping for a quiet day.