On Wed, Feb 21, the site experienced downtime. The cause was an infrastructure update that took certain resources offline.

At no time was there a risk to the contents of the site.

The resources (a ceph cluster) have been brought back online and the site, obviously, is now functioning properly.

Geek speak:

The base infrastructure is K8S. A ceph cluster provides backing storage for the RDBMS and for the assets. At around 0800, the K8S cluster was forcibly upgraded because of EOL issues.

This caused NAS volumes to become detached from the K8S ceph nodes. This is expected. Once the volumes were attached to the new K8S ceph nodes, the OSD processes had to be properly restarted.

Once this was completed, the ceph volumes became available to all the pods that needed them and the site was brought back up.
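For the curious, here is a minimal sketch of the kind of check that tells you the OSDs really are back before you bring pods up. It is not the procedure used here, just an illustration. It assumes you have captured the JSON output of `ceph osd stat --format json` as a string, and that the output carries `num_osds`, `num_up_osds` and `num_in_osds` counters; the field names and the optional `osdmap` nesting are assumptions to verify against your Ceph release. The `osds_ready` helper and the sample strings are hypothetical.

```python
import json


def osds_ready(osd_stat_json: str) -> bool:
    """Return True when every known OSD is reported both up and in."""
    stat = json.loads(osd_stat_json)
    # Assumption: some releases nest the counters under an "osdmap" key.
    stat = stat.get("osdmap", stat)
    total = stat["num_osds"]
    # An empty cluster is never "ready".
    return (
        total > 0
        and stat["num_up_osds"] == total
        and stat["num_in_osds"] == total
    )


# Hypothetical samples: volumes attached but OSDs not yet restarted,
# then the same cluster after the OSD processes came back.
degraded = '{"num_osds": 3, "num_up_osds": 1, "num_in_osds": 3}'
healthy = '{"num_osds": 3, "num_up_osds": 3, "num_in_osds": 3}'

print(osds_ready(degraded))
print(osds_ready(healthy))
```

Comparing counts instead of grepping for a health string keeps the check strict: one OSD that is still down is enough to hold everything back.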


By awa
