Sorry about the site being down for the last few hours.

Our cloud provider had a server stop responding. It was "up" enough that it didn't appear down from the outside.

Our cloud hosting provider is Linode. I hope you can understand just how good they are.

When I first contacted them, it looked like the load balancer was failing.

They did a thorough examination and determined that it wasn’t the load balancer, but something related to our configuration.

Given that our configuration is freaking stable, this did not make any sense.

This led me to determine that the database server was not responding. This is not good.

This in turn led me to discover that I had pods that were "stuck". I attempted to manually stop one of the pods, and it just hung there.

I updated the ticket to "I can't terminate pods." The reply I received was not helpful. It talked about disks not attaching correctly and suggested that I was attempting to attach the same drive to multiple machines.

That led me to log into the node, something I should never have to do.

The node was borked. I took the reset hammer to it. It rebooted. Things are working again.

Linode did a great job working for me to help resolve the issue.


By awa

3 thoughts on “Unplanned Downtime”
  1. With my geek hat on I do appreciate these details… Dealing with remote stuff can be mighty tricky at times.

  2. RE: Remote stuff..
    Even with multiple fiber and in house microwave paths to critical infrastructure, we still paid the geld for ‘out of band’ connections to critical equipment…
