When things work?
Geek dump, run away, run away!
If I were a normal person, many of the software and hardware issues I run into would just go away.
Instead, I invite these disasters on myself.
In a normal business or home, you would have a computer for each of your workers. You would have some virtual machines in the cloud.
The work computers would be reasonably fast, have reasonable graphics, reasonable memory, and reasonable amounts of disk space.
The idea is to give your workers the right equipment and then get out of their way.
My life isn’t like that. I have multiple virtual servers in multiple states from different providers, all of which need to be monitored.
At the office, my computer has good graphics, unreasonable amounts of memory, very good CPUs, and unreasonable amounts of disk space.
There are two more machines that are powerful enough to be considered “servers”.
Nobody really needs servers in the office space. I do things that make it reasonable to have those servers.
I run multiple virtual machines for testing purposes. Occasionally, it is just easier to toss up a new virtual machine than to try to run something on my standalone servers. I have a group of virtual machines running as a K8s cluster. There is a Ceph cluster running to provide distributed storage that those virtual machines can mount from multiple points.
So I installed Zabbix on a virtual machine. That virtual machine uses a Ceph file system. This means that I can migrate that machine to any of my servers that have access to that Ceph cluster. Which is very neat.
I had to learn how to write Zabbix templates to add monitoring of Amanda backup sets. Just got that working a week or so ago. And it has already paid off.
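For the curious, here is a minimal sketch of how the discovery half of that kind of template could be fed. It is an illustration under assumptions, not the actual template: it assumes each Amanda backup set lives in its own subdirectory holding an `amanda.conf`, and the function names and the `{#BACKUPSET}` macro are made up for this example. Recent Zabbix versions accept a bare JSON array for low-level discovery; older ones may expect it wrapped in a `data` key.

```python
import json
from pathlib import Path


def discover_backup_sets(config_root):
    """Return sorted names of subdirectories that contain an amanda.conf.

    Assumption: one backup set per subdirectory of config_root.
    """
    root = Path(config_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "amanda.conf").is_file())


def to_lld_json(backup_sets):
    """Format names as a Zabbix low-level discovery payload.

    {#BACKUPSET} is a hypothetical macro name chosen for this sketch.
    """
    return json.dumps([{"{#BACKUPSET}": name} for name in backup_sets])
```

Calling `to_lld_json(discover_backup_sets("/etc/amanda"))` from a script that the Zabbix agent runs would give the template one discovered entity per backup set to hang items and triggers on.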
As I figured some of this stuff out, I added more and more of my servers and client servers to the monitoring load.
One of the things I added was disk hardware monitoring via S.M.A.R.T.
S.M.A.R.T. then told me that I had a drive that was running hot. Then it warned me that the drive was failing.
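Those two warnings boil down to a pair of simple checks. The sketch below assumes the report comes from `smartctl --json -a <device>` (smartmontools 7.0 or later) and that the parsed JSON has `temperature.current` and `smart_status.passed` fields. The function name and the 50 C threshold are arbitrary choices for illustration, not what Zabbix ships with.

```python
def smart_warnings(report, max_temp_c=50):
    """Return human-readable warnings for one parsed smartctl JSON report.

    Assumption: `report` is the dict parsed from `smartctl --json -a`.
    Missing fields are skipped rather than treated as failures.
    """
    warnings = []

    # Temperature check: only warn when a reading is present and too high.
    temp = report.get("temperature", {}).get("current")
    if temp is not None and temp > max_temp_c:
        warnings.append(f"running hot: {temp} C (limit {max_temp_c} C)")

    # Overall health check: warn only on an explicit failure.
    passed = report.get("smart_status", {}).get("passed")
    if passed is False:
        warnings.append("SMART overall health check failed")

    return warnings
```

A healthy, cool drive returns an empty list. A drive like the one above would first trip the temperature warning and later the health one.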
Yesterday that drive failed. That drive failure affected my Ceph cluster. I found this out when I got alarms from Zabbix.
Meanwhile, I’m trying to fix my technical drawings. I was using the latest, greatest version of FreeCAD. I was running it from a “snap”. Turns out that snaps run in confinement. This means that I couldn’t get it to run any outside programs that were not approved. That was more than two days of chasing my tail.
This caused me to revert to the stable release of FreeCAD. Which reset all of my settings. Which cost me hours redoing parts of my blueprint.
FreeCAD can record a frame for each step change in a variable. That’s cool. I’ve made a couple of short animations using it. But what I really want to do is to have it send each step to a ray tracer. Which I’ve not figured out how to do yet.
So it is now 0115, and I have almost recovered from the bad drive issue. I’m just waiting a bit longer to be able to reboot that server and take the drive out. I can’t do that yet because Ceph wants that server to be up for a bit longer.
If I get very, very lucky, I’ll be done with this shortly.