Thank you for all the suggestions I got from people who understand Linux.
Don’t click the more button unless you want to read Linux geek stuff.

(1300 words)

A bit of history on my primary computer and how we got where we are.

The primary computer has 7 disk drives and an SSD attached to it with 21 TB of storage available. The secondary has another 15 TB attached to it and an SSD. The machines work together to form a K8S cluster via VMs. In addition, they are running native Ceph for a shared file system.

The primary has 64 GB of memory and 16 cores. The secondary is small at 32 GB and 8 cores.

I use ZFS as my “file system”. The root pool is rpool and is a mirror. Whenever I install or remove software, the system takes a snapshot of my rpool and saves it. As a developer, I am constantly installing and removing software, along with keeping the system up to date.

This led to rpool running out of space and making things “bad”. To fix this, I started deleting old snapshots. This caused “issues”.
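For anyone in the same spot, here is a minimal sketch of looking before deleting. It assumes the standard OpenZFS `zfs` command is installed; the two function names are made up for illustration. The first lists snapshots oldest first with the space each one holds, the second asks ZFS what a destroy would do without actually doing it.

```shell
# List every snapshot under a pool or dataset, oldest first,
# with the space each snapshot is holding on to.
list_snapshots() {
    zfs list -t snapshot -o name,used,creation -s creation -r "$1"
}

# Dry run: -n means "do not actually destroy", -v prints what would go.
preview_destroy() {
    zfs destroy -n -v "$1"
}
```

Something like `list_snapshots rpool` followed by `preview_destroy rpool/ROOT/example@old` (a hypothetical snapshot name) shows what would be removed. It does not tell you whether the boot menu still depends on that snapshot, which is the part that caused the trouble here.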

The primary issue was that my boot menus, grub, have to be rebuilt every time there is a kernel change. Some of those snapshots I deleted held some configurations needed with those kernels. This caused update-grub to fail, which it told me, but I didn’t understand the implications.

All was good until I had to reboot.

When I rebooted, grub loaded, grub attempted to load the boot menus and there was nothing there. Grub then dropped to a command line to allow me to continue.

At that instant in time, all I needed to do was load the kernel and initrd.img and boot and everything would have been fine.

I didn’t know the commands. The things I did know worked, but the pager wasn’t on by default. All of this meant that I was on my phone looking up things and then typing them in.

When I finally got things figured out, I booted. But, I did not give the correct parameters to the kernel to allow it to mount the root file system and transfer control to the root file system.

So I’m now in “busybox”, which is an embedded Linux. I can do most of the things I need to do, except I can’t seem to mount the freaking root file system where I can access it. The issue is that ZFS has the mount point configured into the dataset.

It won’t mount the root file system because there are things already in the root file system of the busybox. More research, and I’m able to change the mount point for the root file system, mount it and then chroot into that file system.

Chroot is when you pretend that a folder/directory is the actual complete drive. Once you chroot into a directory, you can run commands as if that was the real system. Which I did. I did an update of grub and happily rebooted.
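A sketch of one way to get into a chroot on a ZFS root from a live USB session without touching the dataset's mount point. It assumes the OpenZFS tools are available in the live environment; the function name and its arguments are placeholders, not anything from my system. The trick is `zpool import -R`, which sets a temporary alternate root so everything mounts under `/mnt` for this import only.

```shell
# Import a pool under a temporary alternate root and chroot into it.
# $1 = pool name, $2 = root dataset, $3 = where to mount it (default /mnt).
rescue_chroot() {
    pool="$1"
    root_ds="$2"
    target="${3:-/mnt}"

    # -N: import without mounting anything yet.
    # -R: temporary altroot; the stored mountpoint property is left alone.
    zpool import -N -R "$target" "$pool" || return 1

    # Mount the root dataset first, then everything else under it.
    zfs mount "$root_ds" || return 1
    zfs mount -a

    # The chroot needs the live system's device and kernel file systems.
    for fs in dev proc sys; do
        mount --rbind "/$fs" "$target/$fs" || return 1
    done

    chroot "$target" /bin/bash
}
```

A call would look like `rescue_chroot rpool rpool/ROOT/example`, with the real root dataset name taken from `zfs list`. Because the altroot is temporary, there is no mount point to remember to change back afterwards.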

This time I didn’t get a grub menu, instead I got a reset. I was in an endless loop of reset, attempt to load from drive, fail, reset, configure all the drives, drop into BIOS. Repeat.

This led me to reconfigure my SAS controller card to make all the drives visible to the BIOS, so I could boot from the mirror. Which didn’t work as expected. It just continued the entire reset loop thing.

Of course, I couldn’t figure out how come I was dropping into BIOS when I wanted a boot menu.

I went to get my “magic” USB pendrives/thumb drives. I carried two of them in my pants pocket all the time, back when I was traveling to client locations. I don’t travel to client locations anymore. Guess what I can’t find?

My USB thumb drives.

I located a thumb drive. Verify that nobody claims it. Go to my secondary computer and find that it actually has photos from my wife’s trip to Greece. Save those to disk and proceed to make a boot USB, after sorting out a few issues with the keyboard and mouse on the secondary computer.

I now have a bootable external drive. But that damn BIOS is giving me heartaches when I attempt to boot to the USB. I finally get that working, boot to grub on the USB. Tell it I want to try Ubuntu. It starts the boot process and stops, hangs.

It takes a while to move 5GB off or on a USB drive, so I decided to wait it out. 30 minutes later, I gave up on that. Fight my way back to the grub menu and selected “safe graphics”, which works.

Thank goodness I’m not stuck in 640×480, the “safe graphics” actually give me full use of a screen, echoed on both screens. SIGH.

I install the boot-rescue software, run it, one click it. It chooses to fix the boot process on one of my ZFS volumes, which I use for my virtual machines.

I can’t get it to “fix” the actual, real device. Anger is building.

Meanwhile, I’ve taken time out to write my article for Thursday, on my secondary computer.

At 0300, I call it a day and head off. Some time in this process, I realized that I have a working grub, booting off the USB, and I can use that to access my real drives!

Thursday I start that process. I finally figured out that I wasn’t giving the correct parameters to the kernel when booting. I had to tell it what drive contains the root file system. Once I did that, it booted … into busybox, again.
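For reference, a small cheat-sheet helper that prints the commands to type at the `grub>` prompt for a manual boot. This is a sketch under assumptions: it assumes a ZFS root handled by the zfs-initramfs scripts, which take the root dataset as `root=ZFS=<dataset>`, and every device, path and dataset in the example call is a placeholder to be replaced with what `ls` shows on the real machine. `set pager=1` is the GRUB setting that turns on paging for long output.

```shell
# Print the GRUB prompt commands for a manual boot.
# $1 = GRUB device holding the kernel, $2 = kernel path,
# $3 = initrd path, $4 = ZFS root dataset.
grub_manual_boot() {
    cat <<EOF
set pager=1
ls
set root=$1
linux $2 root=ZFS=$4 ro
initrd $3
boot
EOF
}

# Placeholder values only; look up the real ones at the grub> prompt.
grub_manual_boot '(hd0,gpt2)' /vmlinuz /initrd.img rpool/ROOT/example
```

Printing this ahead of time and keeping it near the machine beats looking it up on a phone while the computer sits at a prompt.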

I’m reading the error messages. Busybox is attempting to mount the root file system at /root/mnt. It is failing. I fix things and make it work. Tell busybox to continue. The system resets.

Multiple attempts and it won’t boot.

I’m frustrated, the pager (less/more) is not working. Printenv doesn’t exist. I finally remembered the env command to see what my variables are.

And there I find a variable named “REASON” that contains the reason that the system ended up in busybox.
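When the pager is not cooperating, filtering the output of `env` is usually enough. A minimal sketch, assuming only `env` and `grep` are available; `show_var` is a made-up helper name.

```shell
# Print NAME=value for one exported variable, or say that it is unset.
show_var() {
    env | grep "^$1=" || echo "$1 is not set"
}

show_var REASON
```

Piping `env` through `sort` and `head -n 20` is another way to read a long list one screenful at a time.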

A few hours later, it hits me. There is no reason for the root file system to be mounted on /root/mnt. It should mount on /root. Why is it mounting on /root/mnt instead?

Then it hit me. The day before, I had changed the mount point from / to /mnt. What happens if I change it back?

Now I can boot to grub, manually load the kernel and initrd.img and finish booting into Linux running on my system.

“It’s alive!”

Except my video drivers are open-source, and I need to change them to the Nvidia drivers. Which isn’t working because of a dependency issue. Multiple hours later, that is fixed. I need to reboot.

I’ve resolved the issues I had with dpkg --configure -a breaking. There is a good grub.cfg, all I need is to load that particular grub.cfg and the world will be better.

The system immediately drops into that reset loop.

More research. It turns out that update-grub doesn’t update grub. It just recreates the grub.cfg file. I run grub-install and it does something. I don’t think it works.
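The difference between the two, as a sketch. On Debian and Ubuntu, update-grub is a thin wrapper around `grub-mkconfig -o /boot/grub/grub.cfg`, so it only regenerates the menu file. Putting the bootloader itself back on the disk is the job of grub-install. The function names below are made up for illustration, and which of the two install variants applies depends on whether the machine boots in legacy BIOS or UEFI mode, which I am not asserting for any particular system.

```shell
# Regenerate only the menu file; this is what update-grub does.
regenerate_menu() {
    grub-mkconfig -o /boot/grub/grub.cfg
}

# Reinstall the bootloader on a legacy BIOS machine.
# $1 is the whole disk (not a partition) the firmware boots from.
reinstall_grub_bios() {
    grub-install "$1"
}

# Reinstall the bootloader on a UEFI machine.
# $1 is where the EFI system partition is mounted (default /boot/efi).
reinstall_grub_uefi() {
    grub-install --target=x86_64-efi --efi-directory="${1:-/boot/efi}"
}
```

With a mirrored root pool on a BIOS machine, it usually makes sense to run the install once per disk in the mirror, for example `reinstall_grub_bios /dev/sdX` with the real device name substituted, so either disk can boot on its own.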

I attempted to make a grub rescue image. That works, but I can’t write it to the USB device. The world hates me.

I tried something else, start the reboot. And miss the boot from USB.

The system comes up, by itself.

Things are alive now. There is still an issue, but I’m slowly fixing things. My English lesson with custom software worked as required. I can actually do work now.

By awa

4 thoughts on “Geeking about not-booting”
  1. Maybe y’all can answer a knuckle dragging question for me- why whenever a program or app gets “updated” does it ALWAYS get more difficult to F’in use?? Whether it’s work related or fun related, every time it gets worse..

    1. I don’t see that happen. I see things get better. Unfortunately, there are times when things get ‘worse’ but that is only in comparison to other parts.
      In my example, I created a website for a client. They have a particular way of selling goods. One of the things that happens is that they will often have an order with 500 or more unique items on it. The original software only expected around 50 max. This means that when they get to these large orders, things will sometimes hang.
      They complained and complained and complained. They bitched about dozens of things that “didn’t work right!”. I did my best to make them satisfied.
      I broke down and started bitching at my direct client who sold the system. What he told me was shocking. They loved the new system. They were able to do more, with less effort, on a more responsive system than they could ever have done with their old system. Two years later, they are still breaking the system as they find new ways of doing things and stressing the framework far beyond what it was designed for.

  2. The joys of zfs import. It’s also a real pain if you have to access the data from another device – ZFS wasn’t intended for network use, and doesn’t scale to SANs.

    That said, for a workstation level copy-on-write filesystem it’s not bad at all, but it’s still very much a 2009 era design.
