Udev and the broken system start

I recently upgraded the NixOS system that runs Kodi. I use deploy.rs for all my system deployments; it just works and is a breeze to use. This time I only moved to a newer nixpkgs version and didn’t change any configuration. Usually totally boring.

Not this time.

Both the deployment and the activation script ran without issues. A few days later I needed to restart the system. After I issued sudo reboot, I couldn’t log in via SSH anymore. What happened? Well, time to turn on the screen.

I was greeted with

A start job is running for /dev/disk/by-id/usb-WD_Elements_25A3_574343374B364C594C59454C-0:0-part1

Which was followed by

[Time] Timed out waiting for device dev-disk-by.(...)
[Depend] Dependency failed for /mnt/extern
Dependency failed
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to try again to boot into default mode.

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.
Press Enter to continue

Pressing Enter doesn’t help, and neither does rebooting.

So, what now? Unfortunately I had garbage collected in between, so there was no older generation left to boot into. Otherwise, that would have been the thing to do.

Well, let’s first figure out what the actual problem is. All the USB ports work, so let’s connect the external drive to another PC and see whether it works there.

It does. But on my laptop (which also runs the updated nixpkgs version), I no longer see /dev/disk/by-id/usb-WD_Elements_25A3_574343374B364C594C59454C-0:0, only /dev/disk/by-id/wwn-0x50014ee20e123456. So where did the symlink in /dev/disk/by-id/ go?
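One way to answer that question is to ask which symlinks still resolve to the drive. The following is a minimal sketch rather than the commands used at the time: links_for_device is a hypothetical helper, and /dev/sdb1 in the usage comment is an assumed device name.

```shell
#!/bin/sh
# Hypothetical helper: print every symlink in a directory that resolves
# to the same device node as the given target.
links_for_device() {
    dir=$1      # directory with symlinks, e.g. /dev/disk/by-id
    target=$2   # device node to look for (assumed name, e.g. /dev/sdb1)
    want=$(readlink -f "$target") || return 1
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # only consider symlinks
        if [ "$(readlink -f "$link")" = "$want" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Usage (not run here): links_for_device /dev/disk/by-id /dev/sdb1
```

Running this on the old and the new system and comparing the output shows which names disappeared and which ones can replace them.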

To make it short: the systemd update was the culprit. Others had noticed it as well (https://github.com/systemd/systemd/issues/25179), and it did get fixed (https://github.com/systemd/systemd/pull/25184), but that didn’t help me now.

Since I only use sudo to manage my systems and use the root account only via SSH public keys, I cannot log in locally as root (as you can see from the message above). Updating the system configuration with deploy.rs was impossible too: the boot stopped before the network came up, so I couldn’t log in remotely either. What a mess.

I fixed the issue with a dirty workaround:

Booted a live NixOS system
Imported my root pool with zpool import -af
Mounted my Nix store read-write on /mnt with mount -t zfs rpool/local/nix /mnt
Found my fstab with find /mnt -iname "*fstab"
Edited the fstab file with vim /mnt/store/hash-etc-fstab and manually changed the entry to use the new wwn path
Unmounted everything, exported the ZFS pool with zpool export rpool and rebooted
Updated my system config and redeployed it
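The steps above can be sketched as a small script. This is a sketch under assumptions, not a tested rescue tool: run, rescue_open and rescue_close are hypothetical names, the pool and dataset names are the ones from this post, and the editing step stays manual because the store path of the fstab contains a hash that only find reveals.

```shell
#!/bin/sh
# Sketch of the rescue steps, meant to be run as root from a live system.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: import the pool, mount the dataset holding the Nix store,
# and locate the generated fstab.
rescue_open() {
    dataset=${1:-rpool/local/nix}       # dataset name taken from the post
    run zpool import -af                # import all pools, forcing if needed
    run mount -t zfs "$dataset" /mnt    # the store then lives under /mnt/store
    run find /mnt -iname '*fstab'       # prints the path of the fstab to edit
}

# Step 2 is manual: open the file reported by find in an editor and
# replace the usb-... device path with the wwn-... one.

# Step 3: unmount and export the pool so the installed system can
# import it again on the next boot.
rescue_close() {
    pool=${1:-rpool}                    # pool name taken from the post
    run umount /mnt
    run zpool export "$pool"
}
```

Calling rescue_open, editing the file by hand and then calling rescue_close mirrors the manual sequence; a first pass with DRY_RUN=1 shows what would be executed.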

PS: I tried the same with an Ubuntu live USB stick. The 20.04 version was too old for the ZFS pool to be imported, though ;-(

What I learned from that:

Don’t garbage collect until the new generation has survived a reboot
Use the nofail option on mounts that aren’t needed for booting
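To find mounts that could stop the boot in the same way, one could scan an fstab for entries lacking nofail. This is a minimal sketch: missing_nofail is a hypothetical helper, and it skips the root file system and swap entries on the assumption that nofail is not wanted there. On NixOS the option itself belongs in the options list of the matching fileSystems entry.

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of every fstab entry
# whose option field does not contain nofail.
missing_nofail() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { next }   # skip comments and incomplete lines
        $2 == "/" || $3 == "swap"  { next }   # assumption: root and swap are left alone
        {
            n = split($4, opts, ",")          # options are comma separated
            found = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "nofail") found = 1
            if (!found) print $2
        }
    ' "$1"
}

# Usage (not run here): missing_nofail /etc/fstab
```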