Abstract

I am using Artix with dinit as the init system (systemd replacement). I was debugging an issue that I encountered before, was writing notes on it and decided to make it into a blog post so others may learn a bit about dinit and how I debug these types of issues. I wrote this blog post while debugging, so it is very stream-of-consciousness.

The issue

~❯ sudo dinitctl list | grep X
[     {X}] early-modules.target
[     {X}] modules
[     {X}] libvirtd
[     {X}] openvpn (exit status: 1)

Alright so something is failing during boot. Fun! I’m pretty sure that libvirtd, openvpn and early-modules.target are failing because modules is failing, so I will look to fix the modules failure first.

It is important to know that this is a system service (not a user service) evidenced by the fact that I’m running dinitctl as root here.

Also I just want to note how cool it is that | grep X works so beautifully. Unix tool composability at its finest.

Where is it?

First, we need to find the modules service file.

My first question: is there a way to ask dinitctl for the file location of a given service name? I check dinitctl --help, which is largely useless. But I know from previous experience that the dinit man pages are very nicely written, so I look at man dinitctl. Skimming through it, there is no such subcommand. Sucks. Grepping the whole filesystem for it is annoying because the service has a generic name with no extension.

So okay, we have to find out where dinit loads the service files from. Through experience I know that man dinit-service exists, so I look at that. The only information regarding this is right at the top:

SYNOPSIS
       /etc/dinit.d/service-name, $XDG_CONFIG_HOME/dinit.d/service-name

But modules is not there. Okayy, let’s try man dinitctl. Even searching for etc yields nothing, so it’s a dead end. Well, what now?

Let’s try to grep the filesystem after all. The service file will probably be in a dinit.d folder so we can narrow it down:

# https://github.com/sharkdp/fd
# https://github.com/BurntSushi/ripgrep
$ cd /
$ fd modules | rg dinit
usr/lib/dinit/modules-load
usr/lib/dinit.d/early-modules.target
usr/lib/dinit.d/modules

Well okay, we found it, it’s /usr/lib/dinit.d/modules.

How is this not mentioned anywhere? I search the aforementioned man pages for usr and lib but nothing comes up. Finally, looking at the bottom of one of the previous man pages I see:

SEE ALSO
       dinit(8), dinit-service(5), shutdown(8).

Oh, let’s check man dinit.

And grepping for usr finally gets us to this:

Dinit reads service descriptions from files located in a service description directory, normally one of /etc/dinit.d, /run/dinit.d, /usr/local/lib/dinit.d and /lib/dinit.d for the system instance or $XDG_CONFIG_HOME/dinit.d, $HOME/.config/dinit.d, /etc/dinit.d/user, /usr/lib/dinit.d/user and /usr/local/lib/dinit.d/user when run as a user process.

Eh? /usr/lib/dinit.d/user is mentioned for user services but /usr/lib/dinit.d is not mentioned for system services? How fun!
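
In hindsight, this lookup is easy to script. Here is a minimal sketch, assuming the system directories quoted from man dinit above plus /usr/lib/dinit.d, which is where this service actually lived. The function name find_dinit_service is made up for illustration; it is not part of dinit.

```shell
# Hypothetical helper (not a dinit feature): print every file that could be
# the description of the given system service.
# Usage: find_dinit_service NAME [DIR...]
find_dinit_service() {
    local name="$1"
    shift
    # With no directories given, fall back to the system directories listed
    # in man dinit, plus /usr/lib/dinit.d (an assumption based on this post).
    if [ "$#" -eq 0 ]; then
        set -- /etc/dinit.d /run/dinit.d /usr/local/lib/dinit.d \
               /lib/dinit.d /usr/lib/dinit.d
    fi
    local dir found=1
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            found=0
        fi
    done
    # Exit status 0 if at least one file was found, 1 otherwise.
    return "$found"
}
```

Running find_dinit_service modules should then print the path directly. On systems where /lib is a symlink to /usr/lib, the last two directories are the same place, so the file may be listed twice.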

Error reporting

Alright so let’s see what the service looks like:

type       = scripted
command    = /usr/lib/dinit/modules-load
restart    = false
depends-ms = kmod-static-nodes

Simple enough. /usr/lib/dinit/modules-load is a bash script which reads the list of kernel modules from the standard paths and calls modprobe on each of them. There is no logging built into the script itself, and no logging specified in the dinit service file either. Yay.

While I was figuring out where the modules service was, I also tried sudo dinitctl catlog modules and checked /var/log, just in case dinit does some logging by default. No luck.

Welp, alright, let’s modify the service and tell it to log output. Through past experience and double-checking with man dinit-service I know I can do it like this:

type       = scripted
command    = /usr/lib/dinit/modules-load
restart    = false
depends-ms = kmod-static-nodes
# Added these two lines:
log-type   = buffer
log-buffer-size = 1073741824

Which is pretty intuitive! We don’t need to set log-buffer-size, but if there is a lot of output it can easily get truncated, as the default size (which is not documented, hihi) is relatively small; so we give it one gigabyte.
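
The number itself is just 1024 * 1024 * 1024. If you would rather pick a smaller buffer, here is a quick sketch of the arithmetic, assuming log-buffer-size is given in bytes (which matches the one gigabyte value above). The helper name mib_to_bytes is made up.

```shell
# Hypothetical helper: convert mebibytes to a byte count for log-buffer-size.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 1024   # prints 1073741824, the value used above
mib_to_bytes 16     # prints 16777216
```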

Okay now let’s try to just restart the service without a whole reboot. Funnily enough, if we just do

sudo dinitctl restart modules

we won’t get any logging, because we actually first need to run

sudo dinitctl reload modules

to tell dinit to reload the service file description. That was fun to debug the first time I ran into that issue :). Oh and also:

~❯ sudo dinitctl restart modules
dinitctl: cannot restart service; service not started.

Okay, a bit unergonomic but whatever, at least it’s clear what we’re actually supposed to do:

~> sudo dinitctl start modules
Service 'modules' failed to start.
Reason: service process terminated before ready: exited - status 123
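
Since the reload, restart, start dance is easy to forget, it could be wrapped in a small shell function. This is only a rough sketch: the function name and the DINITCTL variable are made up, and it assumes that dinitctl restart exits with a non-zero status when it refuses to act on a stopped service.

```shell
# Hypothetical wrapper, not a dinitctl feature.
# DINITCTL can be overridden; by default it talks to the system instance.
DINITCTL=${DINITCTL:-sudo dinitctl}

dinit_reload_and_run() {
    local svc="$1"
    # Re-read the service description first, otherwise edits are ignored.
    $DINITCTL reload "$svc" || return 1
    # restart refuses to act on a stopped service (assumed to exit non-zero),
    # so fall back to a plain start in that case.
    $DINITCTL restart "$svc" 2>/dev/null || $DINITCTL start "$svc"
}
```

With that in place, dinit_reload_and_run modules would replace the three separate commands above.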

Okay cool, let’s check the logs.

Diagnosing and fixing the actual issue

The logs say:

~❯ sudo dinitctl catlog modules
modprobe: WARNING: Module nvidia-uvm not found in directory /lib/modules/6.18.2-artix2-1

Ah, nvidia, my beloved. Seems that a driver is either not being installed at all, or is being installed to the wrong folder? Let’s check /lib/modules:

~❯ cd /lib/modules
~❯ ls
6.17.9-artix1-1  6.18.2-artix2-1

Okay I’m starting to get an idea of what the issue is:

~❯ fd nvidia-uvm
6.17.9-artix1-1/extramodules/nvidia-uvm.ko.zst
~> uname -r
6.18.2-artix2-1
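
The check above (which kernel versions actually ship the module, compared against uname -r) can be sketched as a small function. The name kernels_with_module is made up, and it assumes the module file is named after the module with a .ko suffix and an optional compression extension, as in the fd output above.

```shell
# Hypothetical helper: list the kernel versions under a modules root that
# ship the given module. Usage: kernels_with_module MODULE [ROOT]
kernels_with_module() {
    local mod="$1" root="${2:-/lib/modules}"
    local f
    # Match mod.ko as well as compressed variants such as mod.ko.zst.
    find "$root" -type f -name "$mod.ko*" 2>/dev/null | while IFS= read -r f; do
        # Strip the root prefix, then keep only the first path component,
        # which is the kernel version directory.
        f=${f#"$root"/}
        printf '%s\n' "${f%%/*}"
    done | sort -u
}
```

If the output of kernels_with_module nvidia-uvm does not contain the output of uname -r, modprobe has nothing to load for the running kernel.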

Why though?

~> yay -Ss linux
system/linux 6.18.4.artix1-1 (143.0 MiB 143.6 MiB) (Installed: 6.18.2.artix2-1)
    The Linux kernel and modules

Okay, the installed version matches the running kernel. Let’s just try to reinstall the nvidia drivers. I have some notes that specify which packages I care about.

~> yay -Rns nvidia-open nvidia-utils nvidia-settings lib32-nvidia-utils
~> yay -Sy nvidia-open nvidia-utils nvidia-settings lib32-nvidia-utils
# ...
error: failed to commit transaction (conflicting files)
nvidia-utils: /usr/lib/elogind/system-sleep/nvidia exists in filesystem
Errors occurred, no packages were upgraded.
 -> error installing repo packages

Ah, yeah, I forgot about that artix quirk. No big deal.

$ sudo rm /usr/lib/elogind/system-sleep/nvidia
$ yay -Sy nvidia-open nvidia-utils nvidia-settings lib32-nvidia-utils
$ # Success
# Put the artix patch back:
$ z scripts nvidia
# https://github.com/ajeetdsouza/zoxide/
$ sudo cp ./elogind.nvidia /lib64/elogind/system-sleep/nvidia
$ sudo cp ./elogind.nvidia /usr/lib/elogind/system-sleep/nvidia

Are we good now?

~❯ cd /lib/modules
~❯ fd nvidia-uvm
6.18.4-artix1-1/extramodules/nvidia-uvm.ko.zst
# Looks good!
~> sudo dinitctl start modules
Service 'modules' failed to start.
Reason: service process terminated before ready: exited - status 123

Wait what?

~❯ sudo dinitctl catlog modules
modprobe: WARNING: Module nvidia-uvm not found in directory /lib/modules/6.18.2-artix2-1
(dinit: note: service restarted)
modprobe: WARNING: Module nvidia-uvm not found in directory /lib/modules/6.18.2-artix2-1
~> uname -r
6.18.2-artix2-1

Okay, first of all, I love the (dinit: note: service restarted) line. Awesome quality-of-life thing that some logging tools don’t do (like gdb and its .gdb_history). Second of all, why did the module get installed to a folder that does not belong to the currently running kernel?

~❯ ls
6.18.2-artix2-1  6.18.4-artix1-1

Whattt. Okay, let’s just try to update the whole system and reboot.

~> yay
# ...
error fetching clion: fatal: unable to access 'https://aur.archlinux.org/clion.git/': Recv failure: Connection reset by peer 
	context: exit status 1

error fetching postman-bin: fatal: unable to access 'https://aur.archlinux.org/postman-bin.git/': Recv failure: Connection reset by peer 
	context: exit status 1
~[1]>

Whatttt. Nothing out of the ordinary on the https://aur.archlinux.org/packages/clion and https://bbs.archlinux.org/viewforum.php?id=44 pages. Maybe just try again?

~> yay
# Successfully ends
~>

Alright just like magic, works every time.

~> yay -Ss linux
system/linux 6.18.4.artix1-1 (143.0 MiB 143.6 MiB) (Installed)
    The Linux kernel and modules
~> cd /lib/modules
~❯ ls
6.18.4-artix1-1

Alright we should be good to go now. I didn’t even realize that the previous yay -Ss linux output hinted at the fact that my kernel was outdated (though that still doesn’t explain why the nvidia modules were installed to an even older version).
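
As the ls output shows, the module directory of the running kernel (6.18.2) is gone after the upgrade, which is a decent signal that a reboot is due. A minimal sketch of that check, with a made-up function name:

```shell
# Hypothetical helper: succeed if the given kernel release still has a
# module directory. Usage: kernel_has_modules [RELEASE] [ROOT]
kernel_has_modules() {
    local release="${1:-$(uname -r)}" root="${2:-/lib/modules}"
    [ -d "$root/$release" ]
}

# Example:
#   kernel_has_modules || echo "no modules for $(uname -r), reboot needed"
```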

~> loginctl reboot

Okay we’re booted up now! Annnd my top bar, noctalia-shell, doesn’t start. Annd I can’t start my terminal, kitty. Luckily I can still start rofi, and I still have xfce-terminal installed from my xfce install.

~> kitty
kitty: error while loading shared libraries: libpython3.13.so.1.0: cannot open shared object file: No such file or directory

But that error only really makes sense if I did a partial system upgrade. What is going on?

~> yay -Syu
# Exits successfully.
~> ldd $(which kitty)
	linux-vdso.so.1 (0x00007fd4996c4000)
	libpython3.13.so.1.0 => not found
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fd499645000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007fd499400000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fd4996c6000)
~❯ ldd $(which python)
	linux-vdso.so.1 (0x00007fd495d26000)
	libpython3.14.so.1.0 => /usr/lib/libpython3.14.so.1.0 (0x00007fd495600000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007fd495200000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007fd4954f0000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fd495d28000)

Okay so my python version got upgraded but kitty is not pointing to the new version for some reason?
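
Picking the broken line out of ldd output by eye gets old quickly. Here is a small sketch that filters it, assuming the "name => not found" format shown above. The function name missing_libs is made up.

```shell
# Hypothetical helper: read ldd output on stdin and print the names of the
# libraries that could not be resolved.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example:
#   ldd "$(command -v kitty)" | missing_libs
```

Packages built from the AUR link against whatever library versions were installed at build time, so a library version bump like this one usually means they need a rebuild even if their own version has not changed.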

~> yay -Ss kitty
aur/kitty-git 1:0.40.0.r4.gcb464423e-2 (+28 0.03) (Installed: 1:0.44.0.r69.g11dd7eeb8-1)
    Modern, hackable, featureful, OpenGL based terminal emulator
extra/kitty 0.45.0-4 (17.1 MiB 60.9 MiB) 
    A modern, hackable, featureful, OpenGL-based terminal emulator
world/kitty 0.45.0-4 (17.1 MiB 61.0 MiB) 
    A modern, hackable, featureful, OpenGL-based terminal emulator
# I use kitty-git since I sometimes report bugs and kovidgoyal fixes them :D
~> yay -Sy kitty-git
# Finishes successfully
~> kitty
# Opens new kitty window.
# Okay cool...soo, I guess the same issue and fix with noctalia?
~> yay -Ss noctalia
aur/noctalia-shell-git 2.3.1.r16.g330eac0-2 (+2 1.31) (Installed: 3.8.2.r201.g6a61bf18-1)
    A sleek and minimal desktop shell thoughtfully crafted for Wayland, built with Quickshell. (git version)
aur/noctalia-shell 4.0.0-1 (+18 6.23) 
    A sleek and minimal desktop shell thoughtfully crafted for Wayland, built with Quickshell.
aur/noctalia 0.4.0-1 (+0 0.00) 
    A simple CLI for installing and updating Noctalia components
~> yay -Sy noctalia-shell-git
~> dinitctl start noctalia-shell
# Noctalia starts up successfully.
~> 

Okayy coool cool. Oh yeah, let’s check if modules are fixed!

~> sudo dinitctl list | rg modules
[{+}     ] early-modules.target
[{+}     ] modules

Yayy, it’s working now!

Retrospective

Alright so all I needed was a system update. The reason I didn’t suspect this is that I had updated my system quite recently, but I did actually get installation errors about the lib32-nvidia-utils package, so I had to skip it during that upgrade. With this in mind, another system update is now the obvious first thing to try. I don’t know what the previous installation errors were about; last I checked, that package was flagged out of date, but I didn’t see anyone else actively having issues. I have heard of people having issues with the python3.14 upgrade on arch though.

By the way, I have already experienced, debugged, and fixed this same modules failure before. It was also the same root cause (mispackaged nvidia drivers). I didn’t know it would be the same root cause when I started writing this blog post. I had completely forgotten the steps to debug this though, so I had to rediscover them just now. It just goes to show how counterintuitive some of this stuff is. Let’s hope I don’t have to debug this thing a third time :)

Wishlist

Here is stuff that I wish was different, which would have made for a much smoother user experience. Maybe some of this stuff has rationale for why it is the way it is, I don’t know, just throwing some ideas out there.

dinit:

  • man dinit-service should not have a deceptive SYNOPSIS.
  • man dinit should specify that it looks into the /usr/lib/dinit.d/ path.
  • Give me a sudo dinitctl locate modules that gives me the path of the file please.
  • Give me a switch to enable buffer logging for all services, or come by default with enabled buffer logging.
  • log-type = buffer should use a ring-buffer instead of truncating after reaching the size limit.
  • The default value of log-buffer-size should be documented.
  • dinitctl restart service should also start services that are down.
  • I want a dinitctl command that does reload + (start or restart, whichever is appropriate)

artix:

  • Enable logging on these service files by default. Especially if they don’t make much output like this one.
  • Fix the nvidia laptop can’t go to sleep thing please, the fix is known, just do it.

arch:

  • How did the package even get installed to the wrong folder in the first place?
  • Why did I have python3.14 issues? Why did I manually have to reinstall the -git packages? Am I wrong to expect them to update on yay -Syu?

Conclusion

Sometimes, the road to figuring out the solution can be treacherous even if the solution is simple. Although this blog post might have been fast to read through, doing the debugging itself takes some time as I need to pause and think about what the next step is. Thus, I wish we as a community focused more on building intuitive tools, lest we die a death of a thousand cuts.

Thanks for following along! If you have any tips on a better debugging workflow, let me know!