QNX From The Board Up #16 - Let's write some utilities...

Take a quick break from memory to implement some common utilities like ls, shutdown, and uname on your own custom image. Simple, right? Right??

Welcome to the blog series "From The Board Up" by Michael Brown. In this series we create a QNX image from scratch and build upon it stepwise, all the while looking deep under the hood to understand what the system is doing at each step.

With QNX for a decade now, Michael works on the QNX kernel and has in-depth knowledge and experience with embedded systems and system architecture.


According to the time at which the sun set each night over the last week, it seems we in the northern hemisphere are closer to the end of summer than the beginning, but I am extremely not ready yet to declare "da summa is aussi" (summer is over). So, rather than going "obi in's Tal" (down into the valley) I thought it might be good to think about Ken Thompson's UNIX Summer of Love in 1969 and create a few more utils for our simple QNX configuration.

So far, we've written our nano-KISS program (nkissh), which takes input, parses it up, and runs programs. We might have had the audacity to call it a shell. We even added support for searching the PATH environment variable by moving from a fork()-and-exec() approach to using posix_spawn().

I did mention that I have a variation on nkissh called "kilo-KISS shell" (kkissh) which prints a prompt with the current working directory. Well, all that really does is tell us what the default current working directory for a program created by the QNX init script is, because we haven't provided a way to change it.

The default current working directory that QNX gives us is /. No surprise there. A nice reliant automobile directory because there'll always be a root directory.

But, it might be nice to look around and run ls without having to specify a path, so, let's add the ability to change the current working directory (cwd).

cd

Now, we could write a program, but, the current working directory is a property of a process. Having an external program change its own current working directory would have no effect upon our nkissh program. So, nkissh itself will have to recognize the command "change (current working) directory", cd, and execute the functionality itself.

And this leads to something common with most shells: a split between functionality that is provided by external programs, and functionality "built in" to the shell program itself, aka "built-ins", aka "builtins". cd is a builtin because it's affecting the shell process.

NOTE: Changing the cwd also affects subsequent processes a process creates. See the notes for posix_spawn() which say, "The current working directory of the child process shall be the same as it is in the parent process."

POSIX describes quite a few things you can do with cd and other shells do even more. But, for our program, we're going to keep it very simple:

  • cd will change the current working directory by calling chdir().
  • The prompt will print the current working directory, querying it by calling getcwd().

How you choose to do this is up to you. The code for the prompt could look something like this:

#include <limits.h>   // PATH_MAX
#include <stdio.h>    // printf(), fflush()
#include <unistd.h>   // getcwd()

static void print_cwd(void)
{
    char path[PATH_MAX];
    char const * const cwd = getcwd(path, sizeof(path));
    if (NULL != cwd) {
        printf("%s", cwd);
    }
}

static void print_prompt(void)
{
    print_cwd();
    printf(" # ");
    fflush(stdout);
}

Or not. It's your shell.

Why not use write() and STDOUT_FILENO? Because printf() is handy and I'm not worried about every femtosecond of performance here. (Obviously. Because this info could be cached and updated.) But, now we're using streams and full buffering for standard out, hence the fflush(). Keepin' it seriously simple, 'cause "I just wanna see it work", which is the bane of proper software engineering, and definitely not part of a safety culture. But, we're experimenting to learn. Nothing here is going out the door.

Aside: BTW, it is, IMHO, a fun little exercise to write your own printf(). It is both easier than you initially expect, and harder than you initially expect. You'll need this, and maybe keep this in mind. Start with supporting just %s and %x and go from there. Don't forget to take a swing by %n and "alternative formats", just for a lark.

uname

It would be nice to know a little something about the QNX system we're running on, and that's where the uname utility comes in.

This one's pretty easy because all the information is available via the function uname(). POSIX defines it, and QNX has it. Add some printf()s and Bob's your uncle!

Note too that the information provided by uname() is available via "configuration-defined string values" aka "conf strings", which are accessible via the function confstr(). (Also a POSIX thing.)

nproc

It's not a POSIX thing, it's a GNU thing. There are some things that complicate this a bit, but for now, let's keep it simple. Who knows how many CPUs lurk in the heart of the system? The Shadow System Page knows!

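#include <inttypes.h>     // PRIu16
#include <stdio.h>
#include <stdlib.h>
#include <sys/syspage.h>  // _syspage_ptr

int main(void)
{
    printf("%" PRIu16 "\n", _syspage_ptr->num_cpu);
    return EXIT_SUCCESS;
}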

Boom. Done.

sleep

This one's easy. Take the time from the command line (argv[]) and maybe use atoi() or strtoull() to convert from a string to an integer of some kind. Pass to sleep(). Done.

shutdown

Before we talk about a utility to shut down the system, let's step back a second and look at how, in general, the system can be shut down.

In all cases, you end up at the reboot() kernel callout in the System Page. It knows the system-specific way to tell the hardware to put the system into a known good state.

What's more interesting are the ways we can end up at the reboot() callout:

  • Please reboot the system.
  • Uh oh.

Please Reboot The System

This one is easy. Someone (with the right abilities) calls a function in the C library, sysmgr_reboot().

sysmgr_reboot() sends a message to the System Manager (aka sys mgr, pronounced "sys mugger") (a server in the QNX kernel process) and in response to this message, the QNX kernel will invoke the reboot() kernel callout.

More specifically, the reboot() kernel callout is called with the argument abnormal set to 0. i.e. "This reboot is happening for the normal reason: You explicitly asked for it to happen by calling sysmgr_reboot()."

This naturally leads one to ask: what are the "abnormal" reasons? That's where the "Uh oh." comes in.

Uh Oh

This is a bigger topic but, we can boil this down to two major scenarios:

  • Something (or some things) bad happened in the kernel or the hardware, and (one of them) was detected by the QNX kernel.
  • Something (or some things) bad happened in a critical process, and (one of them) was detected by the QNX kernel or the critical process.

Design Safe State

It is possible for the QNX kernel to encounter situations where it has detected that "something ain't right" and the only safe path forward is to move to a known good state. What is that known good state? The state of the system after a reboot: hardware has been forced to return to a good state, and software has returned to a good state. i.e. QNX is transitioning the system to its Design Safe State (DSS). i.e. By design, the system is returning / transitioning to a safe state.

Critical Program

The other scenario is that a "critical process" terminated, or was terminated. What qualifies a program as a "critical" process? That's up to you. You can designate any process to be a "critical process".

"Even a 'Hello, world!' program?"

Yup. It's your system. Do what you need to do.

If you decide that the functionality a process provides is critically important to the correct operation of your system, and you mark it as critical, then, if that process terminates, either normally (e.g. exit()) or abnormally (e.g. SIGSEGV), then the whole system will be put into a known good state. i.e. the reboot() callout will be invoked.

See the SPAWN_CRITICAL flag for spawn(), or the POSIX_SPAWN_CRITICAL flag for posix_spawn().

Shutdown Report

In both of the above "uh oh" cases, QNX will first send a little message, a "shutdown message", or "shutdown report", to the debug kernel callout, display_char(). In our configuration, that means it'll go to the 8250 UART. This will help start the search for the source of the detected issue.

After the report is finished, the QNX kernel will invoke the reboot() kernel callout with a non-zero value for abnormal; at the moment it's 1, which is definitely non-zero.

reboot() Callout

If you're developing and debugging your system, you can modify your version of the reboot() callout to do something special if you see an abnormal reboot. (e.g. See the startup generic option -A.)

But, in a deployed system, no matter the value of abnormal, do whatever is appropriate for your system to get the system back to a safe state. Asserting the RESET line -- or the equivalent thereof -- is the usual solution to get the hardware back to a known good state.

But What About The Others?

Note that for all of the above scenarios, calling sysmgr_reboot(), or "uh oh", it's a very short path to the reboot() callout. Running software is not given a chance to clean things up. "Oh, were you in the middle of something? Tough. LightsarebeingturnedoffNOW!"

But what if you're shutting down because you're done, and want to gently reboot the system? How to do that?

Well, it's your system. You're the one who knows best how to unwind what you wound up when you started the system.

In the case of our minimal demo system, no prep work is necessary. Our shell is the only process.

But, once you have things like file systems and USB devices -- i.e. state in flux, caching of modified persistent data, data in flight, ... -- getting the system to a good, consistent state before pulling the plug is A Good Thing.

In this scenario where you want to gracefully shut down the system because you're done:

  • notify and tear down some or all of your processes in the way you deem best, then
  • when everything is ready, call sysmgr_reboot().

Yes, there is a shutdown utility provided for convenience, and it is pretty good at being smart about terminating clients and then terminating servers, but, it's your system. You know best.

Weren't We Writing a shutdown Utility?

Oh, yeah. Um, this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/sysmgr.h>

int main(void)
{
    sysmgr_reboot();

    // If we're here, something failed.
    fprintf(stderr, "Unable to reboot. Error %d (%s)\n", errno, strerror(errno));

    return EXIT_FAILURE;
}

KISS: Yoink!

We might want to revisit this later when the system configuration is more complex than "a single shell process".

id

Unlike nproc, this one actually is a POSIX thing. But, we're just gonna use the idea of reporting our id. Way too many options on that beast for our first attempt.

Let's try something simple:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   // getuid(), getgid()

int main(void)
{
    printf("uid: %d\n", getuid());
    printf("gid: %d\n", getgid());

    return EXIT_SUCCESS;
}

Run it:

/ # id
uid: 0
gid: 0

Interesting. So, by default, QNX assigned us user id 0, and group id 0. What's that mean?

Well, according to this it means "The QNX OS recognizes user ID 0 as being privileged, and traditionally an account with uid 0 is called root."

Let's see if our name is root by getting our user name using getpwuid():

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <pwd.h>

static
const char *
uid_to_string(const uid_t uid)
{
    struct passwd const * const pw = getpwuid(uid);
    if (NULL != pw) {
        return pw->pw_name;
    }
    return "(unknown)";
}

int main(void)
{
    const uid_t uid = getuid();
    char const * const user_name = uid_to_string(uid);
    printf("uid: %d - %s\n", uid, user_name);

    printf("gid: %d\n", getgid());

    return EXIT_SUCCESS;
}

Now let's see what happens:

/ # id
uid: 0 - (unknown)
gid: 0

Hm. If we do a bit of debugging:

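static
const char *
uid_to_string(const uid_t uid)
{
    // Needs <errno.h> and <string.h>. Clear errno first so we don't report
    // a stale value if getpwuid() returns NULL without setting it.
    errno = 0;
    struct passwd const * const pw = getpwuid(uid);
    if (NULL != pw) {
        return pw->pw_name;
    }
    printf("Error %d (%s)\n", errno, strerror(errno));
    return "(unknown)";
}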

we get this:

Error 2 (No such file or directory)

ENOENT? No such file? Where exactly does getpwuid() get its information?

The answer isn't blindingly obvious from the docs for getpwuid(), but, it's the password file, /etc/passwd. And we do not have that on our system. Hence the NULL returned by getpwuid().

But, we can create one easily by adding this to our image build file, after the init script:

# username:has_pw:userid:group:comment:homedir:shell
/etc/passwd = {
root::0:0::/:/bin/sh
}

This tells mkifs to:

  • create a text file in the Image File System (IFS) with the contents within the braces, and
  • tell QNX that it should be presented to the system at /etc/passwd.

Aside: I'm not going to get into the details and worry about a proper, secure configuration for now. Proper authentication configuration (e.g. PAM) is a much bigger story. e.g. I'm leaving the has_pw field empty despite the documentation saying "It's not recommended to leave it empty, ..." This is a hack to get getpwuid() to (barely) work.

If we build the image and then look at the files in the IFS using dumpifs:

x86_64 $ dumpifs ./image.ifs
  Offset     Size    Entry Name
...
  835348       5c     ---- Image-header mountpoint=/
...
    ----     ----     ---- etc
  8356fc       15     ---- etc/passwd
...

It's saying that there needs to be a directory named /etc and a file /etc/passwd which has a size of 0x15 (21) bytes.

Can we see that in our system when we run it?

/ # ls
proc etc usr dev
/ # ls /etc
passwd
/ # cat /etc/passwd
root::0:0::/:/bin/sh
/ #

Isn't that fancy! (Aside: /bin/sh? Yes, there's more to this. Later articles.)

Now what happens when we run our id program?

/ # id
uid: 0 - root
gid: 0

Can we give our group a name? Typically that's root too for group 0.

That's where /etc/group and getgrgid() come in:

# groupname:x:group_ID:[username[,username]...]
/etc/group = {
root:x:0:root
}

and then we add this to our code:

#include <grp.h>

...

static
const char *
gid_to_string(const gid_t gid)
{
    struct group const * const g = getgrgid(gid);
    if (NULL != g) {
        return g->gr_name;
    }
    return "(unknown)";
}

int main(void)
{
    const uid_t uid = getuid();
    char const * const user_name = uid_to_string(uid);
    printf("uid: %d - %s\n", uid, user_name);

    const gid_t gid = getgid();
    char const * const group_name = gid_to_string(gid);
    printf("gid: %d - %s\n", gid, group_name);

    return EXIT_SUCCESS;
}

and we get:

/ # ls /etc
group passwd
/ # cat /etc/group
root:x:0:root
/ # id
uid: 0 - root
gid: 0 - root
/ #

Aside: Again, I'm not going to get into the contents and worry about proper configuration for now. Proper configuration for secure deployment is a much bigger story.

ls / stat()

We saw from the dumpifs that /etc/passwd was 0x15 (21) bytes. How can we see that size with ls? Well, we can add to our ls implementation a call to stat() and format that information nicely.

With a bit more effort, and using what we learned from id above, the output from ls can look like this:

/ # cd etc
/etc # ls -la
-rw-r--r--   root   root      14 Sep  9 2025 22:56:13  group
-rw-r--r--   root   root      21 Sep  9 2025 22:56:13  passwd
/etc #

Note that because we didn't give more specific information, group and passwd were marked as writeable by the 'owner' by mkifs. For the sake of seeing some more things you can do with build files and mkifs, let's be more accurate / specific about file properties.

We can set file permissions using the perms attribute. It supports chmod-like values, so we can use (octal) 444, i.e. user, group, and other get read-only. And while we're at it, we can be explicit about the user id and group id too:

# username:has_pw:userid:group:comment:homedir:shell
[uid=0 gid=0 perms=444] /etc/passwd = {
root::0:0::/:/bin/sh
}

# groupname:x:group_ID:[username[,username]...]
[uid=0 gid=0 perms=444] /etc/group = {
root:x:0:root
}

Aside: Note that we're not actually improving security by changing the file perms for 'owner' to be read-only because the resource manager for the IFS will reject all attempts to modify its contents. Try it out! Oh, if you do, keep in mind that error number 30 is EROFS, "Read-only file system".

How about now?

/ # cd /etc
/etc # ls -al
-r--r--r--   root   root      14 Sep  9 2025 22:58:48  group
-r--r--r--   root   root      21 Sep  9 2025 22:58:48  passwd
/etc #

That's starting to look like the ls output I usually see.

Recap

Now we have

  • a "shell" with a "nice" prompt and support for cd,
  • ls with a little extra,
  • cat,
  • uname, nproc, id, shutdown, and sleep.

Not exactly "Programmer's Workbench", but, you have to start somewhere, and the above didn't really take very long.

Coming up...

In future posts we'll get back to our deep dive on memory, talking more about memory mapping and typed memory.