QNX From The Board Up #4 - Explore the Filesystem

Create your own "ls", explore the file system, and learn about the kernel's other resource managers.


Welcome to the blog series "From The Board Up" by Michael Brown. In this series we create a QNX image from scratch and build upon it stepwise, all the while looking deep under the hood to understand what the system is doing at each step.

With QNX for a decade now, Michael works on the QNX kernel and has in-depth knowledge and experience with embedded systems and system architecture.


Welcome to part 4 of the series. Up until now, we were able to:

  • configure QEMU to emulate a 64-bit Intel system that
  • has firmware that implements the Multiboot specification to
  • load a Multiboot image that
  • has the files in the image file system (IFS) necessary to
  • run a program that prints "Hello, world!"

And we learned a few things about ELFs, the Image File System (IFS), and implementing device drivers with resource managers.

Now that we can create something and run it on QNX, we can take a look around!

We're Going to Need ls

The first thing we might want to do is see which files exist on this minimally-configured QNX system. While we could run the ls utility provided with the QNX SDP, in the spirit of Ken Thompson's Summer of UNIX Love, let's see how hard it is to write our own.

💡
Ken Thompson didn't create ls out of thin air. It comes from his earlier experience with Multics, which has an ls command. (With support for a -a argument!) That's likely predated by the listf (i.e. list of all file names) console command of CTSS.

Whereas Ken would have called open() on a directory and parsed the contents, we have the benefit of opendir() and readdir() to help with that.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <dirent.h>

int main(int argc, char* argv[])
{
    // If no path specified, then use current working directory
    char const * const path = (argc > 1) ? argv[1] : ".";

    DIR* const dir = opendir(path);
    if (NULL == dir) return EXIT_FAILURE;

    while (1) {
        struct dirent const * const ent = readdir(dir);
        if (NULL == ent) break;

        puts(ent->d_name);
    }

    closedir(dir);  // Release the directory stream
    return EXIT_SUCCESS;
}

It has minimal error checking, but it works. (No, this isn't as full-featured as a modern implementation of ls, but you gotta start somewhere.)
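If you want a little more error checking, here's a sketch of the listing step pulled into a helper. The name list_directory() and the choice to skip . and .. are illustrative assumptions, not part of the ls above; the sketch reports failures with perror() and always closes the directory stream.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: print each entry of 'path' to 'out', one per line.
// Returns the number of names printed, or -1 if the directory could not
// be opened (the reason is reported on stderr via perror()).
// Illustrative choice: the "." and ".." entries are skipped.
int list_directory(const char *path, FILE *out)
{
    DIR *const dir = opendir(path);
    if (NULL == dir) {
        perror(path);
        return -1;
    }

    int count = 0;
    struct dirent const *ent;
    while (NULL != (ent = readdir(dir))) {
        if (0 == strcmp(ent->d_name, ".") || 0 == strcmp(ent->d_name, "..")) {
            continue;
        }
        fprintf(out, "%s\n", ent->d_name);
        count++;
    }

    closedir(dir);  // Release the directory stream
    return count;
}
```

A main() could then loop over argv[1] through argv[argc - 1] (defaulting to ".") calling list_directory(path, stdout), and return EXIT_FAILURE if any call returns -1.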

Note! From here on, I'll skip mentioning "modify the build file, rebuild the image, and run QEMU" as we make changes to what our image is running. See the previous post for a reminder of how to do this.

Running Our ls

If we run that ls executable on this minimal QNX system, by replacing helloworld with ls in the build file and in its initialization script (like so):

    /proc/boot/ls

... then we'll get the following output:

proc
usr
dev

We know about /proc/boot from last time when we were looking at the IFS, so it looks like we can see /proc. Oh, that means the "current working directory" of ls – which is the first (and only) user process created by QNX in this example – is the root directory, /. Kinda makes sense.

If we modify the initialization script a bit to look into /proc/boot:

    /proc/boot/ls /proc/boot

Passing /proc/boot as an argument to our ls in the initialization script.

... then we'll get:

procnto-smp-instr
init
ldqnx-64.so
ldqnx-64.so.2
libc.so
libc.so.6
libgcc_s.so
libgcc_s.so.1
ls

and we'll start to realize a couple of things:

  1. This cycle of "modify the build file, build the image, then reboot QEMU" is slow and getting old, and therefore
  2. it would be nice to interact with the system!

Let's Add Interactivity

To help us look around more easily, let's create a little program that lets us run ls with whatever paths/arguments we want by:

  • reading a command
  • parsing up the command into "command" and "arguments", then
  • running the command by
    • fork()ing
      • i.e. create a new (child) process, and
    • having that child process exec()ute the command
      • i.e. execute the file specified by the command, passing it the arguments from the command.

In simple terms:

    while(1) {
        read a command  // e.g. /proc/boot/ls /proc
        break it up into command and argument(s)
        fork(), and exec()ute the command
    }

Our plan for interactivity.

With the headers included, we can have something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static const char * const DELIMITERS = " ";

int main(void)
{
    while (1) {
        // Read until we see a carriage return, or until we get to
        // the last character of the command_line buffer.
        // (We have to leave room for the terminating '\0')
        // If we get to the end, just discard characters
        // until they press return.

        char     command_line[1024];
        unsigned index = 0;
        while (1) {
            char c;
            int num_read = read(STDIN_FILENO, &c, sizeof(c));
            if (1 == num_read) {
                if ('\r' == c) {
                    break;
                }
                // If we don't have room for terminating '\0',
                // ignore until we see a return
                if (index < (sizeof(command_line) - 1)) {
                    command_line[index] = c;
                    index++;
                }
            }
        }
        command_line[index] = '\0';

        // Parse the string up into an array of tokens.
        // 1023 characters hold at most 512 space-separated tokens,
        // and we need one more slot for the terminating NULL entry.
        char *   tokens[513];    // Array of pointers to tokens within 'command_line'
        unsigned num_tokens = 0; // How many tokens were seen in the last command?
        char *   p = strtok(command_line, DELIMITERS);
        while (NULL != p) {
            tokens[num_tokens] = p;
            num_tokens++;
            p = strtok(NULL, DELIMITERS);
        }
        tokens[num_tokens] = NULL;     // Don't forget terminating NULL entry
        if (0 == num_tokens) continue; // Empty line: nothing to run

        // Old school fork-and-exec. KISS.
        const pid_t pid = fork();
        if (-1 == pid) {
            return EXIT_FAILURE;
        } else if (0 == pid) {
            execv(tokens[0], tokens);
            // If we get here, something was wrong. Use _exit() so the
            // child doesn't flush stdio buffers it copied from the parent.
            _exit(EXIT_FAILURE);
        }
        wait(NULL);  // Wait for the child process to finish
    }

    return EXIT_SUCCESS;
}

We'll call this little program nkiss (nano KISS, where KISS stands for "keep it super simple").
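The parsing step can also be written as a small function that can never overrun its array, whatever the input. This is a sketch: split_command() is a hypothetical helper name, and it assumes the same space-only delimiter that nkiss uses.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: split 'line' in place on spaces, storing a pointer
// to each token in 'tokens'. At most max_tokens - 1 tokens are stored, so
// there is always room for the terminating NULL that execv() expects.
// Returns the number of tokens stored.
size_t split_command(char *line, char *tokens[], size_t max_tokens)
{
    size_t num_tokens = 0;
    if (0 == max_tokens) return 0;

    char *p = strtok(line, " ");
    while (NULL != p && num_tokens < max_tokens - 1) {
        tokens[num_tokens] = p;
        num_tokens++;
        p = strtok(NULL, " ");
    }
    tokens[num_tokens] = NULL;  // argv-style terminator
    return num_tokens;
}
```

Note that strtok() treats a run of spaces as a single delimiter, so "ls  /proc" still yields two tokens. It also keeps hidden state between calls; strtok_r() is the reentrant alternative if that ever matters.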

Modify the build file to have this in the initialization script instead of running ls directly:

    /proc/boot/nkiss

Modified line in the initialization script, to take the place of our ls.

...and we have:

... nothing. But, we know ('cause we wrote the code) that nkiss is sitting there waiting for input, so if we type in /proc/boot/ls /proc/boot and hit Enter, we'll see:

procnto-smp-instr
init
ldqnx-64.so
ldqnx-64.so.2
libc.so
libc.so.6
libgcc_s.so
libgcc_s.so.1
ls
nkiss

And voilà, we have (super basic) interactivity. But typing without seeing anything is unnerving: some feedback while we're typing would be nice.

ECHO ECho echo ... Our Input!

To fix that, we'll have nkiss print what we're typing (aka "echo it back to us"):

        while (1) {
            char c;
            int num_read = read(STDIN_FILENO, &c, sizeof(c));
            if (1 == num_read) {
                if ('\r' == c) {
                    write(STDOUT_FILENO, &NEWLINE, sizeof(NEWLINE));
                    break;
                }
                // If we're beyond the end, don't echo and just throw away
                if (index < (sizeof(command_line) - 1)) {
                    write(STDOUT_FILENO, &c, sizeof(c)); // echo back
                    command_line[index] = c;
                    index++;
                }
            }
        }

Added code for nkiss to echo our input.

... where NEWLINE is defined to be:

static const char  NEWLINE    = '\n';

Run it, and we see output as we type each character:

/proc/boot/ls /proc/boot

Productivity is soaring now!
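Here's a sketch of that read-and-echo loop as a standalone function, assuming (as nkiss does) that pressing Enter delivers a '\r'. read_line() is a hypothetical name. Unlike the loop above, it also gives up when read() reports end-of-file or an error, instead of spinning forever.

```c
#include <stddef.h>
#include <unistd.h>

// Hypothetical helper: read one command line from 'in_fd' into 'buf'
// ('size' bytes, size > 0), stopping at a carriage return. Each stored
// character is echoed to 'echo_fd'; characters that don't fit are
// discarded without being echoed.
// Returns the line length, or -1 if read() reports end-of-file or an
// error before a carriage return arrives.
long read_line(int in_fd, int echo_fd, char *buf, size_t size)
{
    static const char NEWLINE = '\n';
    size_t index = 0;

    while (1) {
        char c;
        ssize_t num_read = read(in_fd, &c, sizeof(c));
        if (num_read <= 0) {
            buf[index] = '\0';
            return -1;              // EOF or error: stop instead of looping
        }
        if ('\r' == c) {
            write(echo_fd, &NEWLINE, sizeof(NEWLINE));
            break;
        }
        if (index < size - 1) {     // Leave room for the terminating '\0'
            write(echo_fd, &c, sizeof(c));   // echo back
            buf[index] = c;
            index++;
        }
    }
    buf[index] = '\0';
    return (long)index;
}
```

In nkiss, the inner loop would become read_line(STDIN_FILENO, STDOUT_FILENO, command_line, sizeof(command_line)).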

Introducing a PATH Environment Variable

I don't wanna seem ungrateful for this nkiss program, but I wish to register a complaint: adding /proc/boot/ to the beginning of every command I type is already getting annoying. Since everything we want to run (okay, just ls for now) is in /proc/boot, maybe we can streamline a bit?

We can, starting by using posix_spawnp() instead of Old Skool fork() and exec(). posix_spawnp() has a very nice feature: if the command name doesn't contain a slash, it'll actually search for your program! Where does it search? Wherever the PATH environment variable tells it to search.
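To make "search" concrete, here's a simplified sketch of a PATH lookup. find_in_path() is a hypothetical helper, not how posix_spawnp() is implemented: real implementations typically try to execute each candidate in turn, whereas this sketch only pre-checks with access(), and it doesn't treat an empty PATH entry as the current directory.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: try each ':'-separated directory in 'path_list'
// and stop at the first "<dir>/<name>" that is executable.
// On success, the full path is left in 'out' and 0 is returned;
// otherwise -1 is returned.
int find_in_path(const char *name, const char *path_list,
                 char *out, size_t out_size)
{
    const char *dir = path_list;
    while (1) {
        const char *colon = strchr(dir, ':');
        size_t dir_len = (NULL != colon) ? (size_t)(colon - dir) : strlen(dir);

        // Build "<dir>/<name>" using only this PATH entry's characters
        int n = snprintf(out, out_size, "%.*s/%s", (int)dir_len, dir, name);
        if (n > 0 && (size_t)n < out_size && 0 == access(out, X_OK)) {
            return 0;               // Found an executable candidate
        }
        if (NULL == colon) break;   // That was the last directory
        dir = colon + 1;
    }
    return -1;
}
```

With PATH=/proc/boot on our minimal system, find_in_path("ls", "/proc/boot", buf, sizeof(buf)) would be expected to yield /proc/boot/ls.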

We have a few ways we can set the PATH environment variable. The two easiest ways are:

  1. have nkiss call setenv() to set it, or
  2. tell QNX to set PATH when it runs nkiss.

Calling setenv() seems brittle, because what if someone decides to change the build file to mount the IFS at /tickletrunk instead of /proc/boot? We saw last time how to do this, and I know you tried it..! If this happened, we'd have to change and recompile nkiss too. That doesn't scale well, so, let's go with Door #2, Monty.

Environment Variables And The Initialization Script

There are 2 ways we can set an environment variable in QNX for a program launched by the initialization script. This:

    PATH=/proc/boot
    /proc/boot/nkiss

... or this:

    PATH=/proc/boot /proc/boot/nkiss

There's a subtle difference between them if we start adding more commands to the init script:

  • the first one causes the PATH environment variable to be set for all following commands in the init script;
  • the second one causes the PATH environment variable to be set only for the one nkiss command in the init script; it is not applied to subsequent commands in the init script.

💡
If you run dumpifs with -vv (i.e. two -v options), you can see the environment variables applied to each command in the init script. For example:
x86_64 $ dumpifs -vv ./image.ifs

posix_spawnp()

If you're following along at home, let's switch to posix_spawnp() and get away from fork() and exec(). The only funky thing about this function is that it wants us to provide the environment variables for the (child) process. The easy (and correct) answer to the question "Give what environment variable(s) to the child process?" is "Give it a copy of mine." And POSIX says that you can get your own process's environment variables via the conveniently provided variable environ.

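        // Execute the command, aka spawn the executable (declared in <spawn.h>)
        extern char ** environ;         // Array of pointers to environment variables, provided by the C library
        if (0 == num_tokens) continue;  // Empty line: nothing to spawn
        tokens[num_tokens] = NULL;      // argv must still end with a NULL entry
        int rv = posix_spawnp( NULL, tokens[0], NULL, NULL, (char**)tokens, environ);
        if (0 != rv) continue;     // No feedback. Sorry.
        wait(NULL);  // Wait for the child process to finish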
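For reference, here's the same spawn-and-wait step as a self-contained sketch. run_command() is a hypothetical name; unlike the snippet above, it asks posix_spawnp() for the child's pid, waits for that specific child with waitpid(), and decodes its exit status.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;  // The calling process's environment (POSIX)

// Hypothetical helper: spawn argv[0] (located via PATH by posix_spawnp()),
// giving the child a copy of our environment, then wait for that child.
// 'argv' must end with a NULL entry.
// Returns the child's exit status, or -1 if it could not be spawned
// or did not exit normally.
int run_command(char *argv[])
{
    pid_t pid;
    int rv = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (0 != rv) {
        return -1;              // rv holds an errno value, e.g. ENOENT
    }

    int status;
    if (-1 == waitpid(pid, &status, 0)) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In nkiss, run_command(tokens) would replace the posix_spawnp()/wait() pair. Waiting on a specific pid means an unrelated child exiting can't be mistaken for the command we just ran.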

Let's change the build file so that nkiss gets a useful PATH environment variable:

    PATH=/proc/boot /proc/boot/nkiss

The final image build file change.

... and then let's run ls without using the absolute path!

ls
proc
usr
dev

Heaps of joy in Mudville!

Files On Our Minimal QNX System

Now that we have a nkiss that lets us run our ls with different arguments, we can query around and stitch together the following picture of the directories and files on this minimal QNX system:

/
├── proc
│   ├── self
│   │   ├── as
│   │   ├── cmdline
│   │   ├── exefile
│   │   ├── vmstat
│   │   ├── pmap
│   │   ├── mappings
│   │   └── ctl
│   ├── boot
│   │   ├── procnto-smp-instr
│   │   ├── init
│   │   ├── ldqnx-64.so
│   │   ├── ldqnx-64.so.2
│   │   ├── libc.so
│   │   ├── libc.so.6
│   │   ├── libgcc_s.so
│   │   ├── libgcc_s.so.1
│   │   ├── ls
│   │   └── nkiss
│   ├── config
│   ├── vm
│   │   └── stats
│   └── ker
│       ├── intr
│       └── stats
├── usr
│   └── lib
│       └── ldqnx-64.so.2
└── dev
    ├── shm
    │   └──
    ├── shmem
    │   └──
    ├── zero
    ├── stderr
    ├── stdout
    ├── stdin
    ├── tty
    ├── text
    └── null

A tree view of (most of) the entries in our minimal image's filesystem.

⚠️
I've excluded from this all the /proc/<number> entries. I'll explain in a second.

I said earlier that the kernel runs a device driver / resource manager to expose the IFS at /proc/boot. Given the above, it looks like there are a few other device drivers / resource managers being provided by the QNX kernel.

Note that /proc/boot on a true minimal system (i.e. before we added the things necessary to run helloworld, ls, or nkiss) would look like this:

/
├── proc
│   ├── boot
│   │   ├── procnto-smp-instr
│   │   └── init

Resource Manager for /proc/<pid>

/proc/self is a special case of /proc/<number>, where <number> is the identifier for a process (aka process ID, aka pid). When a process accesses /proc/self, it's just an alias for that process's own /proc/<pid> directory. For example, if I'm a process with a pid of 42, I can access the files under /proc/42 or under /proc/self: they're the same thing.

The seven files under each /proc/<pid> directory provide information about a process for debugging purposes.

For example:

These are not files for accessing in your deployed system; these are files for laying down and avoiding. (See also -d and -u.)

Resource Manager for IFS

As we saw before, the IFS in the image file (aka the "primary IFS") is – as an mkifs default – mounted at /proc/boot, so our files as described in the build file are there.

  • The startup file isn't there: QNX doesn't need to access it, so it isn't put in the IFS.
  • /proc/boot/init is the file with the initialization script. This is the "compiled" version of what's in the build file, with a QNX-specific format.
  • procnto-smp-instr is there because startup actually needs it in memory to give control to the kernel. The primary IFS is loaded into memory by startup, so, how convenient!

Aside: What's in a name?

Warning: Bit of etymology here. Feel free to skip.

That which we call procnto-smp-instr is the binary with the QNX kernel. So why not call it the_qnx_kernel or qnx_kernel? Well, history, and vestigial options.

Once upon a time, the QNX kernel binary was procnto because it was

  • newer than the even older version of the QNX kernel which had the name Proc, and
  • had 2 major components:
    • Process Manager (aka procmgr, aka proc)
    • Neutrino (aka nto).

The Process Manager actually refers to the process the QNX kernel creates for itself to help itself do work. Older versions of QNX had several "system processes", and one, "Proc", was responsible for managing process lifetime (and kernel calls and interrupts and...). This was repurposed in QNX 6 to become The Kernel Process, which does process management and many other things. The older name proc kinda stuck because a lot of the stuff in Proc is done in The Kernel Process.

Neutrino refers to a "new" (to QNX 6 (well, it's a little more complicated than that)) implementation of the QNX microkernel. Neutrino, aka nto.

So, procnto came to be.

But, what if you wanted an instrumented version of the kernel? i.e. one which you could profile and monitor using TraceEvent()-based functionality? That requires a different version / variant of the kernel compiled slightly differently, hence procnto-instr also exists.

But, what if you wanted a version that supports multiple CPUs? procnto-smp ("smp" for symmetric multiprocessing).

But, what if you wanted a version that supports tracing functionality and multiple CPUs? procnto-smp-instr.

During SDP 7, we started simplifying by saying:

  • single-CPU systems aren't really a thing anymore, and
  • the performance impact of TraceEvent() functionality being present and unused is insignificant.

Let's keep it simple and use just one, procnto-smp-instr. And some then asked "Can't we just change the name back to a very simple procnto then?" The problem with that is that it [cw]ould break things and [cw]ould be confusing. Therefore, let's just leave it at procnto-smp-instr.

Resource Managers Under /dev

These are covered in the documentation, but I want to discuss /dev/text in particular, because it's a wrapper around the startup-provided debug interface. In our configuration, that means it's using the 8250 UART.

Because we haven't configured our QNX system to use anything else, this is what is being given by default to processes running on QNX for use as standard input, standard output, and standard error. (i.e. when we called printf("Hello, world!\n") which prints to standard out, it was actually writing to /dev/text, which caused the characters to be sent to the 8250 UART.)

/dev/text

I've kinda glossed over this, but, let's be a little clearer now about what our programs have been doing:

  • helloworld uses printf() to send a string of characters to the FILE stdout
  • ls uses puts() to write to the FILE stdout
  • nkiss uses the file descriptors STDIN_FILENO and STDOUT_FILENO to read from and write to ... something.

The C specification says in the section titled "Input/output" that a FILE is "an object type capable of recording all the information to control a stream" and that "stderr, stdin and stdout which are expressions of type 'pointer to FILE' that point to the FILE objects associated, respectively, with the standard error, input, and output streams."

What's a stream? A little later in the C spec it says:

"Input and output, whether to or from physical devices such as terminals and tape drives, or whether to or from files supported on structured storage devices, are mapped into logical data streams, whose properties are more uniform than their various inputs and outputs. Two forms of mapping are supported, for text streams and for binary streams."

So, a FILE controls a stream, and a stream gives you a uniform way of reading and writing bytes of data, whatever device or file is underneath. Gotcha.

The C spec also says:

"At program start-up, three streams shall be predefined and already open: stdin (standard input, for conventional input) for reading, stdout (standard output, for conventional output) for writing, and stderr (standard error, for diagnostic output) for writing."

That sounds like stdin and stdout are FILEs that use our UART, and stderr uses... something.

What's with STDIN_FILENO and STDOUT_FILENO? Those are POSIX things, where the POSIX spec says:

"The following symbolic values in <unistd.h> define the file descriptors that shall be associated with the C-language stdin, stdout, and stderr when the application is started:
STDIN_FILENO - Standard input value, stdin. Its value is 0.
STDOUT_FILENO - Standard output value, stdout. Its value is 1.
STDERR_FILENO - Standard error value, stderr. Its value is 2."

A "file descriptor"? POSIX says a file descriptor is:

"A per-process unique, non-negative integer used to identify an open file for the purpose of file access. The values 0, 1, and 2 have special meaning and conventional uses, and are referred to as standard input, standard output, and standard error, respectively."

Notice the use of "an open file".

To summarize:

  • STDOUT_FILENO is just a preprocessor macro for 1, a number / file descriptor that identifies an open file that can be used to access standard output, and
  • stdout is a FILE for a "predefined and already open" stream "standard output"
  • and similarly for standard input and standard error

i.e. there are different ways of talking about the same thing: an open file / stream / file descriptor. (No, they're not EXACTLY the same thing, but they -- in essence -- refer to the same thing.)

More specifically, stdout is a C thing, and on a POSIX system it's just a wrapper around the POSIX file descriptor 1. Obviously, this won't be true on, say, Windows, which has HANDLEs to files. But that's the whole point of a C FILE: you don't want to know what's going on down below because you want portability, i.e. "Not my problem."
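A quick way to see that layering on a POSIX system is fileno(), which returns the descriptor underneath a FILE. In this sketch (write_both_ways() is a hypothetical name), text goes through both layers to the same file. The fflush() matters: stdio buffers data, so without it the direct write() could land ahead of text still sitting in the FILE's buffer.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: write 'first' through the FILE layer, then write
// 'second' straight to the file descriptor underneath that FILE.
// Returns 0 on success, -1 on failure.
int write_both_ways(FILE *stream, const char *first, const char *second)
{
    if (EOF == fputs(first, stream)) return -1;  // Lands in the FILE's buffer
    if (EOF == fflush(stream)) return -1;        // Push the buffer down to the fd

    size_t  len = strlen(second);
    ssize_t n   = write(fileno(stream), second, len);  // Bypasses stdio entirely
    return (n == (ssize_t)len) ? 0 : -1;
}
```

For example, write_both_ways(stdout, "via FILE, ", "via fd\n") puts both strings on standard output, in that order, because fileno(stdout) is STDOUT_FILENO.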

In our simple system:

  • there's a device driver / resource manager in the kernel that exposes the UART for debugging as /dev/text, and
  • the QNX kernel has opened the file /dev/text for standard input, output, and error, and
  • passed those file descriptors to the programs created in the initialization script.

Then, when nkiss creates and launches ls inside a new process using either fork() and exec(), or posix_spawnp(), the new process inherits the file descriptors given to nkiss.

Why does a new process inherit file descriptors from its parents? Because POSIX says so, that's why. It's in the book! And POSIX says so because UNIX did/does so. (There are nuances and oodles of options, but, that's what's happening here).
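Here's a small sketch of inheritance in action with fork(): the parent opens a pipe, and the child writes into it using a descriptor it never opened itself. read_from_child() is a hypothetical name, and the sketch uses a pipe rather than /dev/text so the parent can read the result back.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: the child writes 'msg' into the write end of a pipe
// that it inherited from the parent; the parent reads it into 'buf'.
// Returns the number of bytes read (at most 'size'), or -1 on error.
ssize_t read_from_child(const char *msg, char *buf, size_t size)
{
    int fds[2];
    if (-1 == pipe(fds)) return -1;

    pid_t pid = fork();
    if (-1 == pid) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (0 == pid) {                 // Child: uses the inherited descriptor
        close(fds[0]);
        write(fds[1], msg, strlen(msg));
        _exit(0);
    }

    close(fds[1]);                  // Parent: only reads
    ssize_t total = 0;
    ssize_t n;
    while (total < (ssize_t)size &&
           (n = read(fds[0], buf + total, size - (size_t)total)) > 0) {
        total += n;                 // Keep reading until EOF or 'buf' is full
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);          // Reap the child
    return total;
}
```

Descriptors also stay open across exec() and posix_spawnp() unless they're marked close-on-exec (FD_CLOEXEC), which is one of those nuances and options.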

The Microkernel Big Picture

Moving away from device drivers / resource managers, looking at the bigger picture a bit, let's talk about options for the architecture of an operating system's kernel.

Let's review a few ideas from the World of Safety, specifically ISO 26262:

  • harm : physical injury or damage to the health of persons
  • hazard : a potential source of harm
  • risk : the combination of the probability of occurrence of harm, and the severity of that harm
  • unreasonable risk: risk judged to be unacceptable in a certain context according to valid societal moral concepts
  • safety : absence of unreasonable risk

When it comes to OS kernel architectures, one huge consideration is trust, and trust is about risk management.

If you trust a few people, you'll use a monolithic architecture.

If you don't trust anyone else, you'll use a microkernel architecture. The QNX kernel has a microkernel architecture. And, we pay a price for this lack of trust: There is no free lunch. But, with a lot of effort, it's possible to keep the cost for this lack of trust very low. It's something we've worked on at QNX for a long time.

When you're targeting safety and security standards, the concept of 'isolation', aka 'freedom from interference' (FFI), aka don't trust anyone, is very important.

A microkernel architecture keeps user-provided device drivers outside the kernel. That means device drivers go inside processes, and that means a microkernel needs good inter-process communication (IPC). But it doesn't mean there can't be a few QNX-provided (trusted) resource managers inside the kernel's own process.

More on this big picture stuff later.

Recap: What Did We Learn?

We just explained that:

  • there's a device driver / resource manager created by the QNX kernel during initialization that exposes the 8250 UART (used for debugging) as the file /dev/text;
  • the QNX kernel opens that file and passes it as 3 file descriptors (for standard input, standard output, and standard error) to processes created by the initialization script;
  • when processes created by the initialization script create processes, the child processes inherit those file descriptors (typically, but you can live your life however you want);
  • our programs (helloworld, ls, nkiss) have been using those file descriptors
    • and in some cases, the C FILE wrappers around them.

Long story short: our programs have been interacting with the kernel's resource manager behind /dev/text. AND, we pointed out that there are a few QNX-provided device drivers / resource managers in the QNX kernel's (trusted) process, including the one for /dev/text.

We explained that QNX's kernel is a microkernel, i.e. it has a few trusted things (resource managers) in a (kernel) process, and everything else goes in user processes. (So far, the only user processes we've created were based on helloworld, ls, and nkiss.)

Because of this process-based isolation, IPC is very important.

Coming Up...

IPC is exactly what we'll dig into in the next post:

  • QNX IPC,
  • how QNX IPC is used to implement POSIX file functionality, and
  • other kernel things as well.