QNX From The Board Up #11 - malloc() and mmap()
Take an introductory look at mmap() to allocate memory, then use it to write your own version of malloc(). (And build a KISS version of cat for your custom image!)

Welcome to the blog series "From The Board Up" by Michael Brown. In this series we create a QNX image from scratch and build upon it stepwise, all the while looking deep under the hood to understand what the system is doing at each step.
With QNX for a decade now, Michael works on the QNX kernel and has in-depth knowledge and experience with embedded systems and system architecture.
Previously On...
In the last two posts we talked about the QNX kernel having 2 major components (the kernel process and the kernel "code") and about how the kernel process has a Memory Manager server which responds to messages that request the functionality of the function mmap(), which is the 'first among equals' when it comes to requesting memory from the kernel.
The QNX user documentation for mmap() is not trivial, because this is not a trivial function. And that's why we're going to start with a simple use case, and then, with that footing firmly established, start looking at more complicated use cases for mmap().
To see the simplest way to use mmap(), I'd like to look at a very simple version of malloc() which calls mmap(). If we learn a bit more about malloc() along the way too, all the better.
malloc()
If we look at the C specification, let's say the C11 version, i.e. ISO/IEC 9899:2011, in the section about the library, specifically "General utilities", 7.22.3.4, it says:
void *malloc(size_t size);
Description
The malloc function allocates space for an object whose size is specified by size and
whose value is indeterminate.
Returns
The malloc function returns either a null pointer or a pointer to the allocated space.
That's it. That's all she wrote. So how can we write our own?
Aside: Object?
For our OOP friends, 'object' in this context is not an instance of a class. The spec says it's a "region of data storage in the execution environment, the contents of which can represent values". I tend to describe it, loosely, for better or for worse, as "an instance of a type", but, here, it really is just a bag of bytes.
So, let's see what we can say about the memory that we expect malloc()
to acquire for us. I'm sure most would agree on the obvious:
- readable
- writeable
And then I might be tempted to ask some more questions, like:
Q) This is just for you, right?
A) Correct. Nobody else should have access to this memory. MINE.
Q) Do you care about the address? i.e. what value the pointer has?
A) Not really.
Q) Are you going to be writing code to this memory, and then executing it?
A) What?! I'm not a JIT. No. Just normal RAM, please.
This is now very interesting, as a luggage memory problem. We have enough to wade through the options for mmap() to try to implement our own malloc().
mmap()
Let's take a look at this puppy and see if we can figure out what's what:
void * mmap( void * addr,
size_t len,
int prot,
int flags,
int filedes,
off_t off );
- The first argument is asking "Do you have a specific virtual address where you want the memory?" We do not, so we pass NULL which means "I. do. not. care."
- Second argument will be the length, which is just the size passed to malloc(). Simple.
- As for prot(ection)? We need read access (PROT_READ) and write access (PROT_WRITE). Done.
- Flags. We said we want it only for ourselves, so MAP_PRIVATE it is. But wait, there's more! We just want "RAM". Nothing special. And that's known as "anonymous memory", ergo we need MAP_ANON.
- filedes only applies to other use cases involving file descriptors, so, NOFD is it.
- off also only applies to other use cases. 0.
We are done!
Wait! Shoot. Not done yet. What's the return value?
"The address of the mapped-in object, or MAP_FAILED if an error occurred (errno is set)."
Ok. Good to go now.
Attempt #1
void*
my_malloc(const size_t size)
{
const int prot = PROT_READ | PROT_WRITE;
const int flags = MAP_PRIVATE | MAP_ANON;
void * const p = mmap(NULL, size, prot, flags, NOFD, 0);
if (MAP_FAILED == p) {
return NULL;
}
return p;
}
Let's try it out:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <sys/mman.h>
void*
my_malloc(const size_t size)
{
const int prot = PROT_READ | PROT_WRITE;
const int flags = MAP_PRIVATE | MAP_ANON;
void * const p = mmap(NULL, size, prot, flags, NOFD, 0);
if (MAP_FAILED == p) {
return NULL;
}
return p;
}
int main(void)
{
uint8_t * p = my_malloc(42);
printf(" p : %p\n", p);
// Test it's writeable and readable
*p = 42;
printf("*p : %" PRIu8 "\n", *p);
}
# simplemalloc
p : 13e5613000
*p : 42
Aside: p is for pointer, not portable.
The behaviour of p when passed to fprintf and family is rather interesting. The C standard says "The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner."
"Implementation-defined manner", eh? Lemme guess: implementations differ?
In the QNX C library implementation of p, we print the value in lower-case hex, without a leading "0x". If you want a leading "0x", you have to provide that yourself. (Maybe we leave it out because some people prefer to use Intel's "H" suffix? I doubt it. I'm betting this falls into "Do the least (fastest), especially when debugging.")
What happens on Ubuntu? Lower-case hex with a leading "0x". And macOS? Same as on Ubuntu.
What happens if you print a pointer with the value NULL?
- QNX prints 0.
- Ubuntu prints (nil).
- macOS prints 0x0.
If we take a quick detour into "alternative forms", i.e. using the '#'
flag, things become even more interesting:
QNX
%x = 'ffff1234abcd2468'
%#x = '0xffff1234abcd2468'
%p = 'ffff1234abcd2468'
%x = '0'
%#x = '0'
%p = '0'
Ubuntu
%x = 'ffff1234abcd2468'
%#x = '0xffff1234abcd2468'
%p = '0xffff1234abcd2468'
%x = '0'
%#x = '0'
%p = '(nil)'
Unfortunately there is no "alternative form" for p, but at least the GCC compiler will warn you about that if you try.
Anywho, we're still at: what you get from %p depends on the implementation within whatever C library you're using; consult your local listings.
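If you need the same text on every platform, one common workaround is to skip %p and print the pointer as an integer. A minimal sketch, assuming your platform provides uintptr_t (it is optional in the C standard); format_pointer() is a hypothetical helper, not something from the QNX libc:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: always writes "0x" followed by lower-case hex,
 * whatever the local %p happens to do. Returns what snprintf() returns. */
int format_pointer(char *buf, size_t buflen, const void *p)
{
    return snprintf(buf, buflen, "0x%" PRIxPTR, (uintptr_t)p);
}
```

With this, a NULL pointer comes out as 0x0 everywhere, which happens to match what macOS prints for %p.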
Say that again?
Let's say we run our program that calls my_malloc(42) (basically just mmap()) a few times back to back. i.e.
1. run program
2. call my_malloc(42)
3. print the pointer returned
4. program exits
5. GOTO 1
What do we get for pointer values being returned?
I'll elide the details of running it over and over at the command line, and just capture here the output, where each line represents an individual time the program runs and exits:
p : 2636c24000
p : 1d83d24000
p : 49e0695000
p : 4cd316b000
p : 13d6f64000
p : 1932fa6000
p : 382ff5c000
p : 288d2f3000
p : 400019e000
p : 4790c0f000
p : 3620289000
p : 33088f7000
p : 567f43a000
p : 4f55e97000
p : 2e9d223000
p : 469d0d3000
Interesting. I see two patterns here:
- The bottom (least significant) 12 bits (bottom 3 nybbles) are always 0.
- The upper bits are kinda all over the place, but, not completely random within the range of possible 64-bit values.
For those upper bits, it's like there's something that is randomizing where the allocation is being placed in the virtual address space. That rings a bell. For the kernel binary, procnto-smp-instr, the command-line parameter -m takes an option, r, which allows you to enable or disable address space layout randomization (ASLR). The default is that ASLR is enabled. Alrighty, then. What happens if you disable it?
Let's modify our build file from
[virtual=x86_64,multiboot] boot = {
startup-x86 -D8250
procnto-smp-instr
}
to
[virtual=x86_64,multiboot] boot = {
startup-x86 -D8250
procnto-smp-instr -m~r
}
to disable ASLR and see what happens.
One thing I like to do when I change procnto command-line params is look at the file /proc/config to confirm I'm getting what I expect. It's just a text file, so you can just cat it.
We could use the cat provided by QNX, but, what fun is that?
static
bool
display_file(char const * const filename, const int fd_out)
{
const int fd_in = open(filename, O_RDONLY);
if (-1 == fd_in) {
printf("Cannot access '%s' (%s)\n", filename, strerror(errno));
return false;
}
char buffer[4096];
bool return_value = false;
while (true) {
const ssize_t num_read = read(fd_in, buffer, sizeof(buffer));
if (num_read <= 0) {
return_value = (0 == num_read);
break;
}
// Have to handle write() not writing all
ssize_t total_written = 0;
while (total_written < num_read) {
const ssize_t num_to_write = num_read - total_written;
const ssize_t written = write(fd_out, &buffer[total_written], num_to_write);
assert(written <= num_to_write);
if (written < 0) {
perror("Failed to write"); // perror() appends ": <strerror(errno)>" itself; it takes no format string
close(fd_in);
return false;
}
total_written += written;
}
}
close (fd_in);
return return_value;
}
int main(int argc, char* argv[])
{
for (int i = 1; i < argc; i++) {
if (true != display_file(argv[i], STDOUT_FILENO)) {
return EXIT_FAILURE;
}
}
return EXIT_SUCCESS;
}
Add the right headers, and it's a cat. No bells, no whistles.
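For reference, here is my best guess at "the right headers" for the code above. Treat it as a sketch: the mapping of each name to a header follows the C and POSIX conventions, and your toolchain may pull some of them in indirectly.

```c
#include <assert.h>   /* assert() */
#include <errno.h>    /* errno */
#include <fcntl.h>    /* open(), O_RDONLY */
#include <stdbool.h>  /* bool, true, false */
#include <stdio.h>    /* printf(), perror() */
#include <stdlib.h>   /* EXIT_SUCCESS, EXIT_FAILURE */
#include <string.h>   /* strerror() */
#include <unistd.h>   /* read(), write(), close(), ssize_t, STDOUT_FILENO */
```

Put these at the top of the file, above display_file().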
Ok, now that we have a cat util, let's take a look at /proc/config before we make any changes to the procnto-smp-instr command-line parameters.
/ # cat /proc/config
fd_close_timeout:30
ker_verbose:1
maxfds:1000
nohalt:false
pregrow_size:524288
priv_prio.prio:64
priv_prio.saturate:0
procfs_ctl_umask:022
procfs_umask:066
shutdown_stack_len:128/0
pathtrust:0
small_msg_max:1200
prp_load_profile:0
clock_freq:1000
clock_prio:254
ignore_unexpected_interrupts:true
safety_state:on
proc_thread_pool:3,10,75
mm_cleanup_prio:10
mmflags:0x205 (BACKWARDS_COMPAT,RANDOMIZE,V4)
virtualization:disabled
thread_reaper_threshold:16
thread_reaper_prio:10
intrctlr:N/A
The important bit for our discussion here and now is
mmflags:0x205 (BACKWARDS_COMPAT,RANDOMIZE,V4)
We can see that ASLR is enabled because RANDOMIZE is being printed.
Let's add the -m~r in, and see what that line in /proc/config looks like now:
mmflags:0x5 (BACKWARDS_COMPAT,V4)
No ASLR for you!
What happens now when we run our little program many times?
p : 904b000
p : 904b000
p : 904b000
p : 904b000
p : 904b000
p : 904b000
p : 904b000
p : 904b000
p : 904b000
Not very random.
Ok, let's put things back the way they were
[virtual=x86_64,multiboot] boot = {
startup-x86 -D8250
procnto-smp-instr
}
and check
mmflags:0x205 (BACKWARDS_COMPAT,RANDOMIZE,V4)
and then modify our program to do several allocations, instead of just one:
for (unsigned i = 0; i < 5; i++) {
uint8_t * p = my_malloc(42);
printf(" p : %p\n", p);
}
and then run the program once:
p : 1043cba000
p : 1043cbb000
p : 1043cbc000
p : 1043cbd000
p : 1043cbe000
The individual allocations are not random?! But we turned ASLR back on!
Let's run the program a few more times
p : 5ac4fc3000
p : 5ac4fc4000
p : 5ac4fc5000
p : 5ac4fc6000
p : 5ac4fc7000
p : 1c7a721000
p : 1c7a722000
p : 1c7a723000
p : 1c7a724000
p : 1c7a725000
p : 4c327b8000
p : 4c327b9000
p : 4c327ba000
p : 4c327bb000
p : 4c327bc000
Ah, the starting address when each program starts is random, but, within a program, the individual allocations are not random.
Now, keep in mind, this is not the malloc() in the QNX libc. This is our my_malloc() function that just lightly wraps mmap().
But, it's true that under the covers even the QNX libc malloc() at some point uses mmap().
What we're seeing here is behaviour specific to mmap(). (There are other nuances, but, they're variations on a theme.)
In fact, let's try with the real malloc() and see what happens:
p : 32e8137370
p : 32e81377e0
p : 32e8137820
p : 32e8137860
p : 32e81378a0
And now run it a few more times
p : 44d40a9370
p : 44d40a97e0
p : 44d40a9820
p : 44d40a9860
p : 44d40a98a0
p : 29cc7f2370
p : 29cc7f27e0
p : 29cc7f2820
p : 29cc7f2860
p : 29cc7f28a0
p : 2649413370
p : 26494137e0
p : 2649413820
p : 2649413860
p : 26494138a0
A few things to notice here:
- the starting addresses are random across processes,
  - just like what we saw with mmap()
- individual addresses within a process are "predictable" (sequenced?), but
  - kinda like what we saw with mmap()
- only the bottom 4 bits (bottom 1 nybble) are (seemingly) guaranteed to be 0.
  - vs. the 12 we saw with mmap()
Recap: With mmap(), bottom 12 bits are 0, whereas with malloc() the bottom 4 bits are 0.
Those of you familiar with this area will quickly point out:
- mmap() has a resolution of one page, which is the fundamental unit of mapping used by a chunk of hardware known as the Memory Management Unit (MMU)
- It seems that QNX is configuring the MMU to use pages with a size of 4096 bytes, i.e. 0x1000 bytes.
  - check out sysconf(_SC_PAGESIZE)
- The start (bottom) address of a page is always an integral multiple of the page size, 4096 bytes, meaning
  - the start address of a page always has the bottom 12 bits set to 0.
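To check the points above on your own system rather than take the 4096 on faith, something like the following sketch would do. The helper names page_size() and is_page_aligned() are made up for this example, and the mask trick assumes the page size is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the page size reported by the system, or 0 if sysconf() fails. */
size_t page_size(void)
{
    const long size = sysconf(_SC_PAGESIZE);
    return (size > 0) ? (size_t)size : 0;
}

/* True if the address is an integral multiple of the page size.
 * Assumes the page size is a power of two, so "size - 1" is a mask of
 * the low bits that must all be 0 (12 of them for 4096-byte pages). */
bool is_page_aligned(const void *p)
{
    const size_t size = page_size();
    return (size != 0) && (((uintptr_t)p & (size - 1)) == 0);
}
```

Passing any pointer returned by mmap() to is_page_aligned() should give true; passing most pointers returned by the real malloc() should not.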
But, that resolution of 1 page (4096 bytes) comes at a cost:
- If you want 1 byte of memory, mmap() says "Here are 4096 bytes."
- If you want 4097 bytes of memory, mmap() says "Here are 8192 bytes."
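The arithmetic behind those two bullets is just "round up to the next multiple of the page size". A minimal sketch, with round_up_to_page() being a hypothetical helper and the page size assumed to be a power of two:

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes a request of len bytes occupies
 * when memory is handed out in whole pages. Assumes page is a power of
 * two and that len + page - 1 does not overflow size_t. */
size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) & ~(page - 1);
}
```

With a 4096-byte page, round_up_to_page(1, 4096) is 4096 and round_up_to_page(4097, 4096) is 8192, matching the bullets above.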
Very simple, but, not very efficient use of memory. And, let's not forget that mmap() sends a message to the Memory Manager, and sending a message means a kernel call. That ain't free. It's not bad, but, nobody ever said "Can your system please be slower?" (Almost never).
But, malloc(), which has to get memory from QNX by calling mmap(), does not need to restrict itself to page-aligned addresses. It can be more efficient and hand out chunks of memory smaller than a page. And, we certainly saw that based on the addresses returned by malloc(). And, if malloc() has already called mmap() and has a page with some unallocated memory within it, and someone calls, say, malloc(42), malloc() can just (atomically) adjust the data it has to manage the allocations to say "taken", and then just return a pointer. No kernel calls. Very simple. Very easy. (Throw in a dollop of TLS, and you got yourself a stew!)
Let's test that out, and allocate 1 byte with the real malloc() and see what we get:
p : 3a71dc6370
p : 3a71dc67c0
p : 3a71dc67e0
p : 3a71dc6800
p : 3a71dc6820
Why are they not 1 byte apart? That is one zany sequence. It looks like we're getting some visibility into malloc() minutiae. Regardless, it seems, based on the above, that the smallest delta between allocations is 0x20, i.e. 32 bytes.
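If you'd rather not do hex subtraction in your head, here is a small sketch that computes the smallest gap from the five addresses printed above. The function name smallest_delta() is made up, and it assumes the addresses are in ascending order.

```c
#include <stddef.h>
#include <stdint.h>

/* The five addresses printed above for malloc(1). */
static const uint64_t observed[] = {
    UINT64_C(0x3a71dc6370), UINT64_C(0x3a71dc67c0), UINT64_C(0x3a71dc67e0),
    UINT64_C(0x3a71dc6800), UINT64_C(0x3a71dc6820),
};

/* Smallest gap between consecutive addresses.
 * Assumes count >= 2 and that addr[] is sorted in ascending order. */
uint64_t smallest_delta(const uint64_t *addr, size_t count)
{
    uint64_t smallest = addr[1] - addr[0];
    for (size_t i = 2; i < count; i++) {
        const uint64_t delta = addr[i] - addr[i - 1];
        if (delta < smallest) {
            smallest = delta;
        }
    }
    return smallest;
}
```

For the addresses above, smallest_delta(observed, 5) is 0x20: the first gap is 0x450, and the other three are 0x20 each.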
And I suppose this seems reasonable. I doubt people malloc(1) very often, but, 32 bytes might be a sweet spot for general smaller allocations. File names, small structs, etc.
Let's not get into the real malloc()'s internals. Rather, there's one more thing here that the real malloc() is also taking into consideration that we have not yet talked about: alignment.
Alignment
In the C specification, the above description for malloc() is not all it says about malloc()'s requirements. For C11, 7.22.3 states a few things that apply to malloc(), calloc(), realloc(), and aligned_alloc(), and the one of interest here is this:
"The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)."
"fundamental alignment requirement"?
Elsewhere in the C specification, there's a whole section about the alignment of objects. (See 6.2.8 in C11) The key bits (no pun intended) of that are:
"An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated."
and
"A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to _Alignof (max_align_t)
."
Let's throw into our program a printf for this _Alignof thingie and see what we get:
printf("_Alignof(max_align_t) = %zu\n", _Alignof(max_align_t));
gives us
_Alignof(max_align_t) = 16
16 bytes!? You're telling me there's a C type that is 16 bytes big? Let's take a look at how big our C types in QNX are. Off the top of my head, the obvious types:
#define PRINT_SIZE(type) do { \
printf("sizeof(%14s) = %2zu\n", #type, sizeof(type)); \
} while(0)
PRINT_SIZE(int);
PRINT_SIZE(long int);
PRINT_SIZE(long long int);
PRINT_SIZE(size_t);
PRINT_SIZE(float);
PRINT_SIZE(double);
This prints out:
sizeof( int) = 4
sizeof( long int) = 8
sizeof( long long int) = 8
sizeof( size_t) = 8
sizeof( float) = 4
sizeof( double) = 8
That looks about right. What gives with the 16 bytes then? A quick scan through the C spec reminds us of one more data type, long double. Let's add that:
PRINT_SIZE(long double);
and we now get
...
sizeof( float) = 4
sizeof( double) = 8
sizeof( long double) = 16
There it is: 16 bytes. Let's double-check the alignment requirements of these types with a new macro:
#define PRINT_TYPE_INFO(type) do { \
printf("%14s: sizeof = %2zu, ", #type, sizeof(type)); \
printf("_Alignof = %2zu\n", _Alignof(type)); \
} while(0)
and, throwing the char in there too for good measure:
char: sizeof = 1, _Alignof = 1
int: sizeof = 4, _Alignof = 4
long int: sizeof = 8, _Alignof = 8
long long int: sizeof = 8, _Alignof = 8
size_t: sizeof = 8, _Alignof = 8
float: sizeof = 4, _Alignof = 4
double: sizeof = 8, _Alignof = 8
long double: sizeof = 16, _Alignof = 16
So, long double has the largest alignment requirement, and therefore malloc() has to take this into account when it returns a pointer: it must be 16-byte aligned. i.e. some integral multiple of 16, and therefore the bottom 4 bits must be 0.
Well, it has to be at least 16-byte aligned. 32-byte aligned is also 16-byte aligned.
(And yes, there are people who take advantage of this to use those guaranteed-0 bits in a pointer to encode up to 4 bits of information within the pointer; you just have to remember to mask them out before dereferencing the pointer.)
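For the curious, pointer tagging along those lines might look like the sketch below. The helper names are made up, and the whole thing leans on the pointer being 16-byte aligned and on pointer-to-integer conversions behaving the "obvious" way, which is implementation-defined in C.

```c
#include <stdint.h>

/* The 4 low bits that are guaranteed to be 0 in a 16-byte-aligned pointer. */
#define TAG_MASK ((uintptr_t)0xF)

/* Stores a 4-bit tag in the low bits. p must be 16-byte aligned. */
void *tag_pointer(void *p, unsigned tag)
{
    return (void *)((uintptr_t)p | ((uintptr_t)tag & TAG_MASK));
}

/* Reads the 4-bit tag back out. */
unsigned pointer_tag(const void *tagged)
{
    return (unsigned)((uintptr_t)tagged & TAG_MASK);
}

/* Masks the tag out again; do this before dereferencing. */
void *untag_pointer(void *tagged)
{
    return (void *)((uintptr_t)tagged & ~TAG_MASK);
}
```

Dereferencing the tagged pointer directly is a bug; always go through untag_pointer() first.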
Alignment? Who Cares?
Performance. Sometimes hardware restrictions. Bus protocol (e.g. AMBA, TileLink) restrictions that cause unaligned accesses to be broken up into multiple transactions (whereas they would have been one transaction if properly aligned). Big (and extremely interesting) topic, with implications on struct padding.
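As a small taste of the struct padding side of this, here is a sketch with two made-up structs that hold the same three members in a different order. The compiler inserts padding so that each member lands on a suitably aligned offset, so member order can change the total size.

```c
#include <stddef.h>
#include <stdio.h>

/* Same members, different order. Both structs are invented for this example. */
struct loose { char a; double b; char c; };
struct tight { double b; char a; char c; };

/* Prints the size of each struct and where the double ended up. */
void print_layouts(void)
{
    printf("loose: sizeof = %2zu, offsetof(b) = %zu\n",
           sizeof(struct loose), offsetof(struct loose, b));
    printf("tight: sizeof = %2zu, offsetof(b) = %zu\n",
           sizeof(struct tight), offsetof(struct tight, b));
}
```

On an ABI where double is 8-byte aligned, like the x86_64 numbers shown earlier, I'd expect 24 bytes for loose and 16 for tight, but check on your own target.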
Quick Recap
Using malloc() as a use case, we figured out how to call mmap() to give us some memory, some plain old RAM.
We saw how ASLR means randomness across instances of a process, but not across individual calls to mmap() within an individual process.
We created a KISS version of cat so we can look at /proc/config. Now we have ls, cat, and our nkiss shell in our toolbox of homemade OS utilities.
We also saw that the real malloc() optimizes usage of memory and performance when working with contiguous chunks smaller than a page, but, it is still bound by fundamental alignment requirements.
And we saw that %p is handy, but, has a couple gotchas if you're porting code across platforms.
Coming Up...
We'll continue with our malloc() implementation, which is lacking a free(). Should be no surprises there, right?
After that, we'll look at some more "complicated" use cases for mmap(): shared memory, file-backed mappings, and typed memory.