Boot image compression

07 Jul 2013 in misc

Because kernel and driver images are fairly large these days, compressing them before saving them to disk or sending them over TFTP helps reduce boot time. This is especially helpful with TFTP, as that is what I use for debug kernel and driver uploads, so I thought that compressing the mach-o binaries inside the IMGX files was a good idea.

At the time, the best candidate ended up being the LZSS.C compressor, which is used very widely in a variety of Apple boot-related things (kernelcaches, mkexts, IOGraphics etc.), so I went with it. While LZSS.C does provide fast decompression and a fairly good compression ratio (mach_kernel: 8615092 bytes down to 4513873 bytes), the compression speed was fairly slow (around 3 seconds on my machine).

Slow LZSS
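For context, LZSS.C is Haruhiko Okumura's classic ring-buffer LZSS implementation, and the decompressor used across Apple's boot components follows the same scheme. Below is a minimal decoder sketch of that format (4096-byte window, 3-18 byte matches); the function name and bounds checks are mine, not Apple's:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define LZSS_N 4096        /* sliding window size */
#define LZSS_F 18          /* maximum match length */
#define LZSS_THRESHOLD 2   /* matches this short are stored as literals */

/* Decode an Okumura-style LZSS stream into dst, returning the number of
 * bytes produced. Each flag byte controls the next 8 items, LSB first:
 * a set bit means a literal byte, a clear bit means a (position, length)
 * pair referencing the ring buffer. */
size_t decompress_lzss(uint8_t *dst, const uint8_t *src, size_t srclen)
{
    uint8_t text_buf[LZSS_N + LZSS_F - 1];
    const uint8_t *srcend = src + srclen;
    uint8_t *dststart = dst;
    int i, j, k, r = LZSS_N - LZSS_F;
    unsigned int flags = 0;
    uint8_t c;

    memset(text_buf, ' ', r);
    for (;;) {
        if (((flags >>= 1) & 0x100) == 0) {
            if (src == srcend) break;
            flags = *src++ | 0xFF00; /* high byte counts down 8 items */
        }
        if (flags & 1) {
            if (src == srcend) break;
            c = *src++;
            *dst++ = c;
            text_buf[r++] = c;
            r &= (LZSS_N - 1);
        } else {
            if (srcend - src < 2) break;
            i = *src++;
            j = *src++;
            i |= ((j & 0xF0) << 4);              /* 12-bit window position */
            j = (j & 0x0F) + LZSS_THRESHOLD;     /* copies j + 1 bytes */
            for (k = 0; k <= j; k++) {
                c = text_buf[(i + k) & (LZSS_N - 1)];
                *dst++ = c;
                text_buf[r++] = c;
                r &= (LZSS_N - 1);
            }
        }
    }
    return (size_t)(dst - dststart);
}
```

A flag byte of 0xFF followed by eight bytes simply copies literals through, which is also the degenerate worst case: incompressible input grows by one byte per eight.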

Although 3 seconds was not a lot, I decided to look for an alternative that could be used in place of LZSS.C and found QuickLZ, which, according to benchmarks, seemed a lot faster. I integrated it into the IMGX builder and my bootloader, and found that it was around 4 times faster in terms of compression speed and slightly better in terms of compression ratio as well (as you can see below), while providing the same decompression time as LZSS.C.

QuickLZ

U-Boot shenanigans

24 Jun 2013 in research

One of the things that has always irritated me about u-boot is how command-oriented it was. This didn't play nicely with the way I wanted to load XNU, as I had to come up with a solution that would allow me to load drivers before loading the actual kernel.

XNU has three ways of loading drivers at boot time - driver entries in the device tree, a mkext in the device tree, or a kernelcache. The kernelcache method is the one Apple uses to "load" drivers on iOS devices, where the entire kernel is prelinked into a single blob together with its drivers. An obvious disadvantage of this method is that those drivers cannot really be unloaded later, as they're part of the kernel image. The device tree method involves loading the bare minimum set of drivers into kernel memory (for example, the main platform driver and something like an MMC driver) and adding them to the device tree with names like Driver-<ADDRESS>. This is what I've decided to implement in my fork of u-boot.

Because some files may be loaded over TFTP and some from MMC, I needed a unified format which I could use for compressing the files as well as unifying them prior to transfers (initiating a TFTP transfer takes around 3 seconds, so it was best to only do it once). I also wanted to use LZSS compression to reduce the amount of data that needed to be transferred. So what I ended up doing is shoving the kernel itself and all the drivers into tables of contents (each of which could contain a single file or multiple files). That way, files could be kept separate on the MMC while allowing me to do batch transfers of multiple drivers, and possibly the kernel, in one go over TFTP.

typedef struct {
    uint32_t magic;
    uint32_t ncmds;
    /* ... first command ... */
} table_of_contents_t;

This also meant implementing two new commands to replace bootx. The first command, imgx, loaded a general image (either a table of contents, or a mach command container directly) and, depending on the flags in the container, either treated it as a driver, adding it to a linked list, or as a kernel, which would actually have to be mapped out.

else if (command->flags & kMachKernel) {
    /* ... */

    /* Hand over to the mach-o loader */
    ret = load_macho(
        image_address,
        image_size,
        command->load_address,
        gKernelMemoryTop,
        &entry_point,
        &size
    );

    /* ... */
}

The other new command was mach_boot, which used the data accumulated by all the imgx commands to bootstrap and execute the kernel image, in a similar way to the old bootx command.

I also needed to add a kCommandXMLDeviceTree command type which indicated an XML device tree specification file that would be loaded and preserved somewhere, for mach_boot to construct a flattened DT from.

Porting XNU (iOS kernel) to BeagleBoard-xM (DM37x): Part 2

19 Jun 2013 in research

This is the second part of my "XNU on BeagleBoard" saga, which took me quite some time to get right after I ran into annoying bugs related to both my implementation of machine dependent OSFMK components and the technical "quirks" of OMAP3. You can find the first part of this here.

Timers and Interrupts

To pick up where I left off last time, I had managed to get the kernel to boot up to RTC initialisation, where it panicked due to the system clock frequency being 0 Hz (as the system clock code for this board wasn't implemented yet). So the course of action involved first implementing the interrupt controller initialisation code, followed by timer initialisation. The DM37x has 11 general purpose timers (GPTIMER) and I picked GPTIMER1 as the main system timer, which was clocked at 32kHz (the timer logic can actually be clocked higher, but 32kHz was more than enough for this).

I've set up timer and interrupt controller code in the OMAP3 platform expert (pexpert/arm/OMAP3) adding routines for timebase init, handling interrupts and handling timer-specific interrupts. Interrupts that weren't timer interrupts were left to the IOKit interrupt controller to handle.

For the purpose of this, the timer was set up in incrementing mode, where an interrupt was generated on timer overflow. For now, I've put the timer on the IRQ line, but using FIQ for timer interrupts could be a better idea in the long run, although not crucial.
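As a sketch of the arithmetic involved: in incrementing mode the counter runs from its reload value up to 0xFFFFFFFF and interrupts on overflow, so for a desired tick rate you load it with 0xFFFFFFFF minus the number of timer clocks per tick, plus one. The register offsets below follow the OMAP3 GPTIMER map as I understand it; treat them and the helper name as illustrative, not as the actual pexpert code:

```c
#include <stdint.h>

/* Illustrative OMAP3 GPTIMER register offsets (verify against the TRM). */
#define GPT_TISR 0x18 /* interrupt status (write 1 to clear) */
#define GPT_TIER 0x1C /* interrupt enable */
#define GPT_TCLR 0x24 /* control (start/stop, autoreload) */
#define GPT_TCRR 0x28 /* current counter value */
#define GPT_TLDR 0x2C /* reload value */

/* Reload value so that an up-counting timer clocked at clk_hz overflows
 * hz times per second. */
uint32_t gpt_load_for_hz(uint32_t clk_hz, uint32_t hz)
{
    return 0xFFFFFFFFu - (clk_hz / hz) + 1;
}
```

With the 32kHz source and, say, a 100Hz tick, the reload value comes out to 0xFFFFFEB9; raising the load value shortens the period, which is exactly the "higher load value so quicker overflow" tweak that shows up later.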

One thing I got caught on is that the timer interrupt status register has to be cleared in GPTIMER before allowing new interrupts in the interrupt controller, otherwise you get stuck in an interrupt loop.

/* clear timer interrupt */
gSysTimer->TISR |= 0x7;

/* enable new interrupts */
r32(gPIC + INTCPS_CONTROL) = 1;
asm volatile("dsb");

Userland: Attempt #1

Shortly after setting up interrupts and the system clock, I was able to boot past IOKit initialisation and up to userland startup. To test everything, I decided to use an iOS recovery ramdisk as a base, replacing restored with my simple daemon that would print some junk and then hang. To get further, I prepared the ramdisk and uploaded it onto the MMC card. I also had to add an option to u-boot to load this ramdisk, so I extended the bootx command to take the ramdisk address as the last argument. Since HFS+ is big endian, I had to flip the endianness when determining the total size of the volume (for copying it into the kernel memory region while still in the bootloader):

size = (OSSwapInt32(header->totalBlocks) * OSSwapInt32(header->blockSize));
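Since HFS+ stores all of its on-disk fields big endian while u-boot on this ARM board runs little endian, the swap is unconditional; a self-contained sketch of the same computation (helper names are mine, the real code just uses OSSwapInt32 as above):

```c
#include <stdint.h>

/* Unconditionally byte-swap a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Total volume size from the raw (big endian) HFS+ volume header fields.
 * Widened to 64 bits so large volumes don't overflow the product. */
static uint64_t hfs_volume_size(uint32_t totalBlocks_be, uint32_t blockSize_be)
{
    return (uint64_t)swap32(totalBlocks_be) * swap32(blockSize_be);
}
```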

After implementing the code for loading the ramdisk, I changed my boot command in the environment variables, adding this to it:

"echo loading ramdisk.dmg from flash ...;" \
"fatload mmc ${mmcdev} ${rdsk_addr} /ramdisk.dmg;"\
"echo starting kernel *with* ramdisk ...;" \
"bootx 0x80001000 ${kern_addr} ${dt_addr} ${rdsk_addr}\0" \

Domain Saga

To give a little introduction, ARMv7 has this nifty feature called domains, which affects how the MMU treats permissions for pages. Each domain's mode is denoted by two bits in the Domain Access Control Register (DACR): a value of 11 denotes manager access, in which access flags for a page are not checked (RWX everywhere), and 01 denotes client access, in which access flags are checked.
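Concretely, DACR packs sixteen 2-bit fields, one per domain; a small sketch of the encoding (constant and helper names are mine):

```c
#include <stdint.h>

#define DOMAIN_NO_ACCESS 0u /* any access faults */
#define DOMAIN_CLIENT    1u /* page table permission bits are checked */
#define DOMAIN_MANAGER   3u /* permission bits are ignored (RWX everywhere) */

/* Return dacr with the 2-bit field for the given domain (0-15) replaced. */
uint32_t dacr_set_domain(uint32_t dacr, unsigned domain, uint32_t access)
{
    uint32_t shift = domain * 2;
    return (dacr & ~(3u << shift)) | ((access & 3u) << shift);
}
```

A DACR of 0xffffffff therefore puts all sixteen domains in manager mode, so every page is effectively RWX regardless of its access bits; setting domain 0 to client mode and everything else to no-access yields 0x00000001.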

Now, going back to booting up the userland: after launchd[1] had forked, I was experiencing all sorts of weird crashes all over the userland (several launchds were spawned, and some crashed while others hung; here is an example log). What was even more odd was that the stack was also ending up being executed for some unknown reason.

After spending a lot of time thinking this was an issue in the fork system call, I finally decided to investigate why the stack was being executed. My initial suspicion was a bug in my pmap (physical mapper), where pages somehow ended up getting the wrong access bits. After tracing all entered mappings, I realised that this was not the case. So it looked like all access permission bits were simply being ignored; I checked the actual bits to make sure they were correct, and they were. I was really confused.

After some time, I had nearly given up, and decided to check DACR, and boom, I had a revelation: u-boot was setting DACR to 0xffffffff, unlike BootKit, which set it to 0x00000001. This is why all access bits were ignored. Fixing this up and setting domain 0 to client mode fixed the "forking" issue, allowing launchd[1] to start my little daemon.

 old dacr = 0xffffffff, new dacr=0x01

removeKextBootstrap

Another weird quirk I noticed was that sometimes I was getting crashes while the kernel was printing out its backtraces (my debug code in the kernel does symbol lookups when printing backtraces). As I suspected that something was overwriting pages in the kernel identity mapping, I added a little macro to all pmap calls to make sure that the identity mapping was never removed or overwritten.

#define CHECK_VIRTUAL_ADDRESS(va) \
    if (va > gVirtBase && va < (gVirtBase + mem_size)) { \
         panic("%s: va 0x%08x is in the identity region", __FUNCTION__, va); \
    }

This quickly revealed the culprit: OSKext::removeKextBootstrap, which was responsible for making the kernel's __LINKEDIT segment (the part of the kernel binary that holds the symbol table) swappable. Because we don't have swapping on ARM anyway, I decided to put the call to ml_static_mfree under #ifndef __arm__. I would like to note that this part of the routine only gets invoked when you build your kernel with kxld (the kext linker) in it, so the kernel is able to load kernel extensions (commonly known as drivers) dynamically.

iOS devices do not typically do that, and instead rely on a prelinked kernel also known as a kernelcache. While I have the option of building one and using it, I did not feel that it was necessary for now, and I thought that loading kexts dynamically could certainly be useful.

Final Frontier

At first glance, I had managed to get the userland to boot. But when I tweaked the system clock rate (a higher load value, so quicker overflow), I started getting crashes in dyld when my daemon started up. I determined that the crash was coming from macho_header being passed as NULL to dyld::start, as opposed to 0x00001000. This value is passed on the initial stack of the application and is set in exec_mach_imgact in bsd/kern/kern_exec.c:

 ap = thread_adjuserstack(thread, -new_ptr_size);
 error = copyoutptr(load_result.mach_header, ap, new_ptr_size);

After two days of debugging, I was not able to get any closer. My watchpoints were not being triggered, and that's assuming the condition had even occurred. Baffled by that, I decided to trace the pmap entries made by exec_mach_imgact and found that, when this occurred, the pmap into which the stack info was entered was not the same pmap as the one used when the process executed. Hooray, race condition.

It turns out the code responsible for process control block switching was not entirely correct and failed under certain conditions, which could only occur while a mach-o binary was being started and only if that process was interrupted by a context switch. I was meant to compare thread maps and not task maps before doing the switch:

/* Should do this: */
if (vm_map_pmap(old_th->map) != vm_map_pmap(new_th->map)) {

/* Instead of this: */
if (vm_map_pmap(old_th->task->map) != vm_map_pmap(new_th->task->map)) {

Having fixed that, everything started up and ran as expected in all of my tests. I was able to get launchd to start up and execute my simple boot.rc, which then forked and started retardrc, which prints out some stuff before intentionally crashing with a NULL dereference :)

xnu single user boot

I would guess that the next step is getting my IOSurface and IOMobileFramebuffer drivers to work, so I would have something very, very cool to show. But as far as this post goes, the initial bringup on this board is done, and I've enjoyed it.

Porting XNU to BeagleBoard-xM (DM37x): Part 1

11 Jun 2013 in research

After a fairly long period of not doing anything due to personal issues, I've decided to finally get XNU running on the DM37x system on a chip developed by Texas Instruments. To assist early bringup I've decided to use a JTAG debugger and a serial port adapter.

Baby Steps

To start off, I needed to find a suitable way of booting the XNU kernel on the BeagleBoard so I could test my code as soon as possible. My initial plan was to port my own bootloader, BootKit (which already has support for XNU), to that platform and then use it to boot up the kernel. About two days into working on the low level initialization code for the platform bits, it started to seem like a waste of time, and I abandoned this plan.

My next choice was the Das U-Boot bootloader, which already had full support for the BeagleBoard and was used as both the first stage and second stage bootloader. Obviously, this bootloader lacked any support for the XNU kernel, so I had to develop my own "extension" for it. To simplify the task, I decided to recycle code from the BootKit bootloader, integrating it into the u-boot tree and implementing a command, which I named bootx, that would compile the device tree definitions, map the mach-o kernel and execute it.

In essence, this command would load mach_kernel, map out the mach-o executable file at a certain address (0x80001000 for XNU), parse the XML device tree specification (yes, that's right, an XML parser in a bootloader), flatten it, populate the BootArgs structure and call the kernel entry point, passing the boot arguments to it.
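Mapping a mach-o image mostly means walking its load commands. A minimal 32-bit walker sketch, with the header layout hand-rolled so it stays self-contained (a real loader would use the structs from <mach-o/loader.h>, and this assumes a host-endian image):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MH_MAGIC   0xFEEDFACEu /* 32-bit, host-endian mach-o */
#define LC_SEGMENT 0x1u        /* segment load command */

/* Count LC_SEGMENT commands in a 32-bit mach-o image, or return -1 if
 * the buffer is too small or malformed. The 28-byte mach_header is
 * followed by ncmds load commands, each starting with a (cmd, cmdsize)
 * pair; unknown commands are simply skipped by cmdsize. */
int macho_count_segments(const uint8_t *buf, size_t len)
{
    uint32_t magic, ncmds, cmd, cmdsize, n;
    size_t off = 28; /* sizeof(struct mach_header) */
    int segments = 0;

    if (len < off) return -1;
    memcpy(&magic, buf, 4);
    if (magic != MH_MAGIC) return -1;
    memcpy(&ncmds, buf + 16, 4); /* offset of ncmds in mach_header */

    for (n = 0; n < ncmds; n++) {
        if (off + 8 > len) return -1;
        memcpy(&cmd, buf + off, 4);
        memcpy(&cmdsize, buf + off + 4, 4);
        if (cmdsize < 8 || off + cmdsize > len) return -1;
        if (cmd == LC_SEGMENT) segments++;
        off += cmdsize;
    }
    return segments;
}
```

In the actual loader, the interesting commands are the LC_SEGMENT maps plus LC_UNIXTHREAD, which carries the initial register state and therefore the entry point that bootx jumps to.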

Loading mach_kernel

In order to ease the pain of repetitively having to swap out the SD card, I decided to use u-boot's built-in TFTP support to download the kernel from my development machine (downloading it over UART was out of the question because the debug kernel image was just over 8MB in size). I ended up with the following command in my environment that would download and start the kernel:

/* Boot XNU from the network */ \
"xn=setenv ipaddr 192.168.55.10;" \
    "setenv serverip 192.168.55.1;" \
    "setenv usbethaddr ba:d0:4a:9c:4e:ce;" \
    "usb start;" \
    "echo downloading mach_kernel from tftp ...;" \
    "tftpboot 0x84000000 /mach_kernel;"\
    "echo loading devicetree.plist from flash ...;" \
    "fatload mmc ${mmcdev} 0x83000000 /devicetree.plist;"\
    "echo starting kernel ...;" \
    "bootx 0x80001000 0x84000000 0x83000000 0x0\0"

Platform Expert

The platform expert is a hardware abstraction module used by XNU to access machine specific components (not to be confused with architecture specific components like the MMU). It consists of early access to the serial port, interrupt controller initialization and handling of interrupts from the system timer. Unlike the x86 version, the platform expert routines are specific to each machine. For now, I decided to implement only the serial output routine for the BeagleBoard. Having done that, I tweaked several kernel makefiles to add support for OMAP3 as a machine config. I could now build the kernel using the following command:

make TARGET_CONFIGS="debug arm OMAP3" SDKROOT="/Darwin_ARM_SDK/"

Hello XNU

Of course, as expected, soon after starting, the kernel crashed very early without producing any visible output. I anticipated that early printing might produce some useful information, so I tried to implement semihosting in the kernel to get some early output, which unfortunately didn't quite work for an unknown reason. Instead of fiddling around with debugger configuration files, I decided to do something simpler and make my early printing routine print to a preallocated buffer, which I could then dump using the debugger.

Early panic dump

From my early dump, I could see that the kernel was panicking out in arm_init due to a misaligned L1 table. I have quickly fixed that in the bootloader, and having recompiled it, I was able to get past early initialization where it called machine_startup to do general kernel initialization.

Odd crash

Shortly after being started, the kernel died due to a data abort happening before the first VM map was created (full log here):

Fatal Exception: map is NULL, prob a fault during VM init, fault_addr is 0x8064bbb4

I quickly tried to read that address using the debugger, which seemed to work just fine, so I was sure it was not a virtual memory related problem. Just to be sure, I dumped the translation tables and examined the entry for this page, which was perfectly valid. I then looked at the data fault status register and noticed that it indicated an external abort had occurred (0x8).

Because I was confused as to what this actually meant, I did some quick searching and found a thread on the TI forums where someone complained about similar behaviour. Apparently, this SoC asserts an external abort when an STREX or LDREX instruction is used on a page that is mapped with the Shared bit set. As this SoC is uniprocessor, I simply disabled setting that bit for pages, which resolved the issue.
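For reference, in the ARMv7 short-descriptor format the DFSR fault status is split across bits [3:0] and bit 10; a small decoder sketch, with the one encoding relevant here (the helper name is mine, the encoding is per the ARM ARM):

```c
#include <stdint.h>

#define FS_SYNC_EXTERNAL_ABORT 0x08u /* the status the OMAP3 raised here */

/* Extract the 5-bit fault status from a short-descriptor-format DFSR:
 * FS[3:0] live in bits [3:0], FS[4] lives in bit 10. */
uint32_t dfsr_fault_status(uint32_t dfsr)
{
    return (dfsr & 0xFu) | ((dfsr >> 6) & 0x10u);
}
```

A status of 0x8 with bit 10 clear is a synchronous external abort, i.e. the fault came back from the bus rather than from translation, which is why the page table entry looked perfectly valid.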

Next Steps

Because getting all of the output via a serial console seemed boring, I decided to implement video initialization in the platform expert so I could use XNU's video console. Usually, this task is left to the bootloader, but I decided to implement it in the kernel. Because of the complexity of the OMAP DSS subsystem, it took me a few hours to work out how to configure the display controller, after which I was able to populate PE_state.video and use that to set up the console. This was the result:

Video console
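Populating PE_state.video essentially means describing the linear framebuffer the DSS scans out. A sketch with a struct mirroring the relevant PE_Video fields (the real definition lives in XNU's pexpert headers and has more fields; the base address below is purely illustrative):

```c
#include <stdint.h>

/* Subset of XNU's PE_Video, enough to describe a linear framebuffer. */
struct pe_video {
    uintptr_t v_baseAddr; /* physical base of the framebuffer */
    uint32_t  v_rowBytes; /* bytes per scanline */
    uint32_t  v_width;
    uint32_t  v_height;
    uint32_t  v_depth;    /* bits per pixel */
};

/* Fill in a packed linear framebuffer description; returns the
 * rowBytes value used (no scanline padding assumed). */
uint32_t fb_describe(struct pe_video *v, uintptr_t base,
                     uint32_t width, uint32_t height, uint32_t depth)
{
    v->v_baseAddr = base;
    v->v_width = width;
    v->v_height = height;
    v->v_depth = depth;
    v->v_rowBytes = width * (depth / 8);
    return v->v_rowBytes;
}
```

Once these values match what the display controller is actually scanning out, the kernel's video console code can draw directly into the buffer.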

In order to advance further, I had to implement support routines for setting up the interrupt controller and the system timer in the Platform Expert, which I'm going to explain in my next post.