Porting XNU (iOS kernel) to BeagleBoard-xM (DM37x): Part 2

19 Jun 2013 in research

This is the second part of my "XNU on BeagleBoard" saga, which took me quite some time to get right after I ran into annoying bugs related to both my implementation of machine-dependent OSFMK components and the technical "quirks" of OMAP3. You can find the first part of this here.

Timers and Interrupts

To pick up where I left off last time: I had managed to get the kernel to boot up to RTC initialisation, where it panicked because the system clock frequency was 0 Hz (the system clock code for this board wasn't implemented yet). So the course of action was to first implement the interrupt controller initialisation code, followed by timer initialisation. DM37x has 11 general purpose timers (GPTIMER) and I picked GPTIMER1 as the main system timer, clocked at 32kHz (the timer logic can actually be clocked higher, but 32kHz was more than enough for this).

I've set up timer and interrupt controller code in the OMAP3 platform expert (pexpert/arm/OMAP3), adding routines for timebase init, handling interrupts and handling timer-specific interrupts. Interrupts that weren't timer interrupts were left to the IOKit interrupt controller to handle.
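As a rough illustration of that split between timer interrupts and everything else, here is a minimal sketch of the dispatch decision. The names classify_irq and irq_target_t are hypothetical, and the register details (the active IRQ number living in the low 7 bits of INTCPS_SIR_IRQ, and GPTIMER1 sitting on IRQ line 37) are assumptions taken from my reading of the OMAP3 TRM, so double-check them against the manual.

```c
#include <stdint.h>

/* Assumed values from the OMAP3 TRM; verify before relying on them. */
#define INTCPS_SIR_IRQ_ACTIVE_MASK 0x7Fu /* ACTIVEIRQ field, bits 6:0 */
#define IRQ_GPTIMER1               37u   /* GPTIMER1 interrupt line */

/* Hypothetical names, for illustration only. */
typedef enum {
    IRQ_TARGET_TIMER, /* handled directly by the platform expert */
    IRQ_TARGET_IOKIT  /* forwarded to the IOKit interrupt controller */
} irq_target_t;

/* Decide who handles the interrupt, given the raw INTCPS_SIR_IRQ value. */
static irq_target_t classify_irq(uint32_t sir_irq)
{
    uint32_t irq = sir_irq & INTCPS_SIR_IRQ_ACTIVE_MASK;

    return (irq == IRQ_GPTIMER1) ? IRQ_TARGET_TIMER : IRQ_TARGET_IOKIT;
}
```

On the real board the argument would come from reading the interrupt controller register; here it is a plain function so the decision can be tested off-target.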

For the purpose of this, the timer was set up in incrementing mode, where an interrupt was generated on timer overflow. For now, I've put the timer on the IRQ line, but using FIQ for timer interrupts could be a better idea in the long run, although not crucial.
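To make the overflow arithmetic concrete, here is a minimal sketch of how a start/reload value could be derived for an up-counting 32-bit timer. The function name timer_load_value is hypothetical, and the exact clock rate of 32768 Hz is an assumption about what "32kHz" means on this board.

```c
#include <stdint.h>

#define TIMER_CLOCK_HZ 32768u /* assumed exact rate of the 32kHz clock */

/*
 * Hypothetical helper: value to load into an up-counting 32-bit timer so
 * that it overflows after one tick period. The counter overflows when it
 * wraps from 0xFFFFFFFF to 0, so the value is 2^32 minus the number of
 * counts per tick.
 */
static uint32_t timer_load_value(uint32_t timer_hz, uint32_t ticks_per_sec)
{
    uint32_t counts = timer_hz / ticks_per_sec;

    return (uint32_t)(0u - counts); /* 2^32 - counts, modulo 2^32 */
}
```

For example, timer_load_value(TIMER_CLOCK_HZ, 100) gives 0xFFFFFEB9, which is 327 counts away from overflow. A higher load value means fewer counts before the overflow interrupt fires.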

One thing I got caught on is that the timer interrupt status register has to be cleared in GPTIMER before allowing new interrupts in the interrupt controller, otherwise you get stuck in an interrupt loop.

/* clear timer interrupt */
gSysTimer->TISR |= 0x7;

/* enable new interrupts */
r32(gPIC + INTCPS_CONTROL) = 1;
asm volatile("dsb");

Userland: Attempt #1

Shortly after setting up interrupts and the system clock, I was able to boot past IOKit initialisation and up to userland startup. To test everything, I decided to use an iOS recovery ramdisk as a base, replacing restored with my simple daemon that would print some junk and then hang. To get further, I prepared the ramdisk and uploaded it onto the MMC card. I also had to add an option to uBoot to load this ramdisk, so I extended the bootx command to accept the ramdisk address as the last argument. Since HFS+ is big endian, I had to flip the endianness when determining the total size of the volume (for copying it into the kernel memory region while still in the bootloader):

size = (OSSwapInt32(header->totalBlocks) * OSSwapInt32(header->blockSize));
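For anyone wanting to try the same thing outside the bootloader, here is a self-contained sketch of the size calculation that reads the fields byte by byte instead of relying on OSSwapInt32. The helper names are hypothetical. The offsets (volume header at byte 1024 of the volume, blockSize at offset 40 and totalBlocks at offset 44 within it) follow the published HFS+ volume format, and the sketch assumes the ramdisk is a raw HFS+ volume image.

```c
#include <stddef.h>
#include <stdint.h>

#define HFSPLUS_HEADER_OFFSET   1024u /* volume header offset in the volume */
#define HFSPLUS_BLOCKSIZE_OFF   40u   /* blockSize field within the header */
#define HFSPLUS_TOTALBLOCKS_OFF 44u   /* totalBlocks field within the header */

/* Read a big-endian 32-bit value regardless of host endianness. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: total volume size in bytes, or 0 if the buffer is
 * too short or does not carry an HFS+ ("H+") or HFSX ("HX") signature.
 */
static uint64_t hfsplus_volume_size(const uint8_t *image, size_t len)
{
    const uint8_t *hdr;

    if (len < HFSPLUS_HEADER_OFFSET + HFSPLUS_TOTALBLOCKS_OFF + 4u)
        return 0;

    hdr = image + HFSPLUS_HEADER_OFFSET;
    if (hdr[0] != 'H' || (hdr[1] != '+' && hdr[1] != 'X'))
        return 0;

    /* Widen before multiplying so large volumes do not wrap at 4 GiB. */
    return (uint64_t)read_be32(hdr + HFSPLUS_BLOCKSIZE_OFF) *
           (uint64_t)read_be32(hdr + HFSPLUS_TOTALBLOCKS_OFF);
}
```

Reading bytes explicitly keeps the code correct on both little and big endian hosts, at the cost of being a little more verbose than a swap macro.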

After implementing the code for loading the ramdisk, I changed my boot command in the environment variables, adding this to it:

"echo loading ramdisk.dmg from flash ...;" \
"fatload mmc ${mmcdev} ${rdsk_addr} /ramdisk.dmg;"\
"echo starting kernel *with* ramdisk ...;" \
"bootx 0x80001000 ${kern_addr} ${dt_addr} ${rdsk_addr}\0" \

Domain Saga

To give a little introduction, ARMv7 has this nifty feature called domains, which affect how the MMU treats permissions for pages. Each domain has a mode setting encoded in two bits. Bits 11 denote manager access, in which access flags for a page are not checked (RWX everywhere), and bits 01 denote client access, in which access flags are checked.

Now, going back to booting up the userland: after launchd[1] had forked, I was experiencing all sorts of weird crashes all over the userland (several launchds were spawned, and some crashed while others hung; here is an example log). Even more oddly, the stack was also ending up being executed for some unknown reason.

After spending a lot of time thinking this was an issue in the fork system call, I finally decided to investigate why the stack was executed. My initial suspicion was a bug in my pmap (physical mapper) where pages somehow ended up getting the wrong access bits. After tracing all entered mappings, I realised that this was not the case. I checked the actual bits to make sure they were correct, and they were, so it looked like all access permission bits were simply being ignored. I was really confused.

After some time, I had nearly given up when I decided to check DACR, and boom. I had a revelation - uBoot was setting DACR to 0xffffffff, unlike bootkit, which set it to 0x00000001. This is why all access bits were ignored. Setting domain 0 to client mode fixed the "forking" issue, allowing launchd[1] to start my little daemon.

 old dacr = 0xffffffff, new dacr=0x01
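To show how those two DACR values decode, here is a small sketch of the field layout: the register holds 16 domains, two bits each, with domain n in bits [2n+1:2n]. The helper names dacr_get and dacr_set are hypothetical, and the sketch only manipulates the value; on the target the register itself is accessed through CP15 (c3, c0, 0).

```c
#include <stdint.h>

/* Two-bit access modes for a domain; 0x2 is reserved. */
enum dacr_mode {
    DACR_NO_ACCESS = 0x0, /* any access generates a domain fault */
    DACR_CLIENT    = 0x1, /* page permission bits are checked */
    DACR_MANAGER   = 0x3  /* page permission bits are ignored */
};

/* Hypothetical helper: return dacr with the given domain set to mode. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain, enum dacr_mode mode)
{
    unsigned shift = (domain & 0xFu) * 2u;

    dacr &= ~(0x3u << shift);
    return dacr | ((uint32_t)mode << shift);
}

/* Hypothetical helper: extract the mode of one domain from a DACR value. */
static enum dacr_mode dacr_get(uint32_t dacr, unsigned domain)
{
    return (enum dacr_mode)((dacr >> ((domain & 0xFu) * 2u)) & 0x3u);
}
```

With this, 0xffffffff decodes to manager mode for all 16 domains, while 0x00000001 is client mode for domain 0 and no access for the rest.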

removeKextBootstrap

Another weird quirk I noticed was that I was sometimes getting crashes while the kernel was printing out its backtraces (my debug code in the kernel does symbol lookups when printing backtraces). As I suspected that something was overwriting pages in the kernel identity mapping, I added a little macro to all pmap calls to make sure that the identity mapping was not being removed or overwritten.

#define CHECK_VIRTUAL_ADDRESS(va) \
    do { \
        if ((va) > gVirtBase && (va) < (gVirtBase + mem_size)) { \
            panic("%s: va 0x%08x is in the identity region", __FUNCTION__, (va)); \
        } \
    } while (0)

This quickly revealed the culprit - OSKext::removeKextBootstrap, which was responsible for making the kernel's __LINKEDIT (the segment in the kernel binary that holds the symbol table) swappable. Because we didn't have swapping on ARM anyway, I decided to put the call to ml_static_mfree under #ifndef __arm__. I would like to note that this part of the routine only gets invoked when you build your kernel with kxld (the kext linker) in it, so that the kernel is able to load kernel extensions (commonly known as drivers) dynamically.

iOS devices do not typically do that, and instead rely on a prelinked kernel also known as a kernelcache. While I have the option of building one and using it, I did not feel that it was necessary for now, and I thought that loading kexts dynamically could certainly be useful.

Final Frontier

At first glance, I had managed to get the userland to boot. However, when I tweaked the system clock rate (higher load value, so quicker overflow), I started getting crashes in dyld when my daemon started up. I determined that the crash was coming from macho_header being passed as NULL to dyld::start, as opposed to 0x00001000. This value is passed on the initial stack of the application and is set in exec_mach_imgact in bsd/kern/kern_exec.c:

 ap = thread_adjuserstack(thread, -new_ptr_size);
 error = copyoutptr(load_result.mach_header, ap, new_ptr_size);

After two days of debugging, I was not any closer. My watchpoints were not being triggered, and that's assuming the condition had even occurred. Baffled by that, I decided to trace the pmap entries made by exec_mach_imgact and found that when the crash occurred, the pmap into which the stack info was entered was not the same pmap as the one in use when the process executed. Hooray, race condition.

It turned out the code responsible for process control block switching was not entirely correct and failed under certain conditions, which could only occur while a Mach-O binary was getting started and only if this process was interrupted by a context switch. I was meant to compare thread maps and not task maps before doing the switch:

/* Should do this: */
if (vm_map_pmap(old_th->map) != vm_map_pmap(new_th->map)) {

/* Instead of this: */
if (vm_map_pmap(old_th->task->map) != vm_map_pmap(new_th->task->map)) {

Having fixed that, I was able to get everything to start up and run as expected in all of my tests. I was able to get launchd to start up and execute my simple boot.rc, which then forked and started retardrc, which prints out some stuff before intentionally crashing by doing a NULL deref :)

xnu single user boot

I would guess that the next step would be getting my IOSurface and IOMobileFramebuffer drivers to work, so I would have something very very cool to show. But as far as this post goes, the initial bringup on this board is done and I've enjoyed it.