Handling Kernel Extensions in HTool


As part of HTool I wanted to add in-depth analysis of iOS Kernel Caches, especially Kernel Extensions. The iOS version of XNU differs from that of macOS in that the kernel is shipped as a cache file rather than as a simple executable binary. Instead of shipping separate .kext Mach-O files in a separate directory, which the kernel then searches for and loads, an iOS kernelcache has all the extensions bundled into the same Mach-O file. This is similar to how all libraries are merged into the dyld_shared_cache.

There are two methods Apple have used recently for formatting the kernelcache. In both styles the KEXTs still have their own separate Mach-Os embedded in the cache file. The difference is that in the old, split style, the KEXTs were completely split into their own Mach-Os, to the point where you could extract an entire KEXT. In the new style, as I shall explain, the majority of the segments have been "merged" with the rest of the kernel. There have also been significant changes to how the KEXTs are addressed and found within the cache.

Before we start, I'd like to mention that the code examples are intentionally vague. As HTool is not open-source, I haven't pasted the code word-for-word.

Split Style Caches

The split-style, or "old-style", kernelcache was used for all devices before iOS 12, and for a few during the early iOS 12 releases. With this style the entire Mach-O of a kernel extension was stored in one place, with its address and Bundle ID referenced from the __PRELINK_INFO segment, which contains an XML property list. The KEXTs have all their segments, so they can be loaded directly into a disassembler/decompiler such as IDA. The XML essentially maps out the Kernel Extensions in the file, and HTool can parse it into a list of KEXT load addresses, Bundle IDs and version tags. To do this, run htool -k kernelcache.arm64.

Listing KEXTs from an old-style kernelcache.

Looking at the above output, HTool identifies our kernel as being the old-style format. By analysing the __PRELINK_INFO segment it is able to identify 216 extensions and list each of the bundle identifiers and their respective source versions.

Finding KEXTs

We can use HTool to start analysing the kernel to get an idea of how things work, and then try to extract an extension for us to analyse. We'll be using that same iOS 11 kernel cache, which uses the split-style format. Start by listing the kernel's segments, which you can do by running HTool with the -l flag and passing a filename:

Listing iOS 11 Kernelcache `__PRELINK_INFO` segment.

(You can ignore the grep, that's just to limit the size of the output to save room on the page :-))

Now, we only want to deal with the __PRELINK_INFO segment rather than search through the entire binary, and Libhelper comes in handy for this. We can use its mach_segment_command_from_info() function to find the __PRELINK_INFO segment in the kernel's Mach-O; if the return value is not NULL, the segment exists. If you'd like to extract the Prelink segment as an actual .xml file, you can use macho-section by running $ macho-section kernel.arm64 __PRELINK_INFO __info and renaming the produced file to .xml.

    // mach_segment_command_from_info() returns the segment command pointer
    // itself (or NULL), so there is no need to malloc() one beforehand.
    mach_segment_command_64_t *prelink_info_seg =
        mach_segment_command_from_info (macho->scmds, "__PRELINK_INFO");

    if (prelink_info_seg) {
        // Continue from here.
    }
    ...
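If you don't have Libhelper to hand, the same lookup can be done by walking the load commands yourself. The sketch below is a dependency-free alternative, not what HTool does: find_segment() and the mh64_t/lc_t/seg64_t structs are hypothetical names, with layouts mirroring mach_header_64 and segment_command_64 from <mach-o/loader.h> so it builds on any platform. It assumes a little-endian 64-bit Mach-O loaded into a suitably aligned buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal copies of the <mach-o/loader.h> layouts so this compiles anywhere. */
#define MH_MAGIC_64_VAL   0xfeedfacfu
#define LC_SEGMENT_64_VAL 0x19u

typedef struct {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved;
} mh64_t;

typedef struct {
    uint32_t cmd, cmdsize;
} lc_t;

typedef struct {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
} seg64_t;

/* Walk the load commands and return the LC_SEGMENT_64 whose name matches,
 * or NULL if it is missing or the header/commands are malformed. */
const seg64_t *find_segment(const uint8_t *data, size_t size, const char *name)
{
    if (size < sizeof(mh64_t)) return NULL;
    const mh64_t *hdr = (const mh64_t *)data;
    if (hdr->magic != MH_MAGIC_64_VAL) return NULL;

    size_t off = sizeof(mh64_t);
    for (uint32_t i = 0; i < hdr->ncmds; i++) {
        if (off + sizeof(lc_t) > size) return NULL;
        const lc_t *lc = (const lc_t *)(data + off);
        if (lc->cmdsize < sizeof(lc_t) || off + lc->cmdsize > size) return NULL;
        if (lc->cmd == LC_SEGMENT_64_VAL && lc->cmdsize >= sizeof(seg64_t)) {
            const seg64_t *seg = (const seg64_t *)lc;
            /* segname is 16 bytes and not always NUL-terminated. */
            if (strncmp(seg->segname, name, sizeof(seg->segname)) == 0) return seg;
        }
        off += lc->cmdsize;
    }
    return NULL;
}
```

Calling find_segment(data, size, "__PRELINK_INFO") on a kernelcache buffer yields the segment's fileoff and filesize, which is all the copying step needs.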

Once we've verified that the segment command exists, we need to actually access its data. To do this, we can simply memcpy() the region of the Mach-O described by prelink_info_seg (its fileoff and filesize). We'll then have the Prelink XML loaded and can start parsing it.

One of the checks HTool does to determine the kernelcache style is whether the "PrelinkExecutableLoa" string exists in the Prelink XML. I'll discuss this further when we cover merged caches, but essentially the new-style kernels do not have this XML key at all.

    ...
    // Allocate a buffer the size of the segment's data, plus one byte so
    // the XML can be NUL-terminated for the str*() functions below.
    char *dictionary = malloc (prelink_info_seg->filesize + 1);
    if (!dictionary) {
        ...
    }

    // Copy filesize bytes from the segment's file offset.
    memcpy (dictionary, macho->data + prelink_info_seg->fileoff, prelink_info_seg->filesize);
    dictionary[prelink_info_seg->filesize] = '\0';

    // See if we can get a pointer to the start of a "PrelinkExecutableLoa" string.
    char *PrelinkExecutableLoa_str = strstr (dictionary, "PrelinkExecutableLoa");
    if (PrelinkExecutableLoa_str) {
        ...
    }
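strstr() relies on a NUL terminator, which the raw segment bytes don't have. As a sketch of a style check that works directly on the segment data, the following uses a small bounded search. detect_style(), find_bytes() and kc_style_t are hypothetical names, and treating the presence of "PrelinkExecutableLoa" as the only signal is a simplification of whatever checks HTool actually combines.

```c
#include <stddef.h>
#include <string.h>

typedef enum { KC_STYLE_MERGED = 0, KC_STYLE_SPLIT = 1 } kc_style_t;

/* Bounded substring search: unlike strstr(), this never reads past len,
 * which matters because raw segment bytes are not NUL-terminated. */
static const char *find_bytes(const char *hay, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || n > len) return NULL;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(hay + i, needle, n) == 0) return hay + i;
    return NULL;
}

/* Split-style caches carry "_PrelinkExecutableLoadAddr" keys in the
 * __PRELINK_INFO XML; merged-style caches dropped them. */
kc_style_t detect_style(const char *xml, size_t len)
{
    return find_bytes(xml, len, "PrelinkExecutableLoa") ? KC_STYLE_SPLIT : KC_STYLE_MERGED;
}
```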

At this point we can be fairly certain that we're in the correct place, so we can now look for the first KEXT reference. As we did with "PrelinkExecutableLoa", use strstr() to find the first occurrence of "CFBundleName</key>"; that should place us inside the XML dictionary for the first KEXT.

    ...
    // Look for the first occurrence of the "CFBundleName" key.
    char *kext_name_ptr = strstr (dictionary, "CFBundleName</key>");

    // Loop until kext_name_ptr is NULL. On each iteration we
    //  jump to the next occurrence of this string.
    //
    while (kext_name_ptr) {
        ...
        // Move on to the next KEXT's dictionary.
        kext_name_ptr = strstr (kext_name_ptr + 1, "CFBundleName</key>");
    }

Once we have found the first (or any) occurrence of "CFBundleName", the next few lines of XML should look something like the following:

`PRELINK_INFO` XML.

Two things to notice here. Firstly, "CFBundleName" follows the start of a <dict>, and secondly, we have a kernel pointer. You can identify non-tagged kernel pointers by looking at the top 32 bits (the first eight hex digits): if they are equal to 0xfffffff0, you have yourself a kernel pointer. I'll cover tagged pointers when discussing merged-style kernel caches, as they are not relevant here.
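The pointer check from the paragraph above is a one-liner. It is only a heuristic, under the assumption stated there that untagged arm64 iOS kernel addresses start with 0xfffffff0; is_untagged_kernel_ptr() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Untagged iOS arm64 kernel pointers look like 0xfffffff0XXXXXXXX:
 * the top 32 bits (the first eight hex digits) are 0xfffffff0. */
bool is_untagged_kernel_ptr(uint64_t p)
{
    return (p >> 32) == 0xfffffff0u;
}
```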

Continuing on, we need to get that kernel pointer from the XML into a uint64_t so we can try to read the data at that address. Start by moving to the end of the "CFBundleName" string, then search for the first occurrence of "_PrelinkExecutableLoadAddr". There should be only one occurrence of this key in the dictionary, so we can safely skip the lines denoted ....

    ...
    // Move to the end of the CFBundleName value (without clobbering
    // dictionary, which still points at our malloc()'d buffer)
    char *name_end = strstr (kext_name_ptr, "</string>");

    // Now move forward to the _PrelinkExecutableLoadAddr key
    char *prelink_addr = name_end ? strstr (name_end, "_PrelinkExecutableLoadAddr") : NULL;
    if (!prelink_addr) {
        ...
    }

Now that we have a pointer to the "_PrelinkExecutableLoadAddr" key, we can handle the load address. First we find the start of the address by looking for the next occurrence of "0x", and its end, which is the next "<". The address is 18 characters long including the "0x", so we copy it into a 19-byte buffer (leaving room for the NUL terminator) and then convert it with strtoull() into uint64_t load_addr.

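    ...
    // Find the start of the load address
    char *load_addr_ptr = strstr (prelink_addr, "0x");

    // Find the end of the load address
    char *end_of_load_addr = load_addr_ptr ? strchr (load_addr_ptr, '<') : NULL;
    if (!end_of_load_addr) {
        ...
    }

    // Copy the address text, including the "0x", into a NUL-terminated
    // buffer: 2 + 16 hex digits, plus 1 byte for the terminator.
    char load_addr_str[19] = { 0 };
    size_t len = end_of_load_addr - load_addr_ptr;
    strncpy (load_addr_str, load_addr_ptr, len < 18 ? len : 18);

    // Convert the hex string into an integer.
    uint64_t load_addr = strtoull (load_addr_str, NULL, 16);
    ...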

We now have the load address of the KEXT. As this is just a really bodged XML parser, we can do the same for the CFBundleIdentifier, so I won't cover that here. If we keep moving the pointer forward through the dictionary, we will eventually run out of "CFBundleName" keys, which signals that we have found all the KEXTs.
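Putting the steps together, here is a minimal sketch of the whole loop, assuming a NUL-terminated copy of the XML and that each KEXT's dictionary lists CFBundleName before _PrelinkExecutableLoadAddr, as in the snippet above. kext_entry_t, copy_string_value() and parse_prelink_entries() are hypothetical names, and like the original this is a bodged string search rather than a real plist parser. One shortcut: strtoull() stops at the first non-hex character, so it can read the address straight out of the XML without a separate copy.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char     name[128];
    uint64_t load_addr;   /* 0 if the dict had no _PrelinkExecutableLoadAddr */
} kext_entry_t;

/* Copy the text between the next <string> and </string> after p into out. */
static int copy_string_value(const char *p, char *out, size_t outlen)
{
    const char *start = strstr(p, "<string>");
    if (!start) return -1;
    start += strlen("<string>");
    const char *end = strstr(start, "</string>");
    if (!end) return -1;
    size_t n = (size_t)(end - start);
    if (n >= outlen) n = outlen - 1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}

/* Walk a NUL-terminated __PRELINK_INFO XML string, one CFBundleName per KEXT. */
size_t parse_prelink_entries(const char *xml, kext_entry_t *out, size_t max)
{
    static const char name_key[] = "CFBundleName</key>";
    static const char addr_key[] = "_PrelinkExecutableLoadAddr";
    size_t count = 0;
    const char *cur = strstr(xml, name_key);

    while (cur && count < max) {
        const char *next = strstr(cur + 1, name_key);   /* start of next KEXT */
        kext_entry_t *e = &out[count];
        memset(e, 0, sizeof(*e));

        if (copy_string_value(cur, e->name, sizeof(e->name)) == 0) {
            const char *addr = strstr(cur, addr_key);
            /* Only accept an address that belongs to this KEXT's dict. */
            if (addr && (!next || addr < next)) {
                const char *hex = strstr(addr, "0x");
                if (hex && (!next || hex < next))
                    e->load_addr = strtoull(hex, NULL, 16);  /* stops at '<' */
            }
            count++;
        }
        cur = next;
    }
    return count;
}
```

Bounding each lookup by the next CFBundleName means a KEXT without a load address gets 0 instead of silently picking up its neighbour's address.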

You can then add everything we found to a list, or print it out as you go, to get an output like we did before:

`PRELINK_INFO` XML.

Extracting KEXTs

So far we have covered how to parse the Prelink XML to discover the KEXT bundle IDs and their load addresses. Now to actually extract them. I won't show any code for this, otherwise I might as well open-source HTool.

We have the Bundle ID and load address of the KEXT. However, as I have already mentioned, the address is a kernel pointer, not a file offset, so it needs converting. One way to do this is to subtract the vmaddr of the first segment in the kernelcache Mach-O from the kernel pointer, and then add the size of the header and load commands region; this should land you on a file offset rather than a kernel pointer.
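A more general way to do the conversion, one that doesn't depend on how the first segment is laid out, is to find the segment whose VM range contains the pointer and rebase onto that segment's fileoff. This is a sketch of that alternative technique, not HTool's code: seg_range_t and vm_to_fileoff() are hypothetical names, and the segment table is assumed to have been read from the kernelcache's LC_SEGMENT_64 commands beforehand.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t vmaddr, vmsize, fileoff, filesize;
} seg_range_t;

/* Find the segment whose VM range contains addr and rebase the address onto
 * that segment's file offset. Returns -1 if no segment maps it in the file. */
int64_t vm_to_fileoff(const seg_range_t *segs, size_t nsegs, uint64_t addr)
{
    for (size_t i = 0; i < nsegs; i++) {
        const seg_range_t *s = &segs[i];
        if (addr >= s->vmaddr && addr - s->vmaddr < s->vmsize) {
            uint64_t delta = addr - s->vmaddr;
            if (delta >= s->filesize) return -1;   /* zero-fill, not backed by the file */
            return (int64_t)(s->fileoff + delta);
        }
    }
    return -1;
}
```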

Once you have an offset rather than a kernel pointer, read the first four bytes at that offset and check that they are 0xfeedfacf (the 64-bit Mach-O magic); otherwise something went wrong. In HTool, I use some unreleased Libhelper functionality which allows one to create a macho_t structure from a given memory address. It works exactly the same as loading a file, with the addition of calculating the size of the file, as Mach-Os do not contain a file size property in their headers.
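The magic check and size calculation can be sketched by walking the embedded header's segment commands and taking the furthest end of any segment's file data. This is an assumption-laden stand-in for the unreleased Libhelper code, not a copy of it: embedded_macho_size() and the struct names are hypothetical, and it assumes the KEXT's segment fileoff values are relative to its own header, which is worth verifying against your kernelcache.

```c
#include <stddef.h>
#include <stdint.h>

#define MH_MAGIC_64_VAL   0xfeedfacfu
#define LC_SEGMENT_64_VAL 0x19u

typedef struct {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved;
} mh64_t;

typedef struct {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
} seg64_t;

/* Verify the magic at p and estimate the embedded Mach-O's size as the
 * furthest end of any segment's file data (or of the load commands).
 * ASSUMPTION: fileoff values are relative to this header.
 * Returns 0 on a bad magic or malformed load commands. */
uint64_t embedded_macho_size(const uint8_t *p, size_t avail)
{
    if (avail < sizeof(mh64_t)) return 0;
    const mh64_t *hdr = (const mh64_t *)p;
    if (hdr->magic != MH_MAGIC_64_VAL) return 0;

    uint64_t end = sizeof(mh64_t) + (uint64_t)hdr->sizeofcmds;
    size_t off = sizeof(mh64_t);
    for (uint32_t i = 0; i < hdr->ncmds; i++) {
        if (off + 8 > avail) return 0;
        const seg64_t *seg = (const seg64_t *)(p + off);
        if (seg->cmdsize < 8 || off + seg->cmdsize > avail) return 0;
        if (seg->cmd == LC_SEGMENT_64_VAL && seg->cmdsize >= sizeof(seg64_t)
            && seg->fileoff + seg->filesize > end)
            end = seg->fileoff + seg->filesize;
        off += seg->cmdsize;
    }
    return end;
}
```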

Let's pick a KEXT to analyse; in this case I've chosen com.apple.iokit.IONetworkingFamily. Running htool -K com.apple.iokit.IONetworkingFamily kernelcache.arm64 will extract this KEXT from the cache file and write it to a file named after the KEXT's bundle identifier. Once the KEXT is extracted, we can analyse its Mach-O with -l:

Analysing an old-style KEXT with HTool.

Notice that everything looks as it should: all the segments are present. If we load this into IDA we get a perfect result. We don't have any symbols, but IDA identifies all the functions and successfully disassembles and decompiles them.

Open the KEXT in IDA.

The old-style KEXTs are as manageable as normal binaries, but how does HTool handle this? First, I check for a distinct difference between the two kernel formats. The old style has the PrelinkExecutableLoa string in the __PRELINK_INFO XML, which is one of the ways HTool can tell the two styles apart. If this string is present in the XML, it's simply a case of parsing an XML.

Split-Style Cache Summary

So, a quick summary. We now know that with previous Kernel Cache formats the KEXTs were complete Mach-Os placed in the __PRELINK_TEXT.__text section, and mapped out by an XML in the __PRELINK_INFO.__info section. One can parse the XML to find the load address, bundle name, identifier and version number of each KEXT. Once the load address is converted from a kernel pointer into a file offset, you can attempt to load the KEXT from the kernelcache file. If you examine the load commands with -l, you will find that the load address falls in the __PRELINK_TEXT segment.

A significant upside of this implementation was that we got full Mach-O files which can be easily loaded into IDA. Despite there being no symbols, there was still string info. However, as I will now explain, the new-style format makes life much more difficult when reverse engineering Kernel Extensions.


Merged Style Caches

Brandon Azad does an excellent job of explaining the new kernel cache format, but I shall still take a crack at it here.

With the release of iOS 12, Apple significantly changed the format and design of the iOS kernelcache. Gone was the neat, nice-to-work-with split-style cache, and in came what we have today: the merged style. But what exactly makes this new kernelcache format "merged"? As I already discussed, Kernel Extensions were traditionally complete and embedded in the __PRELINK_TEXT segment of the kernel's Mach-O. That's all changed now.

Merged Style Cache Changes

The segments __PRELINK_TEXT, __PLK_TEXT_EXEC, __PRELINK_DATA and __PLK_DATA_CONST are now completely empty, the "_PrelinkExecutableLoadAddr" keys have been removed from the __PRELINK_INFO XML, along with "_PrelinkLinkKASLROffsets" and "_PrelinkKCID", and finally, ALL symbols have been removed.

This presents a problem for the method we used to find and extract KEXTs with the old-style kernel caches. We can no longer rely on the "_PrelinkExecutableLoadAddr" key in the XML to find the load addresses for KEXTs, and Apple have obviously moved the KEXTs somewhere else, because the four segments that previously contained the KEXTs' data have been emptied.

They also added a number of new sections, two of which are worth focusing on: __PRELINK_INFO.__kmod_info and __PRELINK_INFO.__kmod_start, which reside in the same segment as the Prelink XML. This gives me a chance to introduce Mash (Mach-O Shell).

Using Mash to print a specific Segment.

We can use Mash to print a specific segment from a loaded Mach-O, in this case running p seg __PRELINK_INFO gives us the information about sections contained in __PRELINK_INFO. You could also run print segment __PRELINK_INFO and you will get the same response.

So, looking at Mash's output, these two new kmod sections have something in them, but what? Let's try extracting the two sections and using HTool's --hex to check the first few bytes, starting with __kmod_info.

Analyse the `__kmod_info` section.

Looking at the result, notice how each 8-byte value ends in ff 17 00? Because the values are stored little-endian, these are actually the most significant bytes. Brandon Azad mentions that the new-style kernel caches use "tagged pointers", and these are clearly the tagged pointers he refers to. There's more!
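To see why the tag bytes show up at the end of each group, here is a sketch that reassembles eight little-endian bytes into a uint64_t and pulls out the top 16 bits. read_le64() and pointer_tag() are hypothetical names; treating the top 16 bits as the tag is consistent with the untagging expression later in this post, which overwrites exactly those bits.

```c
#include <stdint.h>

/* Assemble a 64-bit value from 8 bytes stored little-endian. A hex dump
 * shows the least significant byte first, so the bytes at the end of each
 * 8-byte group are the most significant ones. */
uint64_t read_le64(const uint8_t b[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* The tag lives in the top 16 bits of the value. */
uint16_t pointer_tag(uint64_t v)
{
    return (uint16_t)(v >> 48);
}
```

For instance, the made-up bytes 00 40 00 07 f0 ff 17 00 decode to 0x0017fff007004000, whose tag is 0x0017.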

kmod_info is actually referenced in the XNU sources, in darwin-xnu/osfmk/mach/kmod.h. Looking at the kmod_info struct, it seems that it could describe a KEXT, although the raw output from the __kmod_info section doesn't look like one.

    typedef struct kmod_info {
        struct kmod_info  * next;
        int32_t             info_version;           // version of this structure
        uint32_t            id;
        char                name[KMOD_MAX_NAME];
        char                version[KMOD_MAX_NAME];
        int32_t             reference_count;        // # linkage refs to this
        kmod_reference_t  * reference_list;         // who this refs (links on)
        vm_address_t        address;                // starting address
        vm_size_t           size;                   // total size
        vm_size_t           hdr_size;               // unwired hdr size
        kmod_start_func_t * start;
        kmod_stop_func_t  * stop;
    } kmod_info_t;

It turns out that those tagged pointers in the __kmod_info section are actually pointers to kmod_info_t structures. The output of __kmod_start looks very similar: tagged pointers placed one after another. The entries in the two sections line up, so, for example, the 20th address in __kmod_info relates to the 20th address in __kmod_start.

The addresses in __kmod_info point to these kmod_info_t structs which contain the name and version of the Kernel Extensions. The corresponding address in __kmod_start points to the start of the KEXT's Mach-O! Clever, Apple, Clever.
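Given that layout, the name and version can be read straight out of a kmod_info_t once its address has been untagged and converted to a file offset. The offsets below are derived from the struct definition above for a 64-bit kernel (an 8-byte next pointer and two 4-byte fields, then the name and version arrays), assuming KMOD_MAX_NAME is 64 as in XNU's kmod.h; read_kmod_info() and the macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* next (8 bytes) + info_version (4) + id (4) puts name at offset 16;
 * version follows name's KMOD_MAX_NAME (64) bytes. */
#define KMOD_MAX_NAME_LEN   64
#define KMOD_NAME_OFFSET    16
#define KMOD_VERSION_OFFSET (KMOD_NAME_OFFSET + KMOD_MAX_NAME_LEN)

/* Copy a fixed-size, possibly unterminated char field into out. */
static void copy_field(const uint8_t *src, char out[KMOD_MAX_NAME_LEN + 1])
{
    memcpy(out, src, KMOD_MAX_NAME_LEN);
    out[KMOD_MAX_NAME_LEN] = '\0';
}

/* Read name and version from a kmod_info_t located at off within buf.
 * Returns -1 if the fixed fields would run past the end of the buffer. */
int read_kmod_info(const uint8_t *buf, size_t len, size_t off,
                   char name[KMOD_MAX_NAME_LEN + 1],
                   char version[KMOD_MAX_NAME_LEN + 1])
{
    if (off > len || len - off < KMOD_VERSION_OFFSET + KMOD_MAX_NAME_LEN)
        return -1;
    copy_field(buf + off + KMOD_NAME_OFFSET, name);
    copy_field(buf + off + KMOD_VERSION_OFFSET, version);
    return 0;
}
```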

Now we can adapt the code from before to instead find the information about each KEXT by following pointers in the __kmod_info table, and pair it with the correct address from __kmod_start. Then, when we want to extract a KEXT from a new-style kernel cache, we can look up its bundle ID in the list, follow the pointer and carve out the Mach-O.

However, we do have a problem: the tagged pointers. Brandon Azad does a better job of explaining how they work, so I refer you to his post on that. After some searching around, it turns out you can untag a pointer like so: ((a) | UINT64_C(0xffff000000000000)).
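With the untag expression in hand, pairing the two tables is a matter of walking them in lockstep. A minimal sketch, assuming both sections have already been read into arrays of host-order uint64_t values (true on a little-endian host); kmod_pair_t, untag_ptr() and pair_kmod_tables() are hypothetical names. Each resulting address is still a VM address, so it needs converting to a file offset before you can read the kmod_info_t or carve out the Mach-O.

```c
#include <stddef.h>
#include <stdint.h>

/* Untag by forcing the top 16 bits back to 0xffff. */
static inline uint64_t untag_ptr(uint64_t p)
{
    return p | UINT64_C(0xffff000000000000);
}

typedef struct {
    uint64_t info_addr;    /* untagged address of this KEXT's kmod_info_t */
    uint64_t start_addr;   /* untagged address of this KEXT's Mach-O header */
} kmod_pair_t;

/* Pair the i-th __kmod_info entry with the i-th __kmod_start entry.
 * Returns the number of pairs written to out (the shorter table's length). */
size_t pair_kmod_tables(const uint64_t *info, size_t ninfo,
                        const uint64_t *start, size_t nstart,
                        kmod_pair_t *out)
{
    size_t n = ninfo < nstart ? ninfo : nstart;
    for (size_t i = 0; i < n; i++) {
        out[i].info_addr  = untag_ptr(info[i]);
        out[i].start_addr = untag_ptr(start[i]);
    }
    return n;
}
```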

After rewriting a lot of code, HTool now finds and outputs all the Kernel Extensions in new-style kernel caches:

List KEXTs in new-style Kernel Caches.

If we try to extract a KEXT, that works too!

Extracting KEXTs in new-style Kernel Caches.

However, there is a slight problem. When we run the extracted KEXT through HTool, there is a noticeable difference...

Analysing Merged Kernel Extensions.

All of the segments are gone apart from __TEXT_EXEC. This is a problem because you can no longer load the KEXT into IDA; the only option is to use a command-line disassembler, such as jtool (and soon HTool), to disassemble the single segment.

Disassemble the single segment from new-style KEXTs.

Merged Style Caches Summary

So what exactly have Apple done? Well, all of the other sections of the KEXTs are now merged with the corresponding segments of the Kernel's Mach-O. The __TEXT, __DATA_CONST, __DATA and __LINKEDIT segments now contain everything from the KEXTs.

This makes it rather difficult to reverse engineer the KEXTs like with the old format. We still have the executable code, but it's more difficult to follow references to data. Take this small snippet from com.apple.iokit.IONetworkingFamily:

    0x7fff0081c812c      ADRP      X8, 2094946
    0x7fff0081c8130      ADD       X8, X8, #784
    0x7fff0081c8134      LDR       X9, [X8, #1472]!
    0x7fff0081c8138      MOVK      X8, 0x43aa, LSL 48
    0x7fff0081c813c      BLRAAZ    X8
    0x7fff0081c8140      TBZ       W0, #0, 0x7fff0081c8260     ; This references something now merged in other segments
    0x7fff0081c8144      MOVZ      W0, 0xd0
    0x7fff0081c8148      BL        0x7fff0080fd7c0

Summary

I hope you learned something by reading this. It took a lot of work to get this functionality of HTool working. A number of hours reversing, googling and searching through sources eventually got me to this point. If you have any questions, or even anything I've missed or written incorrectly, please do let me know! I welcome any feedback.