I’ve recently been working a lot with parsing Mach-O files, so I’m beginning to understand in a fair bit of detail how they are structured and how they work. I’ve been developing a library, called libhelper, which can parse Mach-O files. Libhelper-macho also powers Img4helper and HTool.
This is not a complete writeup or documentation covering everything about Mach-O’s, and I appreciate this has probably been covered to death. It’s not aimed at those who already have an advanced knowledge of how Mach or Darwin works; rather, it’s aimed at those who are in the position I was in a few weeks ago, with limited knowledge of how Mach-O’s are structured. Still, I felt this would be a useful resource, and a good way to kick off my blog.
There are multiple types of Mach-O, such as Executable or KEXT Bundles, so I can’t cover them all. My aim for this post is to discuss the basics - namely Header, Load Commands and Segment Commands. I may discuss other areas in the future but this is a start.
What are Mach-O files
Mach-O files, or Mach Object files, are an executable format used on operating systems based on the Mach Kernel. This includes Apple’s Darwin-based systems: iOS, macOS, watchOS etc. There are multiple types of Mach-O file, such as executables, object code, shared and dynamic libraries, kernel extension (KEXT) bundles and even debug companion files.
Mach-O files are simply binary files; there isn’t anything particularly special about them in that regard. You can read some bytes into a C structure and boom, you’ve parsed a Mach-O (or at least part of it). Natively, they can only be run on Mach/Darwin/XNU-based systems, although there are some implementations for loading and executing Mach-O files on Linux. You can run simple applications this way, but the majority of applications will not work due to their reliance on certain macOS libraries, such as
A Mach-O is made up of one Mach header, a number of load commands (specified in the header) and the data. The data is organised into Segments, which are made up of 0 to 255 Sections, and there are special load commands to describe them. Mach-O files are organised as follows:
- Mach-O Header
- Load Commands
The purpose of this article is to discuss, at a higher level, each of these areas of a Mach-O file, how data is organised and how to load this data from a given Mach-O file into relevant C structures.
Let’s start with the Mach Header. Its purpose is to describe what the file contains, and how the Kernel and Dynamic Linker should handle it. The first 4 bytes are, as with many file formats, its “Magic Number”. A Magic Number is used to identify a file format. In the case of Mach-O’s there are three Magic Numbers that one may come across.
0xfeedface for 32-bit,
0xfeedfacf for 64-bit and
0xcafebabe for Mach Universal Binaries / Object files.
Other properties of a Mach-O Header include the cpu type and sub type which define the architecture the Mach-O is built for (e.g.
arm64_32), the number of Load Commands and the size of that area and flags to be passed to the Dynamic Linker. The layout of the header is shown below:
The Mach-O header takes up 32 bytes for 64-bit files, and 28 bytes for 32-bit files. You can populate the header structure by using
memcpy() to copy the correct number of bytes into a
mach_header structure, and you’ll be able to access the header elements as normal.
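As a minimal sketch of that step, the block below declares a 64-bit header struct with the same field order as mach_header_64 in loader.h and copies the first 32 bytes of a buffer into it. The helper name read_header_64 is hypothetical, plain int32_t stands in for the cpu type typedefs, and the code assumes the file's byte order matches the host's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64 0xfeedfacfu   /* magic number of a 64-bit Mach-O */

/* Field order follows mach_header_64 in loader.h. The 32-bit mach_header
 * is identical except that it has no trailing `reserved` field. */
struct mach_header_64 {
    uint32_t magic;       /* Magic Number identifying the file */
    int32_t  cputype;     /* cpu type */
    int32_t  cpusubtype;  /* cpu sub type */
    uint32_t filetype;    /* type of Mach-O, e.g. executable */
    uint32_t ncmds;       /* number of Load Commands */
    uint32_t sizeofcmds;  /* size of the Load Commands area in bytes */
    uint32_t flags;       /* flags */
    uint32_t reserved;    /* 64-bit only */
};

static_assert(sizeof(struct mach_header_64) == 32,
              "the 64-bit header takes up 32 bytes");

/* Hypothetical helper: copy the header out of `data`.
 * Returns 0 on success, -1 if the buffer is too small or the
 * magic is not that of a 64-bit Mach-O in host byte order. */
static int read_header_64(const unsigned char *data, size_t len,
                          struct mach_header_64 *out)
{
    uint32_t magic;

    if (len < sizeof(*out))
        return -1;

    /* Check the first 4 bytes before trusting the rest. */
    memcpy(&magic, data, sizeof(magic));
    if (magic != MH_MAGIC_64)
        return -1;

    memcpy(out, data, sizeof(*out));
    return 0;
}
```

Copying with memcpy() rather than casting the data pointer avoids alignment problems when the buffer is not suitably aligned for the struct.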
Load Commands are placed directly after the Mach-O header in the file. They specify the logical structure of the file and the layout of the file in virtual memory.
All Load Commands have a common 8 byte structure which identifies the type of the command and its size. This common structure is defined as follows:
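For reference, here is a sketch of that 8 byte structure as it appears in loader.h, together with a hypothetical helper (count_commands is my name, not part of any library) that uses cmdsize to step from one command to the next. The bounds checks are an assumption about what a careful parser would want, not something the format mandates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct load_command {
    uint32_t cmd;      /* type of Load Command (an LC_* value) */
    uint32_t cmdsize;  /* total size of the command, including these 8 bytes */
};

static_assert(sizeof(struct load_command) == 8,
              "the common part of every Load Command is 8 bytes");

/* Hypothetical helper: walk `ncmds` Load Commands starting at
 * data + offset and count those whose type equals `wanted`.
 * Returns the count, or -1 if a command runs past the buffer. */
static int count_commands(const unsigned char *data, size_t len,
                          size_t offset, uint32_t ncmds, uint32_t wanted)
{
    int found = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        if (offset > len || len - offset < sizeof(lc))
            return -1;
        memcpy(&lc, data + offset, sizeof(lc));

        /* A command can never be smaller than its own 8 byte prefix. */
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > len - offset)
            return -1;

        if (lc.cmd == wanted)
            found++;

        /* cmdsize is what lets us skip commands we do not understand. */
        offset += lc.cmdsize;
    }
    return found;
}
```

For a 64-bit file the walk would start at offset 32, directly after the header, with ncmds taken from the header.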
There are over a dozen Load Commands; some are common across all Mach-O’s and some are only found in certain cases. Load Commands are placed directly after the Mach-O header, with the first being Segment Commands. These are discussed further under Segment Commands.
But Segment Commands are not the only commands that are included in the majority of Mach-O files. The
LC_LOAD_DYLINKER commands specify information such as rebase, bind, weak, lazy and export information for the Dynamic Linker, and the path of the Dynamic Linker the Kernel should use to execute the binary respectively. Mach-O’s frequently require Dynamic Libraries, especially
LC_LOAD_DYLIB command defines the path where the Dynamic Linker can find the dylib, and there are as many of these commands as the binary has Dynamic Library dependencies.
The offsets and sizes of both the symbol table and the string table are defined with
LC_SYMTAB, and offsets for local, external, undefined and other types of dynamic symbols are defined with
The last command that I will discuss here is
LC_MAIN which defines the offset for the entry point, so where the Kernel should start executing the binary from. This is only used for
Below is output from an experimental version of htool showing all of the Load Commands from itself. I’ve omitted some parts because the output is rather long.
$ htool_debug -l $(which htool_debug)
HTool Version 1.0.0~Alpha; Sat Jan 4 02:53:36 2020; libhelper-1000.7188.8.131.52/ALPHA_X86_64 x86_64
LC 00: LC_SEGMENT_64 Off: 0x000000000-0x100000000 __PAGEZERO
    No Section 64 data
LC 01: LC_SEGMENT_64 Off: 0x100000000-0x100012000 __TEXT
    Off: 0x100000b00-0x10000f4cf 59855 bytes __TEXT.__text
    Off: 0x10000f4d0-0x10000f656 390 bytes __TEXT.__stubs
...
LC 05: LC_DYLD_INFO_ONLY
    Rebase info: 40 bytes at offset 0x14000 (0x14000-0x14028)
    Bind info: 88 bytes at offset 0x14028 (0x14028-0x14080)
    No Weak Bind info
    Lazy Bind info: 1048 bytes at offset 0x14080 (0x14080-0x14498)
    Export info: 3640 bytes at offset 0x14498 (0x14498-0x152d0)
LC 06: LC_SYMTAB
    1434 symbols in file
    symbol table offset: 0x00015478
    string table offset: 0x0001b034
    string table size: 12680 bytes
LC 07: LC_DYSYMTAB
    1196 local symbols at 0
    169 external symbols at 1196
    69 undefined symbols at 1365
    No TOC
    No modtab
    135 indirect symtab entries at 110104
    No External Relocation Entries
    No Local Relocation Entries
LC 08: LC_LOAD_DYLINKER /usr/lib/dyld
LC 09: LC_UUID UUID: 3C7070A4-E053-3DA2-99C6-44DA4D6D2055
LC 10: LC_BUILD_VERSION
    Build Version: Platform: macOS, Minos: 10.15, SDK: 10.15
    Tool 0: LD (v520.0.0)
LC 11: LC_SOURCE_VERSION Source Version: 0.0
LC 12: LC_MAIN Entry Point: 0xd40
LC 13: LC_LOAD_DYLIB /usr/lib/libSystem.B.dylib
LC 14: LC_RPATH @loader_path/../libhelper/src
LC 15: LC_RPATH @loader_path/../editline
LC 16: LC_FUNCTION_STARTS Offset: 0x152d0, Size: 376 bytes (0x000152d0-0x00015448)
LC 17: LC_DATA_IN_CODE Offset: 0x15448, Size: 48 bytes (0x00015448-0x00015478)
Going back to
struct load_command: from the perspective of trying to parse Mach-O’s, having a constant format for the first 8 bytes of each Load Command makes detecting and parsing them easier. The following is an example of how we can parse a command, using
LC_MAIN as an example. The code is based off XNU’s
loader.h rather than
If you are interested in learning more about the different types of Load Commands, you can either check out
EXTERNAL_HEADERS/mach-o/loader.h in the XNU sources, or
include/libhelper-macho/macho-command-types.h from Libhelper.
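To make the LC_MAIN case concrete, here is a rough sketch using struct layouts that follow loader.h. The helper find_entry_point is hypothetical and the bounds checks are my own assumption; the idea is simply to read the common 8 bytes, compare the command type, and only then copy the full command structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LC_REQ_DYLD 0x80000000u
#define LC_MAIN     (0x28u | LC_REQ_DYLD)

struct load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Layout follows entry_point_command in loader.h. */
struct entry_point_command {
    uint32_t cmd;        /* LC_MAIN */
    uint32_t cmdsize;    /* 24 */
    uint64_t entryoff;   /* offset of the entry point */
    uint64_t stacksize;  /* initial stack size, if non-zero */
};

static_assert(sizeof(struct entry_point_command) == 24,
              "LC_MAIN is 24 bytes");

/* Hypothetical helper: search `ncmds` Load Commands starting at
 * data + offset for LC_MAIN. On success stores the entry offset in
 * *entryoff and returns 0; returns -1 if not found or malformed. */
static int find_entry_point(const unsigned char *data, size_t len,
                            size_t offset, uint32_t ncmds,
                            uint64_t *entryoff)
{
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        if (offset > len || len - offset < sizeof(lc))
            return -1;
        memcpy(&lc, data + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > len - offset)
            return -1;

        if (lc.cmd == LC_MAIN) {
            struct entry_point_command ep;

            if (lc.cmdsize < sizeof(ep))
                return -1;
            /* The type matched, so copy the whole command this time. */
            memcpy(&ep, data + offset, sizeof(ep));
            *entryoff = ep.entryoff;
            return 0;
        }
        offset += lc.cmdsize;
    }
    return -1;
}
```

Run against the htool binary shown earlier, a helper like this would be expected to report the 0xd40 entry point listed under LC 12.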
Going back to Segment Commands, the first couple of Load Commands in a Mach-O are either
LC_SEGMENT for 32-bit, or
LC_SEGMENT_64 for 64-bit. These define an object file’s Segments.
If you are unfamiliar with how object files work: an object file is split into a number of these segments. The
__TEXT segment contains the instructions that will be executed by the CPU, and the
__DATA segment contains both static local variables and global variables. These are both standard, however you may find additional segments such as
__LINKEDIT, and in XNU Kernelcaches, you’ll get even more funky segment names like
Segments are further divided into sections, so for example you’ll find
__cstring in the
__TEXT segment, formatted as
__TEXT.__cstring, as a common one.
The Segment Commands in a Mach-O define which regions of the binary data should be mapped into memory, and how. Looking at the
segment_command_64 struct, there’s the segment’s name as
segname, but then we have two sets of addresses and sizes.
vmsize define the virtual memory address and size for this segment, and
filesize define the segment’s location and size within the file.
initprot define the virtual memory protection for the segment in memory, which can for example prevent it from being both writable and executable at the same time. Finally there are the flags, which are just a way of giving the Kernel options for loading the segment into memory.
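For reference, this is a sketch of the 64-bit segment command with the field order used in loader.h (int32_t stands in for the vm_prot_t typedef). The three small helpers are hypothetical and only illustrate how the fields might be read: the name is not guaranteed to be NUL-terminated, the protection is a bitmask, and the file range is worth checking against the real file size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VM_PROT_READ    0x1
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

struct segment_command_64 {
    uint32_t cmd;          /* LC_SEGMENT_64 */
    uint32_t cmdsize;      /* includes the section structs that follow */
    char     segname[16];  /* segment name, e.g. __TEXT */
    uint64_t vmaddr;       /* virtual memory address */
    uint64_t vmsize;       /* virtual memory size */
    uint64_t fileoff;      /* offset of the segment in the file */
    uint64_t filesize;     /* number of bytes mapped from the file */
    int32_t  maxprot;      /* maximum VM protection */
    int32_t  initprot;     /* initial VM protection */
    uint32_t nsects;       /* number of sections in the segment */
    uint32_t flags;        /* flags */
};

static_assert(sizeof(struct segment_command_64) == 72,
              "segment_command_64 is 72 bytes");

/* segname may use all 16 bytes, so copy it into a 17 byte buffer. */
static void segment_name(const struct segment_command_64 *seg, char out[17])
{
    memcpy(out, seg->segname, 16);
    out[16] = '\0';
}

/* Render a protection bitmask as "rwx"-style text. */
static void prot_string(int32_t prot, char out[4])
{
    out[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}

/* Does [fileoff, fileoff + filesize) fit inside a file of file_len bytes? */
static int segment_in_file(const struct segment_command_64 *seg,
                           uint64_t file_len)
{
    return seg->fileoff <= file_len &&
           seg->filesize <= file_len - seg->fileoff;
}
```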
Like I said, we have segments which are divided into sections. These sections are placed directly after the segment command, are included in the
cmdsize and are counted with
nsects. Again, sections essentially divide up segments into more meaningful chunks, for example
To load these, we must take the offset of the segment command in the file, add the size of the segment structure, and then loop through
nsects times, incrementing the offset by the size of the section struct each time.
To start, the section structure is defined as follows. Again, there are both
section structures, with the difference being the 64-bit
section_64 struct uses
uint64_t for both
size, and has a third
reserved property at the end of the structure although it is not designated for any optional properties:
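For reference, a sketch of both section structures side by side, with the field order used in loader.h. The static assertions only document the sizes that fall out of these layouts.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit section */
struct section {
    char     sectname[16];  /* name of this section, e.g. __text */
    char     segname[16];   /* segment this section belongs to */
    uint32_t addr;          /* memory address of this section */
    uint32_t size;          /* size in bytes of this section */
    uint32_t offset;        /* file offset of this section */
    uint32_t align;         /* alignment, as a power of 2 */
    uint32_t reloff;        /* file offset of relocation entries */
    uint32_t nreloc;        /* number of relocation entries */
    uint32_t flags;         /* section type and attributes */
    uint32_t reserved1;
    uint32_t reserved2;
};

/* 64-bit section: addr and size widen to uint64_t, and a third
 * reserved field is added at the end. */
struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

static_assert(sizeof(struct section) == 68, "32-bit section is 68 bytes");
static_assert(sizeof(struct section_64) == 80, "64-bit section is 80 bytes");
```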
As I just stated, we can load the correct data into that structure by adding
sizeof (segment_command_64) to the offset of the command in the file, then add
sizeof(section_64) for each of
segment->nsects. Here is an example of what I mean (note this time I am using libhelper code to demonstrate):
mach_segment_info_t struct is not implemented in XNU’s standard
loader.h, so if you’re writing your own Mach-O parser, please ignore references to Libhelper structs.
Looking at this function in more detail. Two arguments are passed to
unsigned char *data pointer to the Mach-O loaded in memory, and an
uint32_t offset, which points to the start of the segment command within that
data pointer. This offset is relative to the start of the Mach-O, not the start of the load commands.
Ignoring the code that checks and sets up the
mach_segment_command_t, it starts by calculating the offset of the first section. This is done by adding the
offset passed to the function to the
sizeof() the segment command structure.
The segment command has
nsects containing the number of sections placed after the command. So, we loop round the number of sections from
segment->nsects and create
mach_section_64_t’s for each one.
We can use
memcpy() to copy the
ssize bytes we need. We can set the start point for the copying by adding the offset to the data pointer. By doing this, we are incrementing the pointer by the offset, resulting in it pointing to, in this case, the start of the current section struct.
h_slist_append() can be ignored. This is simply adding the section to a singly-linked list in a libhelper
The last bit of interest here: make sure to increment
sectoff by the size of the
mach_section_64_t struct, so
sectoff will point to the next section structure.
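The walkthrough above can be sketched without any Libhelper types. The block below is not the libhelper function; it is a simplified stand-in that uses loader.h style structs and a hypothetical helper name, read_sections, and it fills a caller-supplied array instead of appending to a list. The bounds checks are my own addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct segment_command_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects;
    uint32_t flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Hypothetical helper: `offset` is the position of the segment command
 * relative to the start of the Mach-O held in `data`. Copies up to
 * `max_sections` section structs into `out` and returns how many were
 * copied, or -1 if the data runs out. */
static int read_sections(const unsigned char *data, size_t len,
                         uint32_t offset, struct section_64 *out,
                         uint32_t max_sections)
{
    struct segment_command_64 seg;

    if (offset > len || len - offset < sizeof(seg))
        return -1;
    memcpy(&seg, data + offset, sizeof(seg));

    /* The first section starts directly after the segment command. */
    size_t sectoff = (size_t)offset + sizeof(seg);
    uint32_t count = seg.nsects < max_sections ? seg.nsects : max_sections;

    for (uint32_t i = 0; i < count; i++) {
        if (sectoff > len || len - sectoff < sizeof(struct section_64))
            return -1;
        memcpy(&out[i], data + sectoff, sizeof(struct section_64));

        /* Step to the next section struct. */
        sectoff += sizeof(struct section_64);
    }
    return (int)count;
}
```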
If you are interested, please take a look at libhelper. It has a Mach-O parser that I wrote, and you’ll find the example above.
The actual data in a Mach-O, that is the instructions and variables, is stored after the Load Commands region. Depending on the type of Mach-O, the way this region is used varies.
So, for example, an executable - meaning a Mach-O with the
MH_EXECUTE file type - would have the segment commands laying out the data region, and an
LC_MAIN command specifying the offset of the entry point instruction the Kernel should jump to when loading. The Kernel will also start the Dynamic Linker specified in the
LC_LOAD_DYLINKER command, and link any specified dylib’s with
This entire region is mapped out by the segment commands. We can inspect this mapping with Mash, or Mach-O Shell, which is part of HTool. Loading the file, we can inspect a particular segment like so.
(Mash) p seg __TEXT
Segment: __TEXT Offset: 0x100000000-0x100012000 Size: 73728 bytes
    Off: 0x100000be0-0x10000f4ef 59663 bytes __TEXT.__text
    Off: 0x10000f4f0-0x10000f676 390 bytes __TEXT.__stubs
    Off: 0x10000f678-0x10000f912 666 bytes __TEXT.__stub_helper
    Off: 0x10000f912-0x100011fa2 9872 bytes __TEXT.__cstring
    Off: 0x100011fa2-0x100011fa4 2 bytes __TEXT.__const
    Off: 0x100011fa4-0x100011ff8 84 bytes __TEXT.__unwind_info
To print a segment, we use
p seg __TEXT. This is the short version; if you prefer, the longer
print segment __TEXT would also work fine. The first line of the output displays the start and end addresses of the
__TEXT segment, and its total size in bytes.
Underneath, slightly indented, are each of the sections contained within the segment. For example, we can see that the
__TEXT.__stubs section is 390 bytes, and is located from
Two things to note about these addresses: first, they are virtual memory addresses, and second, they are relative to the start of the data, not the start of the Mach-O. Before this
__TEXT segment is a
__PAGEZERO segment ranging from
This is only an introduction to Mach-O files. I’d like to continue writing about them and maybe even write a Mach-O loader for Linux.
I hope I covered this fairly well; any feedback would be greatly appreciated. I aim to write these blog posts more often and hopefully they’ll improve over time - both in quality and technical accuracy. For now, you can download Img4helper, which you can use to extract Apple Image4 files, from the Downloads page linked above. Libhelper sources are available here if you’d like to look at my Mach-O parser, and
htool will be available soon.