Cocoa in the Shell

Universal Binary: The Mach-O file format

I’ve been playing with the Mac OS X executable file format for a while now. It is known as Mach-O, a name inherited from the Mach project, on which part of Mac OS X is based.

First, a little history: when Macs moved from PPC to x86, Apple introduced Universal Binary (UB) applications.

The goal of a UB application is to run on several types of architecture. The major caveat is that the final application is roughly twice as large, but nowadays disks have huge storage capacity, so it’s not a big deal.

But how did they manage to do this? Well, it’s very simple: a UB application is nothing more than an archive of 2 applications with a special header.

The header, which I’ll refer to as fat_section, is composed of 2 structs defined in /usr/include/mach-o/fat.h.

There is one important thing about these structs though: their fields are stored in big-endian, so if you manipulate them on an Intel-based Mac, don’t forget to swap the byte order, or you will have some surprises ;)
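To make the byte-order point concrete, here is a minimal sketch in C. The helper names read_be32 and swap32 are my own, not something defined in fat.h. The first one decodes a big-endian 32-bit field byte by byte, so it gives the same result on PPC and on Intel; the second one is a plain unconditional swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a 32-bit big-endian value from raw bytes.
   The result is in host byte order whatever the CPU endianness is. */
static uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: reverse the 4 bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Reading fields byte by byte avoids having to know the endianness of the host at all, which is why I prefer it to a conditional swap in small tools.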

Let’s detail the first struct:

struct fat_header
{
    uint32_t magic;
    uint32_t nfat_arch;
};
  • magic : An integer with a big-endian value of 0xCAFEBABE. To check this value, compare the raw 32-bit word against the constant FAT_MAGIC on a big-endian CPU, or FAT_CIGAM on a little-endian one.

  • nfat_arch : An integer which specifies the number of struct fat_arch that follow.
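As an illustration, here is a small sketch that reads these two fields from the first 8 bytes of a file and checks the magic. To keep it compilable anywhere, it uses a local mirror of the struct (ub_fat_header) and a local constant (UB_FAT_MAGIC) instead of including fat.h; those names, like parse_fat_header and read_be32, are assumptions of mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UB_FAT_MAGIC 0xCAFEBABEu /* same value as FAT_MAGIC in fat.h */

/* Local mirror of struct fat_header, fields in host byte order. */
struct ub_fat_header {
    uint32_t magic;
    uint32_t nfat_arch;
};

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fills *out and returns 1 if buf starts with a fat header, 0 otherwise. */
static int parse_fat_header(const unsigned char *buf, size_t len,
                            struct ub_fat_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return 0; /* too short to hold both fields */
    out->magic     = read_be32(buf);
    out->nfat_arch = read_be32(buf + 4);
    return out->magic == UB_FAT_MAGIC;
}
```

A return value of 0 simply means "not a Universal Binary" here, for example a single-architecture Mach-O file.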

The second struct, fat_arch, holds the information for one architecture.

A fat_section contains fat_header.nfat_arch instances of struct fat_arch (one for each arch). The maximum number of struct fat_arch for a UB app is 4 (x86, x86_64, PPC, PPC_64).

Here is the definition of this struct:

struct fat_arch
{
    cpu_type_t cputype;
    cpu_subtype_t cpusubtype;
    uint32_t offset;
    uint32_t size;
    uint32_t align;
};
  • cputype : A constant that defines the CPU family type (x86, PPC…)

  • cpusubtype : The specific CPU variant within that family.

  • offset : Offset, from the start of the file, at which the binary for this arch begins.

  • size : Size in bytes of the binary for this arch.

  • align : The alignment, expressed as a power of 2. This field ensures that the binary for this arch stays correctly aligned if its contents change.
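Here is a sketch of how one such entry could be decoded from its 20 raw bytes. The struct ub_fat_arch is a local mirror of struct fat_arch (I use int32_t where the system header uses cpu_type_t and cpu_subtype_t), and parse_fat_arch, align_in_bytes and read_be32 are hypothetical names of mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct fat_arch, fields in host byte order. */
struct ub_fat_arch {
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t offset;
    uint32_t size;
    uint32_t align;
};

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode one 20-byte big-endian entry starting at p. */
static void parse_fat_arch(const unsigned char *p, struct ub_fat_arch *out)
{
    assert(p != NULL && out != NULL);
    out->cputype    = (int32_t)read_be32(p);
    out->cpusubtype = (int32_t)read_be32(p + 4);
    out->offset     = read_be32(p + 8);
    out->size       = read_be32(p + 12);
    out->align      = read_be32(p + 16);
}

/* align is an exponent: a value of 12 means 2^12 = 4096 bytes. */
static uint32_t align_in_bytes(const struct ub_fat_arch *a)
{
    assert(a != NULL && a->align < 32);
    return (uint32_t)1 << a->align;
}
```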

Now that we have the definitions of these 2 structs, we can use sizeof() to get their sizes:

  • struct fat_header : 0x8

  • struct fat_arch : 0x14

The size of the fat_section is 4 KB (4096 bytes).
So, assuming we have 4 struct fat_arch entries, 8 + (4 * 20) = 88 bytes out of 4096 are occupied, and the remainder is filled with 0.
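This arithmetic can be double-checked with a few lines of C. The structs below are local mirrors of the ones in fat.h, and fat_section_used and fat_section_padding are hypothetical helper names; the 4096-byte section size is the figure quoted above, passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the two structs from fat.h. */
struct ub_fat_header { uint32_t magic; uint32_t nfat_arch; };
struct ub_fat_arch {
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t offset;
    uint32_t size;
    uint32_t align;
};

/* Bytes taken by the header plus n arch entries. */
static size_t fat_section_used(uint32_t n)
{
    return sizeof(struct ub_fat_header) +
           (size_t)n * sizeof(struct ub_fat_arch);
}

/* Zero-filled bytes left in a section of section_size bytes. */
static size_t fat_section_padding(uint32_t n, size_t section_size)
{
    size_t used = fat_section_used(n);
    assert(used <= section_size);
    return section_size - used;
}
```

With 4 entries and a 4096-byte section, that leaves 4096 - 88 = 4008 bytes of zero padding.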

When the program is run, the system jumps to the offset of the code matching the current architecture and ignores the other sections, which explains why there is no loss of performance between a single-architecture app and a Universal one.
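The lookup itself can be sketched as follows. This is only my illustration of the idea, not Apple’s loader code: find_slice, read_be32 and the UB_* constants are hypothetical names, and the function works on a file image already loaded in memory. The cputype to look for is passed as a plain integer (for instance 7 for x86 or 18 for PPC, the values of CPU_TYPE_X86 and CPU_TYPE_POWERPC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UB_FAT_MAGIC   0xCAFEBABEu /* same value as FAT_MAGIC */
#define UB_HEADER_SIZE 8u          /* sizeof(struct fat_header) */
#define UB_ARCH_SIZE   20u         /* sizeof(struct fat_arch) */

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Search the fat_arch table for the entry whose cputype is `wanted`.
   On success, stores its offset and size (host order) and returns 1.
   Returns 0 if buf is not a fat file or has no matching entry. */
static int find_slice(const unsigned char *buf, size_t len, int32_t wanted,
                      uint32_t *offset, uint32_t *size)
{
    assert(buf != NULL && offset != NULL && size != NULL);
    if (len < UB_HEADER_SIZE || read_be32(buf) != UB_FAT_MAGIC)
        return 0;

    uint32_t n = read_be32(buf + 4);
    for (uint32_t i = 0; i < n; i++) {
        size_t pos = UB_HEADER_SIZE + (size_t)i * UB_ARCH_SIZE;
        if (pos + UB_ARCH_SIZE > len)
            return 0; /* table is truncated */
        if ((int32_t)read_be32(buf + pos) == wanted) {
            *offset = read_be32(buf + pos + 8);
            *size   = read_be32(buf + pos + 12);
            return 1;
        }
    }
    return 0;
}
```

Only the small table at the start of the file is scanned; the slices for the other architectures are never touched.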

Once you get that, it’s trivial to write an app which removes the useless architectures to reduce the size of a binary.
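As a rough sketch of the core step, assuming the offset and size of the wanted architecture have already been read from its fat_arch entry (in host byte order), the slice can be copied out as a standalone binary. The function name extract_slice is hypothetical, and a real tool would work on files rather than on in-memory buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the bytes [offset, offset + size) of a fat file image into out.
   Returns the number of bytes copied, or 0 if the slice lies outside
   the image or does not fit in the output buffer. */
static size_t extract_slice(const unsigned char *fat, size_t fat_len,
                            uint32_t offset, uint32_t size,
                            unsigned char *out, size_t out_cap)
{
    assert(fat != NULL && out != NULL);
    if (offset > fat_len || size > fat_len - offset)
        return 0; /* entry points outside the file */
    if (size > out_cap)
        return 0; /* destination too small */
    memcpy(out, fat + offset, size);
    return size;
}
```

The bounds checks matter because offset and size come straight from the file, so a corrupted header should not be trusted.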
