I'm writing a cross-platform memory analysis library, and one of the functions I provide is GetProcessModules. On Windows I use EnumProcessModules to get a list of all modules loaded in a process, along with GetModuleInformation to retrieve the Address, SizeOfImage and EntryPoint of each module.

While porting this to OS X, I found this and other sources that helped me implement the function. I've been able to use the dyld_image_info struct to get the module's name and load address, but how do I get the SizeOfImage and EntryPoint values?

Dave
  • The information you need should be available in the object file. `const struct mach_header* imageLoadAddress` is a pointer to the memory location where the image has been loaded. The format of dylibs (Mach-O format) is well documented [here](https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/index.html#//apple_ref/c/tag/dylib) – Henrik Mar 06 '15 at 19:35

1 Answer

SizeOfImage and EntryPoint are field names in the Windows MODULEINFO structure. Naturally, they don't exist in the OS X context.

The dynamic libraries used by an OS X task are object files of the Mach-O format, which has the following basic structure:

[Figure: Mach-O file format basic structure. From Apple: Mach-O File Format Reference]

I will assume that the SizeOfImage value you are after is the number of bytes the whole object file consumes as currently loaded into memory. The way to do this is to sum the size of the Header, Load Commands and the data Segments. Something along the lines of:

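#include <stddef.h>
#include <stdint.h>
#include <mach-o/loader.h>

size_t size_of_image(const struct mach_header *header) {
    size_t sz = sizeof(*header); // Size of the header
    sz += header->sizeofcmds;    // Size of the load commands

    // The load commands start immediately after the header
    const struct load_command *lc = (const struct load_command *) (header + 1);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        if (lc->cmd == LC_SEGMENT) { // 32-bit segment command
            sz += ((const struct segment_command *) lc)->vmsize; // Size of segments
        }
        // Advance by cmdsize, since load commands vary in length
        lc = (const struct load_command *) ((const char *) lc + lc->cmdsize);
    }
    return sz;
}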
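
The comments under this answer report that, in practice, the matching value came from the segment sizes alone, with __PAGEZERO left out. A minimal sketch of that variant for 64-bit images follows. The function name `mapped_segment_bytes` is hypothetical, and the struct definitions in the `#else` branch are assumed minimal mirrors of the `<mach-o/loader.h>` layouts, included only so the sketch compiles outside OS X. It has not been validated against real images.

```c
#include <stdint.h>
#include <string.h>

#if defined(__APPLE__)
#include <mach-o/loader.h>
#else
/* Assumption: minimal mirrors of the <mach-o/loader.h> layouts,
 * declared here only so the sketch builds on other platforms. */
#define LC_SEGMENT_64 0x19
struct mach_header_64 {
    uint32_t magic; int32_t cputype; int32_t cpusubtype; uint32_t filetype;
    uint32_t ncmds; uint32_t sizeofcmds; uint32_t flags; uint32_t reserved;
};
struct load_command { uint32_t cmd; uint32_t cmdsize; };
struct segment_command_64 {
    uint32_t cmd; uint32_t cmdsize; char segname[16];
    uint64_t vmaddr; uint64_t vmsize; uint64_t fileoff; uint64_t filesize;
    int32_t maxprot; int32_t initprot; uint32_t nsects; uint32_t flags;
};
#endif

/* Hypothetical helper: sums vmsize over every LC_SEGMENT_64 command,
 * skipping __PAGEZERO. The header and load commands are not added. */
uint64_t mapped_segment_bytes(const struct mach_header_64 *header) {
    uint64_t total = 0;
    const struct load_command *lc = (const struct load_command *)(header + 1);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        if (lc->cmd == LC_SEGMENT_64) {
            const struct segment_command_64 *seg =
                (const struct segment_command_64 *)lc;
            /* segname is a fixed 16-byte field, so bound the comparison */
            if (strncmp(seg->segname, "__PAGEZERO", sizeof seg->segname) != 0) {
                total += seg->vmsize;
            }
        }
        lc = (const struct load_command *)((const char *)lc + lc->cmdsize);
    }
    return total;
}
```

Whether other segments such as __LINKEDIT should count depends on what the caller means by "image size"; the sketch keeps everything except __PAGEZERO, which is what the last comment settled on.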

Next, the entry point is a little different. My guess is that you want the address of the initializer function of the dynamic library (ref here). This is found in the __mod_init_func section of the __DATA segment. To retrieve this section, we can use getsectbynamefromheader. This function returns a pointer to a "struct section", which contains a pointer to the virtual memory location of the section.

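#include <stddef.h>
#include <stdint.h>
#include <mach-o/getsect.h>

uint32_t mod_init_addr(const struct mach_header *header) {
    const struct section *sec = getsectbynamefromheader(header, "__DATA", "__mod_init_func");
    if (sec != NULL) {
        return sec->addr; // Address as recorded in the file, before any load-time slide
    }
    return 0;
}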

The returned value is the virtual memory address of the __mod_init_func section, which contains "pointers to module initialization functions".

NOTE: These structures and functions have analogous 64-bit implementations, suffixed by _64, such as struct mach_header_64, getsectbynamefromheader_64 etc. For 64-bit objects, these functions must be employed instead.
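
For 64-bit images, the same lookup can also be done by walking the load commands directly, which may be useful when the header was copied out of another process and the getsect functions are not convenient. The sketch below makes that assumption. The names `find_section_64` and `mod_init_count` are hypothetical, and the structs in the `#else` branch are assumed minimal mirrors of `<mach-o/loader.h>` so that it compiles outside OS X. The returned `addr` is the address recorded in the file; for a loaded image the slide still has to be added, and some toolchains may place this section in a segment other than __DATA.

```c
#include <stdint.h>
#include <string.h>

#if defined(__APPLE__)
#include <mach-o/loader.h>
#else
/* Assumption: minimal mirrors of the <mach-o/loader.h> layouts,
 * declared here only so the sketch builds on other platforms. */
#define LC_SEGMENT_64 0x19
struct mach_header_64 {
    uint32_t magic; int32_t cputype; int32_t cpusubtype; uint32_t filetype;
    uint32_t ncmds; uint32_t sizeofcmds; uint32_t flags; uint32_t reserved;
};
struct load_command { uint32_t cmd; uint32_t cmdsize; };
struct segment_command_64 {
    uint32_t cmd; uint32_t cmdsize; char segname[16];
    uint64_t vmaddr; uint64_t vmsize; uint64_t fileoff; uint64_t filesize;
    int32_t maxprot; int32_t initprot; uint32_t nsects; uint32_t flags;
};
struct section_64 {
    char sectname[16]; char segname[16];
    uint64_t addr; uint64_t size;
    uint32_t offset; uint32_t align; uint32_t reloff; uint32_t nreloc;
    uint32_t flags; uint32_t reserved1; uint32_t reserved2; uint32_t reserved3;
};
#endif

/* Hypothetical helper: returns the first section whose segment and section
 * names match, or NULL if there is none. */
const struct section_64 *find_section_64(const struct mach_header_64 *header,
                                         const char *segname,
                                         const char *sectname) {
    const struct load_command *lc = (const struct load_command *)(header + 1);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        if (lc->cmd == LC_SEGMENT_64) {
            const struct segment_command_64 *seg =
                (const struct segment_command_64 *)lc;
            /* The section headers follow the segment command directly */
            const struct section_64 *sec = (const struct section_64 *)(seg + 1);
            for (uint32_t j = 0; j < seg->nsects; j++, sec++) {
                /* Both name fields are fixed 16-byte arrays */
                if (strncmp(sec->segname, segname, 16) == 0 &&
                    strncmp(sec->sectname, sectname, 16) == 0) {
                    return sec;
                }
            }
        }
        lc = (const struct load_command *)((const char *)lc + lc->cmdsize);
    }
    return NULL;
}

/* Hypothetical helper: number of 8-byte initializer pointers in the
 * __mod_init_func section, or 0 if the section is absent. */
uint64_t mod_init_count(const struct mach_header_64 *header) {
    const struct section_64 *sec =
        find_section_64(header, "__DATA", "__mod_init_func");
    return sec != NULL ? sec->size / sizeof(uint64_t) : 0;
}
```

Since the section holds an array of function pointers rather than a single entry address, a caller would typically report `addr` plus the count, or read the first pointer at the slid address.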

Disclaimer: All code untested - coded in browser

Henrik
  • Excellent answer. I've implemented the SizeOfImage functionality but the correct value does not include the size of the header or load commands but rather only the size of the data (segment.vmsize). I did run into a problem though. While dylib files are okay, executables are inaccurate. For instance, the size returned by this function for one of my test executables is 8192 bytes off. Any ideas why that is? I'm thinking it has something to do with alignment. – Dave Mar 09 '15 at 07:18
  • Off compared to what? Filesize on disk? Apple's documentation states this about segment_command.filesize: "... For segments that require more memory at runtime than they do at build time, vmsize can be larger than filesize. For example, the __PAGEZERO segment generated by the linker for MH_EXECUTABLE files has a vmsize of 0x1000 but a filesize of 0. Because __PAGEZERO contains no data, there is no need for it to occupy any space until runtime. Also, the static linker often allocates uninitialized data at the end of the __DATA segment; in this case, the vmsize is larger than the filesize." – Henrik Mar 09 '15 at 07:24
  • This is of course why a `segment_command` has two different size specifiers, because they may actually be different. Since you're analyzing memory usage, `vmsize` is the only way to go. – Henrik Mar 09 '15 at 07:46
  • No no, not the filesize on disk, just looking at the memory through a remote debugger. Typically you see modules in memory following one another. So an executable could start at 80000 with a size of 37E000 and end at 3FE000, where the next module begins. But here, the size is sometimes either too big or a little small. One of the things I also eliminated is __PAGEZERO and __LINKEDIT. I'm thinking pretty much only __DATA and __TEXT need to be counted as part of the size. – Dave Mar 09 '15 at 23:42
  • Okay, so I did some more research and figured out that the size is correct after all. Using vmmap I mapped out the regions of the application. From there I was able to verify that the sizes are correct; the discrepancy comes from dead space in virtual memory. So far I only had to eliminate __PAGEZERO. Everything else appears to be correct. I'll post my code in the future with any other useful data. Thanks again, hopefully this helps someone in the future. – Dave May 04 '15 at 21:16