
I am looking for a way to scan a program's memory for a specific pattern. The program loads our code as a shared library (.so).

Here is my attempt:

```c
#include <dlfcn.h>

int _compare(unsigned char *data, unsigned char *pattern, char *mask);

unsigned long FindPattern(char *pattern, char *mask)
{
    void *address;
    unsigned long size, i;

    // NULL = We want the base address of the process we are loaded in
    address = dlopen(NULL, 0); // Would be GetModuleHandle(NULL) on Windows

    // The size of the program, would be GetModuleInformation.SizeOfImage on Windows
    size = 0x128000; // Didn't find a way for Linux

    for(i = 0; i < size; i++)
    {
        if(_compare((unsigned char *)address + i, (unsigned char *)pattern, mask))
            return (unsigned long)((unsigned char *)address + i);
    }
    return 0;
}

int _compare(unsigned char *data, unsigned char *pattern, char *mask)
{
    for(; *mask; ++mask, ++data, ++pattern)
    {
        if(*mask == 'x' && *data != *pattern) // Crashes here according to gdb
            return 0;
    }
    return (*mask) == 0;
}
```

But none of this works. To start with, `dlopen` does not return the correct base address of the program we are loaded into. I have also tried `link_map` as explained here. I know the real addresses from IDA and gdb, which is how I know `dlopen` returns wrong values.

I am using gcc 4.4.7 on 64-bit CentOS 6.5. The program is a 32-bit executable.
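Whatever the right base address turns out to be, the scan itself should only read inside a range that is known to be mapped. Walking 0x128000 bytes forward from a `dlopen` handle most likely runs into unmapped memory, which would explain the crash gdb reports inside `_compare`. A minimal sketch of the same masked comparison with explicit bounds, assuming the caller supplies a readable range `[base, base + size)`; `find_pattern_in` and `masked_match` are hypothetical names, and as in `_compare`, only `'x'` in the mask means "must match":

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if data matches pattern at every position where mask is 'x';
 * any other mask character is a wildcard (same rule as _compare). */
static int masked_match(const unsigned char *data, const unsigned char *pattern,
                        const char *mask)
{
    for (; *mask; ++mask, ++data, ++pattern)
        if (*mask == 'x' && *data != *pattern)
            return 0;
    return 1;
}

/* Hypothetical helper: scans [base, base + size) and returns a pointer to the
 * first match, or NULL. Candidate starts stop early enough that masked_match
 * never reads past base + size. */
const unsigned char *find_pattern_in(const unsigned char *base, size_t size,
                                     const unsigned char *pattern, const char *mask)
{
    /* The length comes from the mask: byte patterns may contain 0x00. */
    size_t len = strlen(mask);
    size_t i;

    if (len == 0 || len > size)
        return NULL;
    for (i = 0; i + len <= size; i++)
        if (masked_match(base + i, pattern, mask))
            return base + i;
    return NULL;
}
```

Taking the length from `strlen(mask)` rather than `strlen(pattern)` matters because a pattern such as `"\x83\xEC\x00\x8B"` would otherwise be cut short at its zero byte.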

  • [The man page of `dlopen`](http://manpages.debian.org/cgi-bin/man.cgi?query=dlopen&sektion=3&apropos=0&manpath=Debian+7.0+wheezy) says: “One of the following two values must be included in flag: […]” and you pass `0` as the second argument to `dlopen`. Hence you can't rely on the return value. Maybe that's your problem, but I'm not sure… – mafso Jun 21 '14 at 01:00
  • Unfortunately not. I've tried the `RTLD_LAZY`, `RTLD_NOLOAD` and `RTLD_NOW` flags; all of them failed – SuperUser Jun 21 '14 at 01:05
  • Are you hoping to scan the program's code or data segments? Or both? Every bit of memory controlled by the program? – Multimedia Mike Jun 21 '14 at 01:49
  • Actually just the program's code, i.e. the machine instructions – SuperUser Jun 21 '14 at 01:55
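Since only the code is of interest, one way to get both a start address and a size on Linux (covering the same need as `GetModuleHandle(NULL)` plus `SizeOfImage`) is glibc's `dl_iterate_phdr`, whose first callback invocation describes the main program. A sketch, assuming glibc and that the program's machine code sits in a single executable `PT_LOAD` segment, which is the usual layout; `find_main_code` and `struct code_range` are hypothetical names. The library also has to be built for the same architecture as the program (`-m32` for the 32-bit binary here).

```c
#define _GNU_SOURCE
#include <link.h>     /* dl_iterate_phdr, struct dl_phdr_info, ElfW() */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct code_range {
    uintptr_t start;
    size_t size;
    int found;
};

static int first_object_cb(struct dl_phdr_info *info, size_t info_size, void *data)
{
    struct code_range *r = data;
    int i;

    (void)info_size;
    for (i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        /* An executable PT_LOAD segment holds the program's machine code. */
        if (ph->p_type == PT_LOAD && (ph->p_flags & PF_X)) {
            /* dlpi_addr is the load bias: 0 for a non-PIE executable. */
            r->start = (uintptr_t)(info->dlpi_addr + ph->p_vaddr);
            r->size = (size_t)ph->p_memsz;
            r->found = 1;
            break;
        }
    }
    return 1; /* nonzero stops the iteration after the first object */
}

/* Hypothetical helper: finds the main program's executable segment.
 * Returns 0 on success, -1 if no executable PT_LOAD segment was seen. */
int find_main_code(uintptr_t *start, size_t *size)
{
    struct code_range r = { 0, 0, 0 };

    /* The first object dl_iterate_phdr visits is the main program. */
    dl_iterate_phdr(first_object_cb, &r);
    if (!r.found)
        return -1;
    *start = r.start;
    *size = r.size;
    return 0;
}
```

Passing the resulting `start` and `size` to a bounded scan replaces the hard-coded `0x128000`.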


dlopen returns a HANDLE for the library, not a pointer to the memory containing the library.

You need to use dlsym to get an address of a function.

```c
handle = dlopen(NULL, RTLD_LAZY);
address = dlsym(handle, "main");
```

NOW you'll have an address to peek at.

"main" may not be the best place to start, but it works as a demonstration here. Be sure to find a symbol located early in the program to allow full searching.
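A likely reason `dlsym` keeps returning NULL here: it only sees symbols in a module's dynamic symbol table, and an executable normally puts `main` or `_start` there only when it was linked with `-rdynamic` (`--export-dynamic`), which a third-party binary usually is not. When some address inside the program is known anyway (from `dlsym` on a symbol that is exported, or a fixed address from IDA for a non-PIE binary), the bounds of the mapping that contains it can be read from `/proc/self/maps`, so the search no longer depends on picking a symbol early in the program. A sketch under those assumptions; `mapping_containing` is a hypothetical helper:

```c
#include <stdio.h>

/* Hypothetical helper: finds the /proc/self/maps entry containing addr and
 * reports its bounds as [*lo, *hi). Returns 0 on success, -1 otherwise. */
int mapping_containing(unsigned long addr, unsigned long *lo, unsigned long *hi)
{
    /* Each line reads "start-end perms offset dev inode path". */
    char line[8192];
    int rc = -1;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;

        /* Only the two hexadecimal bounds at the start of the line are parsed. */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        if (addr >= start && addr < end) {
            *lo = start;
            *hi = end;
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

The mapping that holds the program's code is the one whose permission field contains `x` (typically `r-xp`).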

And as a bonus, speed up your search/compare loop:

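```c
// The size of the program, would be GetModuleInformation.SizeOfImage on Windows
size = 0x128000; // Didn't find a way for Linux

unsigned char *ptr = address;
unsigned char *end = (unsigned char *)address + size;

while (ptr < end)
{
  /* hmmm, gets complicated if we need to mask src char then compare pattern, I punted
   * and just compared for first char of pattern. It's just an idea... */

  /* memchr (from <string.h>) jumps to the next byte equal to pattern[0] */
  ptr = memchr(ptr, (unsigned char)pattern[0], (size_t)(end - ptr));

  if (ptr == NULL)
    break;

  if (_compare(ptr, (unsigned char *)pattern, mask))
    return (unsigned long)ptr;

  ptr++; /* step past this candidate, otherwise memchr finds it again forever */
}
```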
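The first-byte `memchr` trick above needs `mask[0]` to be `'x'`. A variant that anchors `memchr` on the first byte whose mask character is `'x'` lifts that restriction, and it also keeps every read inside `[base, base + size)`. A sketch; `find_pattern_fast` is a hypothetical name, and the mask convention is the same as in `_compare`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: masked search over [base, base + size) that uses
 * memchr to jump between occurrences of the first literal ('x') byte. */
const unsigned char *find_pattern_fast(const unsigned char *base, size_t size,
                                       const unsigned char *pattern, const char *mask)
{
    size_t len = strlen(mask);
    size_t k = 0, i;
    const unsigned char *p, *last;

    if (len == 0 || len > size)
        return NULL;
    while (k < len && mask[k] != 'x')
        k++;                        /* k = index of the anchor byte */
    if (k == len)
        return base;                /* all wildcards: matches at the first position */

    last = base + (size - len);     /* last start where the whole pattern fits */
    p = base;
    while (p <= last) {
        /* Look for the anchor byte only where a full match could still fit. */
        const unsigned char *hit = memchr(p + k, pattern[k], (size_t)(last - p) + 1);
        if (hit == NULL)
            return NULL;
        p = hit - k;                /* candidate start of the pattern */
        for (i = 0; i < len; i++)
            if (mask[i] == 'x' && p[i] != pattern[i])
                break;
        if (i == len)
            return p;
        p++;                        /* step past this candidate */
    }
    return NULL;
}
```

For a mask made only of wildcards there is nothing to anchor on, so the sketch simply reports a match at `base`.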
lornix
  • Umm, the length argument for the memchr function should probably be `(size - ptr + address - strlen(pattern))`, so we don't `_compare` off the end of the buffer. (better to put strlen(pattern) into a variable to prevent re-evaluating strlen every time too) – lornix Jun 21 '14 at 02:26
  • Based on the additional `_compare` code shown, my suggested speedup will work properly as long as the FIRST char of `mask` is `'x'`. Hooray! – lornix Jun 21 '14 at 03:06
  • Thanks for the suggestion. I've tried it with dlsym and all kinds of entry points, but literally nothing works; it always returns 0. `readelf -l` reports an `Entry point` address that is correct and lies pretty much at the very top of the program, so there must be a way to get this address. – SuperUser Jun 21 '14 at 12:14
  • The value of 'entry_point' IS stored in the first bytes of the memory image, but due to ASLR, we don't know WHERE that is. Without ASLR, 32-bit ELFs start at 0x8048000, while 64-bit ELFs begin at 0x400000, and `Entry Point` is at offset 0x18 from that as a 4-byte LE value. Generally, if you look up the value of `_start`, you'll be __very__ close to the beginning of the main program. – lornix Jun 21 '14 at 19:04
  • But `dlsym` returns 0 for `_start`; I did it exactly as in your post. Any idea? In the meantime I'll use the default ELF start address, thanks. – SuperUser Jun 22 '14 at 11:37
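Building on the ELF-header idea above: rather than hard-coding 0x8048000 and reading 4 bytes at offset 0x18, the header can be located through `dl_iterate_phdr` and read through the `ElfW(Ehdr)` type, which gets the field width right (`e_entry` sits at offset 0x18 in both classes but is 4 bytes in ELF32 and 8 bytes in ELF64). For a non-PIE executable such as the 32-bit binary here, `dlpi_addr` is 0 and the header sits at its link-time address; for a PIE it is the randomized load bias. A sketch assuming glibc and the usual layout where one `PT_LOAD` segment starts at file offset 0; `find_main_image` and `struct main_image` are hypothetical names:

```c
#define _GNU_SOURCE
#include <link.h>     /* dl_iterate_phdr, ElfW(), and the <elf.h> constants */
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct main_image {
    const ElfW(Ehdr) *ehdr; /* run-time address of the main program's ELF header */
    uintptr_t bias;         /* load bias: 0 for a non-PIE executable */
    uintptr_t entry;        /* run-time address of the entry point (_start) */
};

static int header_cb(struct dl_phdr_info *info, size_t info_size, void *data)
{
    struct main_image *img = data;
    int i;

    (void)info_size;
    for (i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        /* The PT_LOAD segment starting at file offset 0 maps the ELF header. */
        if (ph->p_type == PT_LOAD && ph->p_offset == 0) {
            img->ehdr = (const ElfW(Ehdr) *)(info->dlpi_addr + ph->p_vaddr);
            img->bias = (uintptr_t)info->dlpi_addr;
            break;
        }
    }
    return 1; /* stop after the first object, which is the main program */
}

/* Hypothetical helper: returns 0 and fills *img on success, -1 otherwise. */
int find_main_image(struct main_image *img)
{
    img->ehdr = NULL;
    img->bias = 0;
    img->entry = 0;
    dl_iterate_phdr(header_cb, img);
    if (img->ehdr == NULL || memcmp(img->ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    /* e_entry is a link-time address; adding the load bias gives the run-time one. */
    img->entry = img->bias + (uintptr_t)img->ehdr->e_entry;
    return 0;
}
```

Because the header is found through the program headers the loader actually used, the same code works whether or not the executable was relocated.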