
I'm trying to read the PE headers of a file to get some information. For .NET and C#, I'm using BitConverter to convert the Byte array obtained after having read the file to an integer equivalent. I wish to do the same with C++, but am not sure of the best approach. I'm using an unsigned char array as the Byte array equivalent.

The code is given below.

uint16_t GetAppCompiledMachineType(string fileName)
{
    const int ptr_offset = 4096;            
    const int mac_offset = 4;
     char *data = new char[4096];
    fstream f;
    f.open(fileName, ios::in | ios::binary  );
    f.read(data, 4096);


    int32_t pe_addr= *reinterpret_cast<int32_t*>(data, ptr_offset);
    uint16_t machineUint = *reinterpret_cast<std::uint16_t*>(data, pe_addr + mac_offset);
    return machineUint;

 }
int _tmain(int argc, _TCHAR* argv[])
{

      string fileName = "<some_path>\\depends.exe";
      uint16_t tempInt = GetAppCompiledMachineType(fileName);
      cout<<tempInt;
      std::getchar();

    return 0;
}

I'll be using the output to query the PE header for information. I need the equivalent of BitConverter here, and hopefully it will work.

UPDATE: Thanks for the replies. As suggested, I'm trying to use the cast to convert the character array into an int and read the PE header, but it gives me an unhandled access violation exception. This is the full code; the file is valid and is being read. I tried a debug build with optimization disabled, but to no avail.

Kindly advise.

Thanks a lot.

user1173240
  • Why don't you directly read from the stream instead? `f >> head_addr` – user703016 Oct 08 '14 at 11:00
  • Do you really want to use native endian? When parsing files it's almost never correct. You should implement your own bitshift based converter with fixed endianness. – CodesInChaos Oct 08 '14 at 13:38
  • How to do that? Might be a novice question, but I haven't a lot of experience with it. – user1173240 Oct 09 '14 at 06:32
  • @user1173240 `ptr_offset` you're not using correct value, also there is a typo in the offset calculation (`,` instead of `+`). I updated my answer with exact code. – Adriano Repetti Oct 09 '14 at 07:24

3 Answers

You have a byte array pointer (char* data), so simply move the pointer to the offset you need, data + PE_POINTER_OFFSET, cast it to a pointer to integer, (int*)(data + PE_POINTER_OFFSET), and dereference that pointer to get the value:

int32_t head_addr = *reinterpret_cast<int32_t*>(data + PE_POINTER_OFFSET);
uint16_t machineUint = *reinterpret_cast<uint16_t*>(data + head_addr + macoffset);

EDIT 1: You're trying to read a PE file, so I may safely assume your environment is Windows. Both x86 and x64 support unaligned memory access (of course you'll pay a price in performance for this, but probably nothing you will ever notice, and you'll save the memcpy calls).

Itanium (if you have to support it) and (very old) ARM may be a problem: for the first one just use __unaligned for your char array, and for the second one (if you don't let the compiler do the job for you) you can use __packed.

Note also that these assumptions (plus endianness) are valid because you're working with PE files in a Windows environment. If you had to write portable code or read something else, this would not be the right way to do it (in short, you would have to address single bytes and copy them in a fixed order).
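A minimal sketch of that byte-by-byte approach, for the case where portability matters. The helper names read_u16_le and read_u32_le are hypothetical (they are not part of the original code); they decode little-endian values, which is the byte order PE files use, and they work regardless of host endianness or alignment:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers: decode little-endian integers one byte at a time,
// so the result does not depend on the host byte order or on the
// alignment of bytes + offset. The caller must ensure the bytes exist.
inline std::uint16_t read_u16_le(const unsigned char* bytes, std::size_t offset)
{
    assert(bytes != nullptr);
    return static_cast<std::uint16_t>(bytes[offset] | (bytes[offset + 1] << 8));
}

inline std::uint32_t read_u32_le(const unsigned char* bytes, std::size_t offset)
{
    assert(bytes != nullptr);
    return  static_cast<std::uint32_t>(bytes[offset])
         | (static_cast<std::uint32_t>(bytes[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(bytes[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(bytes[offset + 3]) << 24);
}
```

With a char buffer you would first convert the pointer, for example reinterpret_cast<const unsigned char*>(data), so that bytes above 0x7F are not sign-extended before shifting.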

EDIT 2: In your updated code the problem is *reinterpret_cast<int32_t*>(data, ptr_offset). Note that the comma means the offset is never added to the pointer, and the offset itself is also wrong (it should be 60, if I'm not mistaken). What you're doing there is reading from the absolute address 4096, which causes the access violation. In code:

uint16_t GetAppCompiledMachineType(string fileName)
{
    const int32_t PE_POINTER_OFFSET = 60;            
    const int32_t MACHINE_OFFSET = 4;

    char data[4096];

    fstream f;
    f.open(fileName, ios::in | ios::binary);
    f.read(data, sizeof(data));

    int32_t pe_header_offset = *reinterpret_cast<int32_t*>(
        data + PE_POINTER_OFFSET);

    // assert(pe_header_offset + MACHINE_OFFSET < sizeof(data));

    return *reinterpret_cast<std::uint16_t*>(
        data + pe_header_offset + MACHINE_OFFSET);
}

This code is still far from production quality, but note a few changes:

  • The data buffer isn't dynamically allocated, so you don't need to free it (you were not freeing the allocated memory; Windows frees it when the process exits, but if you call that function many times you'll leak memory on every call).
  • With a fixed-size array you can use sizeof() to determine the buffer size (as input for read()).
  • PE_POINTER_OFFSET now has the correct value (60 instead of 4096).
  • The offset from data is now calculated correctly (as the sum of data and PE_POINTER_OFFSET).
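As a rough illustration of what is still missing for production use, here is a sketch that validates the buffer before trusting any offset. The function name machine_type_from_buffer is hypothetical; the checks assume the usual PE layout (an "MZ" signature at offset 0, and a "PE\0\0" signature immediately before the Machine field), and it keeps the same little-endian host assumption as the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: extracts the Machine field from the first `size`
// bytes of a PE file. Returns false instead of reading out of bounds.
// Assumes a little-endian host, like the cast-based version.
inline bool machine_type_from_buffer(const char* data, std::size_t size,
                                     std::uint16_t& machine)
{
    assert(data != nullptr);
    const std::size_t PE_POINTER_OFFSET = 60;  // location of the PE header offset
    const std::size_t MACHINE_OFFSET = 4;      // Machine follows "PE\0\0"

    // The DOS header must be present and start with "MZ".
    if (size < PE_POINTER_OFFSET + sizeof(std::int32_t)) return false;
    if (data[0] != 'M' || data[1] != 'Z') return false;

    // memcpy avoids the unaligned, type-punned read of a pointer cast.
    std::int32_t pe_header_offset = 0;
    std::memcpy(&pe_header_offset, data + PE_POINTER_OFFSET, sizeof(pe_header_offset));
    if (pe_header_offset < 0) return false;

    // The signature (4 bytes) plus Machine (2 bytes) must fit in the buffer.
    const std::size_t pe = static_cast<std::size_t>(pe_header_offset);
    if (pe > size || size - pe < MACHINE_OFFSET + sizeof(machine)) return false;
    if (std::memcmp(data + pe, "PE\0\0", 4) != 0) return false;

    std::memcpy(&machine, data + pe + MACHINE_OFFSET, sizeof(machine));
    return true;
}
```

The size argument would normally come from f.gcount() after f.read(), so that a file shorter than the buffer is handled correctly.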

That said, we're still using a buffered approach, which is pretty useless here because fstream manages buffering for us. Let's simplify the code (with the side effect of making it more robust: we no longer assume that the PE header fits in our 4K buffer).

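uint16_t GetAppCompiledMachineType(string fileName)
{
    const int32_t PE_POINTER_OFFSET = 60;
    const int32_t MACHINE_OFFSET = 4;

    fstream f(fileName, ios::in | ios::binary);

    // operator>> parses formatted text, so the raw bytes are read with read().
    int32_t pe_header_offset = 0;
    f.seekg(PE_POINTER_OFFSET);
    f.read(reinterpret_cast<char*>(&pe_header_offset), sizeof(pe_header_offset));

    uint16_t machineType = 0;
    f.seekg(pe_header_offset + MACHINE_OFFSET);
    f.read(reinterpret_cast<char*>(&machineType), sizeof(machineType));

    return machineType;
}

Now it works without an intermediate buffer and without reinterpreting the buffer contents (but it still assumes that the PE and machine endianness match).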

Adriano Repetti
  • As Richard noted, this can cause problems on platforms where you can't access unaligned integers. – CodesInChaos Oct 08 '14 at 11:29
  • @CodesInChaos He's reading a PE so I'm freely assuming his target OS is Windows. Both x86 and x64 support unaligned access (performance here isn't, I guess, an issue). Itanium and ARM may be a problem but in that case there is `__unaligned` – Adriano Repetti Oct 08 '14 at 11:42
  • Yes, it is Windows. I missed that part out. This, though, gives me an Unhandled exception for Access Violation. The fileName is fine, and the file is getting read properly according to above code. But the casting fails. – user1173240 Oct 08 '14 at 12:09
  • @user1173240 which cast? – Adriano Repetti Oct 08 '14 at 12:13
  • The first one. `int32_t head_addr = *reinterpret_cast<int32_t*>(data + PE_POINTER_OFFSET);`. This gives an access violation exception. – user1173240 Oct 08 '14 at 12:16
  • @user1173240 just tried and it works (comparing with `dumpbin`). Try to log what you're getting there (especially if you're inspecting `argv[0]` or a file in use with `ios::out`...just remove it). – Adriano Repetti Oct 08 '14 at 12:33
  • Just tried with `wmplayer.exe`, in `ios::in`, it gets read, `data` char array indicates it has `MZ` after the read statement `f.read( data, 4096)`, but when it gets to the casting to `int32_t`, it gives an access violation. – user1173240 Oct 08 '14 at 13:02
  • No idea how to tell you, but now the second casting fails. Access violation...the first one gives a `pe_addr` value of `248`, which should be accurate? but when I try the second cast, to `uint16_t`, it fails. ( Also I checked, I'm not using the `,` in the second, just to be sure of not making a silly one). – user1173240 Oct 09 '14 at 09:17
  • Nope. Sorry, it works. Passed the wrong filename when doing some extended testing. Thanks a lot for your help. I'll rework the code to make it `endian` safe. Would you have an idea on where to access this information on making such `bit-and-byte` data access safe? Thanks again. – user1173240 Oct 09 '14 at 09:23
  • Sorry, my snippet was also wrong (missing data offset in 2nd calculation). I don't think you have to make it endianness safe unless you're thinking to inspect a file created for another endianness. In that case you just need a [SwapEndianness()](http://stackoverflow.com/questions/2182002/convert-big-endian-to-little-endian-in-c-without-using-provided-func) function applied after each "bit conversion" (cast). – Adriano Repetti Oct 09 '14 at 09:48
  • Hmm, the second snippet gives a different output compared to the first, where casting was used. Visually it looks the same, but the `header_offet` `outstream` gives a different value. – user1173240 Oct 10 '14 at 08:58
  • @user1173240 strange, if you inspect please share your findings! – Adriano Repetti Oct 10 '14 at 09:02
Refactoring to eliminate data alignment problems on some architectures:

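// Copies sizeof(T) bytes starting at buffer + offset into a properly
// aligned T. The caller must ensure those bytes lie inside the buffer.
template<class T>
T from_buffer(const uint8_t* buffer, size_t offset)
{
  T t_buf = 0;
  memcpy(&t_buf, buffer + offset, sizeof(T));
  return t_buf;
}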

...

int32_t head_addr = from_buffer<int32_t>(buffer, PE_POINTER_OFFSET);
uint16_t machineUint = from_buffer<uint16_t>(buffer, size_t(head_addr + macoffset));
Richard Hodges

The best approach would be to declare the structures of the PE file format. Example:

struct dos_header {
     char signature[2];          // expected to contain 'M', 'Z'
     boost::int16_t lastsize;
     ..
     boost::int16_t reserved2[10];
     boost::int32_t e_lfanew;    // file offset of the PE header
};

Notes:

  • It's better to use cross-platform integer types of a known size (int32_t instead of long).
  • Be careful with structure alignment (use #pragma pack, for example #pragma pack(push, 1), if padding changes the layout).
  • These structures are declared in windows.h, but for cross-platform development I recommend declaring your own portable versions separately.
  • For 64-bit images, some of the structures differ.

Once you have the mapped structures, you can cast the buffer to a pointer to the structure and access the members.

Sample:

if (offset + sizeof(dos_header) > size_data) {
    // handle the error
    // exit
}
// A char* cannot be converted with static_cast; reinterpret_cast is required.
const dos_header* dh = reinterpret_cast<const dos_header*>(data + offset);
std::cout << dh->e_lfanew << std::endl;
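A compilable sketch of the same idea, with a few assumptions stated up front: the 58 bytes between the signature and e_lfanew are collapsed into one placeholder array (not_used_here is a hypothetical name, not a real field), the helper read_dos_header is hypothetical, and memcpy is used in place of the pointer cast so that alignment of the buffer does not matter. The static_asserts stop the build if the compiler inserts padding:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified 64-byte DOS header: only the fields used here are named.
struct dos_header {
    char          signature[2];       // expected to contain 'M', 'Z'
    std::uint16_t not_used_here[29];  // placeholder for the remaining fields
    std::int32_t  e_lfanew;           // file offset of the PE header
};
static_assert(sizeof(dos_header) == 64, "unexpected padding in dos_header");
static_assert(offsetof(dos_header, e_lfanew) == 60, "e_lfanew must be at offset 60");

// Hypothetical helper: copies the header out of the buffer after a bounds
// check and verifies the signature. Assumes a little-endian host.
inline bool read_dos_header(const char* data, std::size_t size,
                            std::size_t offset, dos_header& out)
{
    assert(data != nullptr);
    if (offset > size || size - offset < sizeof(dos_header)) return false;
    std::memcpy(&out, data + offset, sizeof(dos_header));
    return out.signature[0] == 'M' && out.signature[1] == 'Z';
}
```

The bounds check is written as size - offset < sizeof(dos_header) so that a very large offset cannot overflow the addition.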
NetVipeC