4

I have a binary object that was generated on an SGI 64-bit machine using a MIPSpro compiler. I am trying to read this binary object on a 64-bit x86_64 machine running RHEL 6.7. The structure of the object is something like:

class A {
  public:
    A(){
      a_ = 1;
    }
    A(int a){
      a_ = a;
    }
    virtual ~A();
  protected:
    int a_;
};
class B : public A {
  public:
   // Constructors, methods, etc
    B(double b, int a){ 
      b_ = b;
      a_ = a;
    }
    virtual ~B();
  private:
    double b_;
};
A::~A(){}
B::~B(){}

After reading the binary file and swapping the bytes (because of the endianness difference), I find that b_ is correct but a_ is wrong, which suggests the layout of the data does not match my current build.

I have two questions about this. Firstly, how does the MIPSpro compiler align its fields, and how is that different from the way gcc does it? I am interested in the case of inherited classes. Secondly, is there an option in gcc or C++ that can force the alignment to match what MIPSpro does?

Update 1: For additional clarification, the code was compiled on MIPS ABI n64. I have access to the original C++ source code but I can't change it on the MIPS machine. I am constrained to reading the binary on x86_64.

Update 2: I printed sizeof for both classes on both machines, before and after adding a virtual destructor to them.
On MIPS and x86_64, the output before adding the virtual destructor was

size of class A: 4
size of class B: 16

After adding the virtual method, on SGI MIPS the output is

size of class A: 8
size of class B: 16

and on x86-64 Linux:

size of class A: 16
size of class B: 24

It looks like the two machines handle virtual methods (or is it methods in general?) differently. Any ideas why, or how to get around this problem?

Iliketoproveit
  • I believe the MIPS ABI is n64. I'm not able to change the code running in the MIPS machine so I am constrained to reading the binary on the x86_64 machine. – Iliketoproveit Mar 15 '18 at 17:05
  • 3
    There is a chicken-and-egg problem here. You cannot swap bytes until you know how it is aligned. So there are non-zero odds that it is actually the byte swapping that makes it look misaligned in an intractable way. – Hans Passant Mar 15 '18 at 17:15
  • @HansPassant How so? The byte-swap is just based on the size of the type a certain field is. It has nothing to do with the alignment or padding. However, even without the swaps, the fields are being misread – Iliketoproveit Mar 15 '18 at 17:20
  • 1
    Is `sizeof(A)` on the x86_64 machine the same as the size of your struct from the MIPS machine? And is this also the case for `B`? Do either structs contain `virtual` member functions? – Candy Gumdrop Mar 15 '18 at 17:25
  • 3
    If you want to send a data structure from one machine to another you should use a real form of serialisation (Protocol Buffers, Messagepack, etc) where all this sort of thing is taken care of for you, rather than reading bytes directly from memory and praying that it works. – Sean Burton Mar 15 '18 at 17:33
  • 3
    Structures can get padded, so it's best to read and write using a proper serialization method. – tadman Mar 15 '18 at 17:36
  • @SeanBurton I am not able to change the code on the MIPS side. So the machine will output the binary in a MIPS format, and it's my job to try to read that on an x86_64 machine – Iliketoproveit Mar 15 '18 at 17:38
  • @CandyGumdrop Updated the question, both of them have a virtual destructor in the cpp file. I have to run some test to check the size of them in both machine and get back to you – Iliketoproveit Mar 15 '18 at 17:44
  • very bad idea to try to use structures, etc like this across compile domains even with the same target, much less different targets. not the right way to move/share data. – old_timer Mar 15 '18 at 17:44
  • @old_timer As I've said before, the code is already deployed so I have no control over how the binaries are structured or written. I am trying to read the binary generated by MIPS on x86_64 – Iliketoproveit Mar 15 '18 at 17:48
  • 2
    dont use such a structure then on the mips side, if already deployed you should know the exact layout of the data... – old_timer Mar 15 '18 at 18:56
  • So you have a binary-stream and you try to read that back in into x64, but you dont know the layout? – nonsensation Mar 15 '18 at 19:37
  • I know the layout in the sense that I have the source code for the class. But the way the MIPSpro compiler decides to format that in memory is a bit harder to figure out – Iliketoproveit Mar 15 '18 at 20:22
  • 2
    Your MIPS build is using 32-bit pointers. You'd need to add an `int32_t dummy` member to the start of `A` in the MIPS version only, to account for the vtable pointer only being 4 bytes. (and somehow give the vtable pointer 8-byte alignment...). But seriously it makes very little sense to serialize data that includes a vtable pointer. It makes no sense to read that back in on another machine where the pointer won't be valid. (related: [How do objects work in x86 at the assembly level?](https://stackoverflow.com/questions/33556511/how-do-objects-work-in-x86-at-the-assembly-level)) – Peter Cordes Mar 15 '18 at 20:23
  • Maybe the 010 Editor is a tool you can use for that – nonsensation Mar 15 '18 at 20:24
  • Can you use a 64-bit MIPS build? – Peter Cordes Mar 15 '18 at 20:26
  • @PeterCordes I'm pretty sure the MIPS build is 64 bit. When I run `sizeof(void*)` in the MIPS machine I get 4 – Iliketoproveit Mar 15 '18 at 20:29
  • @PeterCordes I'm trying a similar approach but in the x86_64 machine. I added an `int32_t dummy` to the end of `A`. Luckily the code defines the virtual destructor as `A::~A(){}` so I think removing it shouldn't affect anything – Iliketoproveit Mar 15 '18 at 20:31
  • 3
    @Iliketoproveit You are probably using the n32 ABI (64-bit MIPS with 32-bit pointers, kind of like the x86_64 x32 ABI) – Candy Gumdrop Mar 15 '18 at 20:32
  • 1
    Right, so you have 32-bit pointers. Can you build the MIPS version for an ABI that uses 64-bit pointers? That will likely give you the same layout as x86-64 for most structs, if `long` is the same size in both in the MIPS ABI with 64-bit pointers. (`long double` will almost certainly be different, but you probably don't use it.) – Peter Cordes Mar 15 '18 at 20:32
  • 2
    You need to add the dummy member to the *start* of `A` (for MIPS only), to pad before the `int` member so `offsetof(A, a_)` is 8 for both MIPS and x86-64. And you'd need `__attribute((aligned(8)))` on the MIPS struct so it's padded out to 16 like x86-64. This is just so insane. Store your data without a `vtable` pointer. Definitely remove the virtual member if you can. You could also build for the Linux x32 ABI on x86-64. (32-bit pointers in 64-bit mode). – Peter Cordes Mar 15 '18 at 20:36
  • 1
    Or you could write a struct for the x86-64 version with compatible layout to MIPS, instead of trying to use the *same* C++ type on different ABIs. – Peter Cordes Mar 15 '18 at 20:46
  • I don't understand what you mean by `I have access to the original C++ source code but I can't change it on the MIPS machine. I am constrained to reading the binary on x86_64.`. If you're not allowed to change the code how can you modify the code and run like above? And why do you need to read the binary? – phuclv Mar 16 '18 at 02:01
  • @LưuVĩnhPhúc the MIPS machine is deployed and generating binaries, our job is to use a Linux machine to read these binaries. The easy solution is to install new software on the deployed machine but we are unable to do this – Iliketoproveit Mar 16 '18 at 16:42
  • generating binaries is the job of compilers. If you want to do that on x86_64 just use a cross-compiler. Then you'll have the binary to "read" immediately, if you mean "read" that way. If you want to "read" as reverse engineering from the binary output then it's much simpler just to print `sizeof` and addresses of the needed elements – phuclv Mar 17 '18 at 01:46
  • 1
    I think I get the idea now. The code on MIPS side is fixed and won't change anymore. The compiled binary will output a binary stream of those objects and your job is to read that stream on x86_64, right? – phuclv Mar 17 '18 at 01:54
  • @LưuVĩnhPhúc correct. – Iliketoproveit Mar 20 '18 at 12:21

2 Answers

7

Trying to make the binary layouts of the two structures match, with inheritance, virtual methods and different endianness all in play, looks to me like a lost cause. (I don't even know how you managed to make fwrite/fread serialization work on a single architecture: overwriting the vtable address is a recipe for disaster, and even on "normal" architectures nothing guarantees that the vtables will be located at the same address across multiple runs of the exact same binary.)

Now, if this serialization format is already set in stone and you have to deal with it, I'd completely avoid the "match the binary layout" approach; it will drive you mad and give you a terribly fragile result.

Instead, first find out the exact binary layout of the source data once and for all; you can do it easily using offsetof on all members on the MIPS machine, or even just by printing the address of each member and computing the relevant differences.
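
The address-difference variant could be sketched as follows. This is only an illustration: `A2`, `B2`, `member_offset` and `print_layout` are hypothetical names, and the stand-in classes expose their members publicly so the offsets can be measured from outside (with the real classes, the same arithmetic would have to live in a member or friend function). It is written in a C++98 style on the assumption that it has to build with the old compiler too.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins for the question's classes, with public members
// so the offsets can be measured from outside the class.
struct A2 {
    A2() : a_(1) {}
    virtual ~A2() {}
    int a_;
};

struct B2 : A2 {
    B2() : b_(0.0) {}
    double b_;
};

// Byte distance from the start of an object to one of its members.
template <typename Obj, typename Member>
std::size_t member_offset(const Obj& obj, const Member& member) {
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&member) -
        reinterpret_cast<const char*>(&obj));
}

// Print the layout as seen by whichever compiler and ABI build this file.
void print_layout() {
    B2 b;
    std::printf("sizeof(A2) = %lu, sizeof(B2) = %lu\n",
                static_cast<unsigned long>(sizeof(A2)),
                static_cast<unsigned long>(sizeof(B2)));
    std::printf("offset of a_ = %lu\n",
                static_cast<unsigned long>(member_offset(b, b.a_)));
    std::printf("offset of b_ = %lu\n",
                static_cast<unsigned long>(member_offset(b, b.b_)));
}
```

Calling `print_layout()` from a small test program on each machine gives the numbers needed for the deserialization code; whatever bytes precede `a_` belong to the vtable pointer and any padding.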

Now that you have the binary layout, write some architecture-independent deserialization code. Let's say you found out that A is made of:

  • 0x00: vptr (8 bytes);
  • 0x08: a_ (4 bytes);
  • 0x0c: (padding) (4 bytes)

and B is made of:

  • 0x00: vptr (8 bytes);
  • 0x08: A::a_ (4 bytes);
  • 0x0c: (padding) (4 bytes);
  • 0x10: b_ (8 bytes).

then you'll write code that manually deserializes each of these fields into a given structure. For example:

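#include <algorithm>  // std::reverse_copy
#include <cstdint>    // uint32_t, int32_t

typedef unsigned char byte;

// Assemble a big-endian 32-bit unsigned value byte by byte.
uint32_t read_u32_be(const byte *buf) {
    return uint32_t(buf[0])<<24 |
           uint32_t(buf[1])<<16 |
           uint32_t(buf[2])<<8  |
           uint32_t(buf[3]);
}

int32_t read_i32_be(const byte *buf) {
    // assume 2's complement in unsigned -> signed conversion
    return read_u32_be(buf);
}

double read_f64_be(const byte *buf) {
    static_assert(sizeof(double)==8, "double must be 8 bytes");
    double ret;
    // reversing the bytes assumes a little-endian host such as x86_64
    std::reverse_copy(buf, buf+8, reinterpret_cast<byte*>(&ret));
    return ret;
}

// read_A and read_B must be declared friends of A and B respectively,
// since a_ is protected and b_ is private.
void read_A(const byte *buf, A& t) {
    t.a_ = read_i32_be(buf+8);      // a_ lives at offset 0x08
}

void read_B(const byte *buf, B& t) {
    read_A(buf, t);
    t.b_ = read_f64_be(buf+0x10);   // b_ lives at offset 0x10
}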
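
For experimenting without the original class definitions, here is a self-contained sketch of the same idea. The names `Record`, `load_be` and `decode_B` are made up for illustration, and the offsets are the ones from the example layout above, not measured values; they should be replaced with whatever the MIPS build actually reports. Unlike the byte reversal above, this version rebuilds the values with shifts, so it should behave the same on a little-endian or big-endian host, assuming both sides use 64-bit IEEE 754 doubles.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Offsets taken from the example layout above; these are assumptions and
// must be replaced with the values measured on the MIPS build.
const std::size_t kOffsetA = 0x08;  // A::a_, 4 bytes
const std::size_t kOffsetB = 0x10;  // B::b_, 8 bytes
const std::size_t kSizeB   = 0x18;  // total bytes per serialized B

// Plain value type holding the decoded fields (hypothetical name).
struct Record {
    std::int32_t a;
    double b;
};

// Assemble n big-endian bytes with shifts, so the result does not depend
// on the endianness of the machine running this code.
std::uint64_t load_be(const unsigned char* p, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) v = (v << 8) | p[i];
    return v;
}

// Decode one serialized B, skipping the vtable pointer and padding.
// Returns false if the buffer is too short.
bool decode_B(const unsigned char* buf, std::size_t len, Record& out) {
    static_assert(sizeof(double) == 8, "expects a 64-bit double");
    if (len < kSizeB) return false;
    std::uint32_t ua = static_cast<std::uint32_t>(load_be(buf + kOffsetA, 4));
    std::uint64_t ub = load_be(buf + kOffsetB, 8);
    std::memcpy(&out.a, &ua, sizeof out.a);  // reinterpret bits as int32_t
    std::memcpy(&out.b, &ub, sizeof out.b);  // reinterpret bits as double
    return true;
}
```

Decoding into a plain struct rather than into `B` itself sidesteps the access-control issue and keeps the vtable pointer of the receiving object untouched.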

Notice that this isn't wasted effort, as you'll soon need this code even for the MIPS version if you happen to change compiler, compilation settings or anything else that may affect the binary layout of your classes.

BTW, the generation of this code can potentially be automated, as it's all data that is available in the debug information; so, if you have many structures in this criminal serialization format you can semi-automatically generate the deserialization code (and move them to something saner for the future).
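
One shape the generated code could take is a table of field descriptors plus a single generic extraction routine. The sketch below is an assumption about how that might look, not the output of any existing tool: `FieldDesc` and `extract_fields` are hypothetical names, and the rows of the table would have to come from the debug information or from measured offsets.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One row per serialized member: where it sits and how wide it is.
// In practice these rows would be generated, not written by hand.
struct FieldDesc {
    const char* name;
    std::size_t offset;
    std::size_t size;   // in bytes, at most 8
};

// Extract every described field from one big-endian record as raw bits.
// Returns an empty vector if a field would run past the end of the buffer.
std::vector<std::uint64_t> extract_fields(const unsigned char* buf,
                                          std::size_t len,
                                          const std::vector<FieldDesc>& fields) {
    std::vector<std::uint64_t> out;
    for (const FieldDesc& f : fields) {
        if (f.size > 8 || f.offset > len || f.size > len - f.offset) return {};
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < f.size; ++i) v = (v << 8) | buf[f.offset + i];
        out.push_back(v);
    }
    return out;
}
```

With a table such as `{ {"a_", 0x08, 4}, {"b_", 0x10, 8} }` for the example layout of B, supporting another class only means adding another table. The raw bits still have to be converted to the right type afterwards, for instance with `memcpy` into an `int32_t` or a `double`.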

Matteo Italia
1

Reverse engineering:

Write several different object instances to separate files.
Look for the differences with a hexadecimal editor.
At the very least, this gives you the position of each value in the binary file.

Finally, handle endianness.
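
If a hex editor is not convenient, the comparison can be done with a few lines of code. This is a minimal sketch with a hypothetical helper name, `diff_offsets`, and it assumes the two dumps have already been read into memory.

```cpp
#include <cstddef>
#include <vector>

// Return the offsets at which two dumps differ. Bytes present in only one
// dump (when the lengths differ) are reported as differences too.
std::vector<std::size_t> diff_offsets(const std::vector<unsigned char>& x,
                                      const std::vector<unsigned char>& y) {
    std::vector<std::size_t> out;
    const std::size_t shorter = x.size() < y.size() ? x.size() : y.size();
    const std::size_t longer  = x.size() < y.size() ? y.size() : x.size();
    for (std::size_t i = 0; i < shorter; ++i)
        if (x[i] != y[i]) out.push_back(i);
    for (std::size_t i = shorter; i < longer; ++i)
        out.push_back(i);
    return out;
}
```

Writing two objects that differ in a single member and diffing their dumps shows which bytes that member occupies; repeating this per member recovers the layout.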

Ripi2
  • This would be the last ditch attempt. There are many more classes with this structure so this solution wouldn't scale. I'm interested in first understanding why the alignment is the way it is and trying to develop a solution based on that. – Iliketoproveit Mar 15 '18 at 20:35
  • 1
    This *last ditch* will teach you the alignment, which will lead you to a general solution. IOW, you have a unique version of the file to read, because this file is written by a unique {compiler, parameters} set. An *all-purpose* way would need a lot of paths. – Ripi2 Mar 15 '18 at 20:36