1

I have decompiled an .so file (from an ARM lib in an Android app) using retdec and among the code I could find instructions like this:

int32_t a = `some value`;
int32_t b = `another value`;
*(int32_t *)(a + 4) = b;

Because every value I try produces a compiler warning and a segmentation fault at run time, I'm not sure what this statement really does.

Dorinel Panaite
  • What is the type of "a"? – Vlad from Moscow Aug 16 '19 at 14:07
  • Looks smelly indeed. Can you post a github link to the file where you spotted this? Or better yet post a complete example including the variable declaration. – Lundin Aug 16 '19 at 14:08
  • Both a and b are int32_t, I'll edit the original post. – Dorinel Panaite Aug 16 '19 at 14:08
  • 1
    It means "take the result of `a + 4`, treat that as a pointer to an `int32_t`, dereference that pointer and assign the value of `b` to the result." Based on your edit, it's treating the integer value stored in `a` as an address, adding 4 to that address, and assigning the value of `b` to the object at that resulting address, which ... huh? No wonder it blows up. – John Bode Aug 16 '19 at 14:10
  • @Lundin I am unable to post the whole file, but I'll edit this post with more information if needed. What is it that you want to see in the code? – Dorinel Panaite Aug 16 '19 at 14:11
  • 1
    @Dorinel Panaite It is bad code. Pointers can be 8 bytes wide. – Vlad from Moscow Aug 16 '19 at 14:11
  • @JohnBode I thought it was something like this. But both clang and gcc throw warnings at me and the code crashes when run, no matter the values. Maybe it's a decompilation error or something. – Dorinel Panaite Aug 16 '19 at 14:12
  • 6
    @DorinelPanaite: My favorite description of decompilation is "turning hamburger back into cows". I wouldn't be surprised if this was a bad translation into C. – John Bode Aug 16 '19 at 14:16
  • 1
    @Lundin: Read the first sentence of the question. It is not code in GitHub or a library. It is the output of a decompiler. – Eric Postpischil Aug 16 '19 at 14:18
  • 2
    @VladfromMoscow: Read the first sentence of the question. The code is output from a decompiler. Among other things, that means it is not intended to be portable code; it is specifically tied to one platform and is a representation of what some machine code does. Whether it is “good” or “bad” for code written by a human is irrelevant; that is simply not what this code is for. This code is for showing what the machine code does, and it is doing that. – Eric Postpischil Aug 16 '19 at 14:24

5 Answers

6

Working from the inside out:

a + 4

Takes the value of a, and adds 4 to it, following the usual arithmetic conversions if applicable. This expression has at least the rank of int32_t.

Next:

(int32_t *)(a + 4)

Means that you take this new integer value, and interpret it as a pointer to an int32_t. This expression has type int32_t *.

One step further out, you're dereferencing it with the * operator:

*(int32_t *)(a + 4)

This gives an lvalue (like a typical variable) of type int32_t at the address a + 4 (the validity of such an address is implementation-dependent).

Finally, you assign the value in b to this location:

*(int32_t *)(a + 4) = b;

All together, this stores the value of b, as an int32_t, into the memory location 4 bytes past the address held in a.

Unless a + 4 happens to point to a valid memory location to store an int32_t (as it presumably would have in its original context), this will likely result in the program misbehaving. At best, the behaviour is implementation-defined. At worst, it's undefined.
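As a minimal sketch of the valid case: the same statement behaves sensibly once a really holds the address of an object. This assumes a flat address space where converting a pointer to uintptr_t and back is well behaved (that conversion is implementation-defined), and the names store_at_offset4 and demo_store are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the decompiled statement, but with uintptr_t so the integer is
 * wide enough to hold an address on the host (an assumption of this sketch). */
static void store_at_offset4(uintptr_t a, int32_t b)
{
    *(int32_t *)(a + 4) = b;
}

/* Gives `a` the address of a real two-element array, performs the store,
 * and returns whatever ended up in the second element. */
int32_t demo_store(int32_t b)
{
    int32_t slots[2] = {0, 0};
    uintptr_t a = (uintptr_t)slots;   /* a now holds a valid address */

    store_at_offset4(a, b);           /* writes 4 bytes past the start */

    assert(slots[0] == 0);            /* the first element is untouched */
    return slots[1];
}
```

Calling demo_store(42) returns 42, because 4 bytes past the start of the array is exactly slots[1]. With an arbitrary integer in a, the same store lands on an address nobody owns.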

Thomas Jager
4

The problem is that the decompiler cannot know the types of variables. It can only know that there is some data of a certain size in registers and on the stack, and that it is used in a certain way, so it assumes that all 32-bit entities are int32_t even though on ARM they could just as well be pointers. Or even zero-extended chars moved around in registers.

In this case a seems not to be an integer, but a pointer to an element in an array, or perhaps a pointer to a structure and the code was something like

int *a = something;
int b = calculate_something();
a[1] = b;

Or perhaps

struct foo *a = something;
int b = calculate_something();
a->second_member = b;

We wouldn't know. So the best the decompiler can come up with is

int32_t a = something;
int32_t b = calculate_something();
*(int32_t *)(a + 4) = b;

i.e. "oops, the value a + sizeof (int) should now be used as a pointer, and b assigned to that location."
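To make the struct case concrete, here is a small sketch showing that a raw store at byte offset 4 and an assignment to the second member touch the same memory. The layout of struct foo (two int32_t members, no padding) is an assumption for illustration, as are the helper names store_raw and store_via_raw_offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the second member sits 4 bytes into the struct. */
struct foo {
    int32_t first_member;
    int32_t second_member;
};

/* The decompiler's view: an integer address plus a byte offset. */
static void store_raw(uintptr_t a, int32_t b)
{
    *(int32_t *)(a + 4) = b;
}

/* Performs the raw store on a real struct and returns the second member. */
int32_t store_via_raw_offset(int32_t b)
{
    struct foo f = {0, 0};

    /* Guard the layout assumption this sketch relies on. */
    assert(offsetof(struct foo, second_member) == 4);

    store_raw((uintptr_t)&f, b);
    return f.second_member;
}
```

store_via_raw_offset(7) returns 7: once the type information is gone, a->second_member = b and the raw offset store are indistinguishable.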


As for compiling it again: don't even dream of compiling it for any platform other than the one the code originated from.

2

It means that de-compilation of machine code does not yield the original source code back! Let's take, for example, the code snippet below.

int a[5];
int b;
void somefunc(void)
{
    a[1] = b;
}

It compiles to something like this:

somefunc:
        ldr     r2, =b       @ Load the address of b
        ldr     r3, =a       @ Load the address of a
        ldr     r2, [r2]     @ Load the value in b
        str     r2, [r3, #4] @ Store the value of b to a[1], i.e. 4 bytes past the address of a
        bx      lr           @ return

Now, if someone were to try to de-compile it line by line into C code, without knowing about the array and any other context, it would turn out something like the snippet you posted.

str     r2, [r3, #4]  => *(int32_t *)(r3 + 4) = r2

There are probably also many other snippets of C code that could compile to the exact same assembly sequence. Which is why decompiling is far from an 'exact science'!
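One detail worth spelling out is that the #4 in the str instruction is a byte offset, so in C the addition has to happen on the integer address before the cast. Adding after the cast scales by sizeof(int32_t) and lands on a different element. A short sketch, with function names invented for this example and assuming the usual flat pointer-to-integer conversion:

```c
#include <stddef.h>
#include <stdint.h>

/* Add 4 to the integer address, then cast: a byte offset of 4,
 * which is element index 1 of an int32_t array. */
size_t index_add_then_cast(void)
{
    int32_t a[5] = {0};
    uintptr_t r3 = (uintptr_t)a;
    int32_t *p = (int32_t *)(r3 + 4);
    return (size_t)(p - a);
}

/* Cast first, then add 4: pointer arithmetic scales by the element size,
 * so this points at element index 4 (16 bytes in). */
size_t index_cast_then_add(void)
{
    int32_t a[5] = {0};
    uintptr_t r3 = (uintptr_t)a;
    int32_t *p = (int32_t *)r3 + 4;
    return (size_t)(p - a);
}
```

index_add_then_cast() yields 1, matching a[1] = b in the source, while index_cast_then_add() yields 4.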

th33lf
1
*(int32_t *)(a + 4) = b;

In simple terms, this means get the value of a+4 and treat it as an address at which a variable of type int32_t resides. At that address store the value of b.

Decompiling can't always produce an exact result, and code like this is bound to crash unless a + 4 is the address of memory you have actually reserved for an int32_t.

Also, I assume this is because the .so was compiled for a 32-bit architecture, which is why the decompiler chose the type int32_t. As a guess, it "may" work if you pass gcc the -m32 flag, which asks it to compile the code for a 32-bit architecture.
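A rough way to see why the pointer width matters: on a host with 64-bit pointers, squeezing an address through int32_t drops the upper bits, whereas uintptr_t is designed to round-trip it. The sketch below uses made-up helper names, and the narrowing conversion to int32_t is implementation-defined when the value does not fit.

```c
#include <stdint.h>

/* Returns 1 if the pointer survives a round trip through int32_t.
 * The narrowing conversion is implementation-defined when the address
 * does not fit; on common hosts it simply truncates. */
int survives_int32(const void *p)
{
    int32_t narrowed = (int32_t)(uintptr_t)p;
    return (const void *)(uintptr_t)(uint32_t)narrowed == p;
}

/* Returns 1 if the pointer survives a round trip through uintptr_t,
 * which is the integer type intended for holding addresses. */
int survives_uintptr(const void *p)
{
    uintptr_t wide = (uintptr_t)p;
    return (const void *)wide == p;
}
```

On a 32-bit ARM target both checks would be expected to pass; on a typical 64-bit desktop the int32_t version usually fails for stack or heap addresses, which is consistent with the crash described in the question.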

Mihir Luthra
1

The ARM CPU is a load-store architecture. One form of its store instruction is:

str rN, [rP, #4]

This takes the value of register rP (a pointer) and adds four to it. The bus then issues a store of the value in register rN to that memory address. Your decompiler seems rudimentary (see the note below) and has translated this as:

int32_t a = `some value`;      /* sets up pointer register `rP` */
int32_t b = `another value`;   /* Initializes value `rN` */
*(int32_t *)(a + 4) = b;       /* the instruction `str rN, [rP, #4]` */

If you look at the wiki, it notes that compiling to binary loses information. One goal of a decompiler is that compiling its output unaltered should give the same binary.

As the code is trying to replicate the machine code exactly, there is no way it will ever be portable.


Part of the issue with the tool is,

I have decompiled an .so file (from an ARM lib in an Android app)

Shared libraries are compiled as position-independent code so that they can be shared by multiple processes, and that code can look strange. It is possible that the registers used are non-standard, which prevents the decompiler from matching the regular EABI register use found in a main executable.

I looked briefly and the tool didn't seem to have a '-shared-library' decompile option. I suspect you are decompiling a thunk of some sort, i.e. a PLT or GOT entry; see ARM Dynamic linking. Here is a question on shared libraries for ARM; if the decompiler had a -shared-library option, it would probably need an OS (and version) qualifier.

artless noise
  • Not quite the same binary but that it should compile to code that should behave the same way – Antti Haapala -- Слава Україні Aug 16 '19 at 16:04
  • I don't think the developers would be upset if it was the same binary :) But most likely it will just be very similar, as you suggest. Indeed the compiler (code/versions) and flags will affect this. It should be possible to produce binary equivalents if you modelled the decompiler on the RTL/etc. of a compiler, i.e. you know the exact compiler and give the user specific flags. Some code will checksum itself or portions of code. It would at least be useful to produce the same binary. Probably most real decompilers can't do this. – artless noise Aug 16 '19 at 16:12