Why does gcc not optimize the immediate?

Question

Regardless of the optimization level, when a pointer is initialized from an immediate value, the register at the assembly level is loaded with a slightly different value, which is then corrected with a fixed offset whenever the pointer is dereferenced.

E.g.:

int test(){
    int *tmp = (int*)0x30000004;
    tmp[0]++;
    return tmp[1];
}

Compiling with `mips-linux-gnu-gcc-8 -O3 -nostdlib test.c -c` generates the following assembly:

Disassembly of section .text:

00000000 <test>:
   0:   3c033000        lui     v1,0x3000
   4:   8c640004        lw      a0,4(v1)
   8:   8c620008        lw      v0,8(v1)
   c:   24840001        addiu   a0,a0,1
  10:   03e00008        jr      ra
  14:   ac640004        sw      a0,4(v1)

As you can see, the $v1 register is loaded with 0x30000000 and the offsets are all shifted by +4 to compensate.

Why does GCC do this?

Is it possible to disable this?

The end goal would be to get something like this:

Disassembly of section .text:

00000000 <test>:
   li      v1,0x30000004
   lw      a0,0(v1)
   lw      v0,4(v1)
   addiu   a0,a0,1
   jr      ra
   sw      a0,0(v1)

You cannot load an arbitrary 32-bit number in only one instruction. `lui` accepts a 16-bit immediate. You can load an arbitrary 32-bit number by issuing a `lui` to load the upper half and then an `ori` to load the lower half. The code you showed uses the immediate field of `lw` as an offset to reach the desired address — gusbro, Oct 01 '21 at 00:54
A MIPS instruction is 32 bits long, so how can you squeeze a 32-bit immediate into it while still leaving room for the opcode and register number? [How to load 32 bit constant to a register without using LUI](https://stackoverflow.com/q/67909301/995714), [Load 32-bit constant to register in MIPS](https://stackoverflow.com/q/13160930/995714) — phuclv, Oct 01 '21 at 04:15
Does this answer your question? [How to load 32 bit constant to a register without using LUI](https://stackoverflow.com/questions/67909301/how-to-load-32-bit-constant-to-a-register-without-using-lui) — phuclv, Oct 01 '21 at 04:16
[How 32 bit IR hold load instruction? (RISC style 32bit architechture)](https://stackoverflow.com/q/33830520/995714) — phuclv, Oct 01 '21 at 07:18
@gusbro @phuclv You're right, we can't do it in a single instruction. I've replaced the 'end goal' with the correct pseudo-instruction `li`, which expands to a `lui`/`ori` pair. My question is more about why gcc prefers to use the immediate field of `lw`; older versions didn't do this. — krystalgamer, Oct 01 '21 at 08:38
GCC's code is a total of 6 instructions, whereas yours is 7 (since as you say `li` is really two). So GCC wins at least on code size, and maybe on speed too, which explains why it prefers its version. Probably in older versions of GCC, this optimization just hadn't been implemented yet. — Nate Eldredge, Oct 01 '21 at 15:00
This seems like an XY problem: why do you think it desirable to make GCC produce worse code? Are you post-processing its assembly output for some purpose? If you tell us what that purpose is, there might be a better way to accomplish it than by deliberately de-optimizing your code. — Nate Eldredge, Oct 01 '21 at 15:06
@NateEldredge I'm working on reverse engineering a program and rewriting it in a way that matches the original exactly, inspired by efforts such as this one: https://github.com/pret/pokeemerald The compiler the developers used was part of a toolchain that I'm not sure is legal to use nowadays. Using it gives me the expected output, but it ties me to this old and legally dubious toolchain. Luckily it's based on GCC 2.95.2, which I'm working on getting to compile on modern hardware. Sadly, I don't think we can de-optimize in any way other than using the old compiler version. — krystalgamer, Oct 01 '21 at 21:06
As I explained in the answer, the optimization in newer versions of GCC is not exposed through a flag, which leaves me with two options: (1) modify GCC to expose that optimization, or (2) just use the old version of GCC. The path of least resistance is the second one, so I chose it. — krystalgamer, Oct 01 '21 at 21:07

score -1 · Accepted Answer · answered Oct 01 '21 at 10:08

The output stops matching the expected result as soon as optimizations are enabled (-O1).

After testing -O1 combined with the -fno-* flags that should bring it back to -O0 behaviour, I've come to the conclusion that there's a hidden optimization going on that can't be disabled at the moment.

Related: g++ O1 is not equal to O0 with all related optimization flags

Since the hidden optimization can't be disabled, I'll just use the older version of the compiler, which produces the expected output.
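
For completeness, a common GCC idiom for hiding a value from the optimizer is an empty inline-asm statement with a `"+r"` constraint. The sketch below applies it to the function from the question; `opaque_ptr` is a hypothetical helper name. Whether this makes a particular GCC version emit the `li` / `0(v1)` form is an assumption that has to be checked against the actual disassembly, and unlike a compiler flag it requires changing the source.

```c
/* Hypothetical helper: routes the pointer through an empty asm statement.
   The "+r" constraint tells GCC the register may have been modified, so the
   compiler can no longer treat the pointer as a known constant. */
static inline int *opaque_ptr(int *p) {
    __asm__("" : "+r"(p));
    return p;
}

int test(void) {
    /* Same logic as in the question, but the address is opaque to GCC. */
    int *tmp = opaque_ptr((int *)0x30000004);
    tmp[0]++;
    return tmp[1];
}
```

The empty asm emits no instructions of its own; it only restricts what the compiler may assume about the value held in the register.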
