m68k-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CFLAGS = -Wall -Werror -ffreestanding -nostdlib -O2 -m68000 -mshort
I am very confused why gcc generates such (seemingly) non-optimal code for a simple for loop over a const array.
const unsigned int pallet[16] = {
0x0000,
0x00E0,
0x000E,
...
0x0000
};
...
volatile unsigned long * const VDP_DATA = (unsigned long *) 0x00C00000;
...
for(int i = 0; i < 16; i++) {
*VDP_DATA = pallet[i];
}
Results in:
296: 41f9 0000 037e lea 37e <pallet+0x2>,%a0
29c: 223c 0000 039c movel #924,%d1
2a2: 4240 clrw %d0
2a4: 0280 0000 ffff andil #65535,%d0
2aa: 23c0 00c0 0000 movel %d0,c00000 <_etext+0xbffc2c>
2b0: b288 cmpl %a0,%d1
2b2: 6712 beqs 2c6 <main+0x46>
2b4: 3018 movew %a0@+,%d0
2b6: 0280 0000 ffff andil #65535,%d0
2bc: 23c0 00c0 0000 movel %d0,c00000 <_etext+0xbffc2c>
2c2: b288 cmpl %a0,%d1
2c4: 66ee bnes 2b4 <main+0x34>
My main concern: why the useless first-element compare at 2b0? It can never be equal, and nothing ever branches back to it. It just ends up being duplicate code for the first iteration.
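To make the redundancy concrete, here is a rough transliteration of the listing back into C, as I read it. This is a host-side sketch, not what gcc works with internally: `vdp_write` and `vdp_log` are hypothetical stand-ins for the store to `*VDP_DATA`, the function names are made up, and the table holds only the three values shown above (the elided entries are left as 0).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for "*VDP_DATA = v": logs every write. */
static uint32_t vdp_log[32];
static size_t vdp_count = 0;
static void vdp_write(uint32_t v) { vdp_log[vdp_count++] = v; }

/* Only the values visible in the question; elided entries default to 0. */
static const uint16_t pallet[16] = { 0x0000, 0x00E0, 0x000E };

/* The loop as written in the source. */
static void upload_as_written(void) {
    for (int i = 0; i < 16; i++)
        vdp_write(pallet[i]);
}

/* The loop as the listing appears to execute it. */
static void upload_as_compiled(void) {
    const uint16_t *p   = pallet + 1;   /* lea 37e <pallet+0x2>,%a0 */
    const uint16_t *end = pallet + 16;  /* movel #924,%d1 (0x39c)   */

    vdp_write(0);                       /* first write split out: clrw, andil, movel */
    if (p == end)                       /* cmpl/beqs at 2b0 and 2b2: never true */
        return;
    do {
        vdp_write(*p++);                /* movew %a0@+, andil, movel */
    } while (p != end);                 /* cmpl/bnes at 2c2 and 2c4 */
}
```

Both functions produce the same 16 writes; the second one just pays for an extra write sequence and a compare that can never succeed.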
- Is there a better way to write this dead-simple loop such that gcc won't produce this strange code?
- Are there any compiler flags/optimizations I can take advantage of? -O3 simply unrolls the loop, which I don't want either, as space is a bigger concern than speed in this part of the code.
- Maybe I'm being too scrupulous, but I figured this wouldn't be the most difficult code to generate. I was expecting something more along the lines of (probably wrong, but you get the idea):
lea pallet,%a0
movel #7,%d0
1:
movel %a0@+,c00000
dbra %d0,1b
I get that I have to be a bit more explicit in my code to get it to write in long chunks. My main point is: why can't gcc figure out my intention, i.e. that I just want to dump this data into this address?
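By "more explicit" I mean something like the sketch below: pack two 16-bit entries into each long write and count down, which is the loop shape `dbra` is designed for. This is a host-testable illustration under assumptions, not verified compiler output: `upload_longs`, `vdp_write` and `vdp_log` are hypothetical names, the first entry is assumed to go in the high half (the order a big-endian `movel` from the table would give), and I have not checked whether gcc 7.4 actually emits `dbra` for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for "*VDP_DATA = v": logs every write. */
static uint32_t vdp_log[16];
static size_t vdp_count = 0;
static void vdp_write(uint32_t v) { vdp_log[vdp_count++] = v; }

/* Send `words` 16-bit entries as words/2 long writes.
   Assumes `words` is even; the first entry of each pair lands in the high half. */
static void upload_longs(const uint16_t *src, size_t words) {
    size_t n = words / 2;
    while (n--) {                       /* count-down loop */
        uint32_t hi = *src++;
        uint32_t lo = *src++;
        vdp_write((hi << 16) | lo);
    }
}
```

For the 16-entry table this would be called as `upload_longs(pallet, 16)`, giving 8 long writes instead of 16.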
Another observation: clrw %d0 → andil #65535,%d0 → movel %d0,c00000. Why not just clrl and move?