Does the ARM ITE instruction have any use in this scenario

Question

My C compiler (GCC) is producing this code, which I don't think is optimal:

    8000500: 2b1c       cmp     r3, #28
    8000502: bfd4       ite     le
    8000504: f841 3c70  strle.w r3, [r1, #-112]
    8000508: f841 0c70  strgt.w r0, [r1, #-112]

It seems to me that the compiler could happily omit the ITE LE instruction, as the two stores following it use the LE and GT conditions on the flags set by the CMP instruction, so only one will actually be performed. The ITE instruction means that only one of the STRs will be tested and performed, so the time should be equal, but it is using an extra 16 bits of instruction memory.

Any opinions on this?

Might be related information: https://stackoverflow.com/a/26001101 — Sami Kuhmonen, Jun 02 '19 at 14:35
Might be smart to mention which optimizer settings you used when compiling. You did enable optimizations right? — Lundin, Jun 03 '19 at 08:41

score 2 · Answer 1 · answered Jun 02 '19 at 18:03

In Thumb mode, the instruction opcodes (other than branch instructions) don't have any space for conditional execution. In Thumb1, this meant that one simply had to use branches to skip instructions if necessary.

In Thumb2 mode, the IT instruction was added, which adds the conditional execution capability, without embedding it into the instruction opcodes themselves. In your case, the le condition part of the strle.w instruction is not embedded in the opcode f841 3c70, but is actually inferred from the preceding ite le instruction by the disassembler. If you use a hex editor to change the ite le instruction to something else, the strle.w and strgt.w will both suddenly disassemble into plain str.w.
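To make the point about the encoding concrete, here is a minimal sketch in C that decodes a 16-bit IT halfword such as `bfd4` from the listing in the question. It assumes the standard T32 IT layout (`1011 1111`, then a 4-bit first condition, then a 4-bit mask); the names `it_block` and `decode_it` are made up for this illustration and do not come from any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch. */
typedef struct {
    unsigned firstcond;  /* 4-bit base condition, e.g. 0xD = LE          */
    unsigned count;      /* number of instructions in the IT block, 1..4 */
    char pattern[5];     /* 't' or 'e' per covered instruction, NUL-terminated */
} it_block;

/* Returns 0 and fills *out if hw is an IT instruction, -1 otherwise. */
int decode_it(uint16_t hw, it_block *out)
{
    assert(out != NULL);

    if ((hw & 0xFF00u) != 0xBF00u)
        return -1;                      /* not in the IT/hint encoding space */

    unsigned firstcond = (hw >> 4) & 0xFu;
    unsigned mask = hw & 0xFu;
    if (mask == 0)
        return -1;                      /* mask 0000 encodes hints such as NOP */

    /* The lowest set bit of the mask terminates the block. */
    unsigned tz = 0;
    while (((mask >> tz) & 1u) == 0)
        tz++;
    unsigned count = 4 - tz;

    out->firstcond = firstcond;
    out->count = count;
    out->pattern[0] = 't';              /* the first instruction is always "then" */
    for (unsigned i = 1; i < count; i++) {
        /* A mask bit equal to firstcond[0] means "then", otherwise "else". */
        unsigned bit = (mask >> (4 - i)) & 1u;
        out->pattern[i] = (bit == (firstcond & 1u)) ? 't' : 'e';
    }
    out->pattern[count] = '\0';
    return 0;
}
```

Calling `decode_it(0xBFD4, &b)` yields `firstcond == 0xD` (LE), `count == 2` and the pattern `"te"`, i.e. `ite le`. The two store opcodes carry no condition bits of their own, which is why the disassembler has to take the condition from this halfword.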

See the other linked answer, https://stackoverflow.com/a/26001101, for more details.

score 2 · Answer 2 · answered Jun 03 '19 at 08:38

The unified assembler syntax, which supports A32 and T32 targets, has added some confusion here. What is being shown in the disassembly is more verbose than what is encoded in the opcodes.

Your ITE instruction is very much a Thumb instruction set placeholder: it defines an IT block which spans the following two instructions (and, being Thumb, those two instructions are not individually conditional). From a micro-architecture/timing point of view, it is only necessary to execute one instruction (but you shouldn't assume that this folding always takes place).
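As a rough illustration of what the IT block does architecturally, the following C sketch models the four instructions from the question: it computes the N, Z, C and V flags that `cmp r3, #28` would set, then evaluates LE and GT the way the two predicated stores would. This is a simplified model, not a description of any real pipeline, and the names `flags_t`, `cmp_flags` and `model_ite_le` are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags_t;

/* Flags produced by `cmp a, b`, i.e. by computing a - b and discarding the result. */
flags_t cmp_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t res = ua - ub;
    flags_t f;
    f.n = (res >> 31) & 1u;                        /* result is negative        */
    f.z = (res == 0);                              /* result is zero            */
    f.c = (ua >= ub);                              /* no borrow occurred        */
    f.v = (((ua ^ ub) & (ua ^ res)) >> 31) & 1u;   /* signed overflow on a - b  */
    return f;
}

/* ARM condition codes LE (1101) and GT (1100) are exact complements. */
bool cond_le(flags_t f) { return f.z || (f.n != f.v); }
bool cond_gt(flags_t f) { return !f.z && (f.n == f.v); }

/* Models: cmp r3,#28 ; ite le ; strle r3,[...] ; strgt r0,[...].
 * Returns the value that ends up in memory and reports how many stores ran. */
int32_t model_ite_le(int32_t r3, int32_t r0, unsigned *stores_executed)
{
    flags_t f = cmp_flags(r3, 28);
    int32_t mem = 0;
    unsigned n = 0;

    if (cond_le(f)) { mem = r3; n++; }   /* "then" slot of the IT block */
    if (cond_gt(f)) { mem = r0; n++; }   /* "else" slot of the IT block */

    assert(n == 1);                      /* exactly one slot passes its condition */
    if (stores_executed != NULL)
        *stores_executed = n;
    return mem;
}
```

For any input exactly one of the two stores takes effect, which matches the questioner's intuition about the result. What differs from A32 is only where the condition is encoded.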

The strle/strgt syntax could be used on its own for an A32 target, where the IT block is not necessary since the instruction set has a dedicated condition code field.

In order to write (or disassemble) code which can be used by both A32 and T32 assemblers, what you have here is both approaches to conditional execution written together. This has the advantage that the same assembly routine can be more portable (even if the resulting code is not identical - optimisations in the target cpu will also be different).

With T32, the combination of an IT and a single 16-bit instruction matches the instruction density of the equivalent A32 instruction; if more than one conditional instruction can be combined, there is an overall win.
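The density comparison is simple arithmetic, sketched here in C. The assumptions are the usual encoding sizes: an IT instruction is 16 bits, a T32 instruction is either 16 or 32 bits, and every A32 instruction is 32 bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes used by one IT block: a 16-bit IT plus n16 narrow and n32 wide
 * T32 instructions. An IT block covers between 1 and 4 instructions. */
size_t t32_it_block_bytes(size_t n16, size_t n32)
{
    assert(n16 + n32 >= 1 && n16 + n32 <= 4);
    return 2 + 2 * n16 + 4 * n32;
}

/* Bytes used by n conditional A32 instructions (condition field built in). */
size_t a32_bytes(size_t n)
{
    return 4 * n;
}
```

With one narrow instruction both come to 4 bytes; with two narrow instructions T32 needs 6 bytes against 8 for A32. The listing in the question uses two wide `str.w` encodings, so there the IT block takes 10 bytes (0x8000502 up to 0x800050c), more than the 8 bytes two conditional A32 stores would need.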
