
How can I instruct gcc to emit idiv (integer division, udiv and sdiv) instructions for arm application processors?

So far the only way I have come up with is to use -mcpu=cortex-a15 with gcc 4.7.

$cat idiv.c
int test_idiv(int a, int b) {
    return a / b;
}

On gcc 4.7 (bundled with Android NDK r8e)

$gcc -O2 -mcpu=cortex-a15 -c idiv.c
$objdump -S idiv.o

00000000 <test_idiv>:
   0:   e710f110    sdiv    r0, r0, r1
   4:   e12fff1e    bx  lr

Even this breaks down if you add -march=armv7-a next to -mcpu=cortex-a15: gcc then warns that the two switches conflict and does not emit the idiv instruction.

$gcc -O2 -mcpu=cortex-a15 -march=armv7-a -c idiv.c

idiv.c:1:0: warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch [enabled by default]

$objdump -S idiv.o
00000000 <test_idiv>:
   0:   e92d4008    push    {r3, lr}
   4:   ebfffffe    bl  0 <__aeabi_idiv>
   8:   e8bd8008    pop {r3, pc}
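For context on what that bl to __aeabi_idiv stands for: when sdiv is not allowed, gcc calls this EABI run-time helper instead. The following is a minimal shift-and-subtract (restoring) division, producing one quotient bit per loop iteration, written only to illustrate the kind of work such a helper has to do. It is not libgcc's implementation; the names soft_udiv/soft_sdiv are hypothetical, and returning 0 on divide-by-zero is an assumption of this sketch.

```c
#include <stdint.h>

/* Illustrative restoring division: one quotient bit per iteration.
 * NOT libgcc's __aeabi_uidiv; just a sketch of a software fallback. */
static uint32_t soft_udiv(uint32_t n, uint32_t d) {
    uint32_t q = 0;
    uint64_t r = 0;            /* 64-bit so r << 1 cannot overflow */

    if (d == 0)
        return 0;              /* assumption: report 0 on divide-by-zero */

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next dividend bit */
        if (r >= d) {                    /* divisor fits: subtract, set bit */
            r -= d;
            q |= 1u << i;
        }
    }
    return q;
}

/* Signed wrapper: divide the magnitudes, then fix the sign so the result
 * truncates toward zero like C's '/'. INT32_MIN / -1 overflows in C and
 * is not handled here. */
static int32_t soft_sdiv(int32_t a, int32_t b) {
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t uq = soft_udiv(ua, ub);
    return ((a < 0) != (b < 0)) ? (int32_t)(0u - uq) : (int32_t)uq;
}
```

For example, soft_sdiv(-7, 2) returns -3, the same as -7 / 2 in C. The 32 loop iterations (plus call overhead) are what a single sdiv replaces.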

gcc 4.6 (also bundled with Android NDK r8e) recognizes -mcpu=cortex-a15 and doesn't complain about the -mcpu=cortex-a15 -march=armv7-a combination, but it doesn't emit idiv instructions at all.

AFAIK idiv is optional on ARMv7, so there should be a cleaner way to instruct gcc to emit it. But how?
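One way to check which flags actually enable hardware divide, assuming a compiler that implements the ARM C Language Extensions (ACLE): such compilers define __ARM_FEATURE_IDIV when the selected target has hardware integer divide. The gcc 4.6/4.7 toolchains in NDK r8e may predate this macro, so a 0 here can also mean "not reported". compiled_with_hw_idiv is a hypothetical helper name.

```c
/* Hypothetical helper: 1 if the compiler reported hardware integer
 * divide for the current flags via the ACLE macro __ARM_FEATURE_IDIV,
 * 0 otherwise. Older gcc releases may not define the macro at all. */
static int compiled_with_hw_idiv(void) {
#if defined(__ARM_FEATURE_IDIV)
    return 1;
#else
    return 0;
#endif
}

/* Same function as idiv.c above: gcc emits sdiv only when the target
 * enables hardware divide, otherwise it calls __aeabi_idiv. */
int test_idiv(int a, int b) {
    return a / b;
}
```

With a gcc new enough to define the macro, `echo | gcc -mcpu=cortex-a15 -dM -E - | grep IDIV` should list the divide-related macros for a given set of flags without compiling any code.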

auselen
  • Are you sure there is an actual instruction called `idiv`? I can only find `sdiv` (signed division) and `udiv` (unsigned division) ... – unwind Apr 03 '13 at 08:16
  • @unwind I assume idiv = sdiv | udiv – auselen Apr 03 '13 at 08:18
  • Which CPU specifically is your target? AFAIK the Cortex-A15 supports sdiv/udiv but the Cortex-A5 does not. Both are ARMv7-A compatible. – Austin Phillips Apr 03 '13 at 11:17
  • @AustinPhillips Nothing special, I'm just trying to understand if there is a way to tell gcc to emit idiv instructions instead of linking to __aeabi* stuff. One example might be building the Android stack for an emulator that supports idiv. – auselen Apr 03 '13 at 17:13
  • The warning **conflicts with** looks like a bug. Just don't specify an `-march`. See [arm.c](http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/arm/arm.c?revision=197425&view=markup), lines 1644 onward. Or maybe this logic is there precisely for this situation: since you requested a *CPU* that is *better* than the *ARCH*, the *ARCH* is not valid. I think you just want a *naked* `-mcpu=cortex-a7` (or `-mcpu=cortex-a15`, `-mcpu=cortex-r5`) to get *idiv*; these are the only CPUs that support it and they are better than the *ARCH*. – artless noise Apr 03 '13 at 18:49

1 Answer


If the instruction is not in the machine descriptions, I doubt gcc will emit it (see Note1).

You can always use inline assembler to get the instruction if the compiler doesn't support it (see Note2). Since your op-code is fairly rare and machine-specific, probably not much effort has gone into supporting it in the gcc source. In particular, gcc has separate arch and tune/cpu flags: tune/cpu targets a more specific machine, while arch is supposed to allow every machine in that architecture. If I understand correctly, this op-code breaks that rule, because it is optional within ARMv7.

For gcc 4.6.2, it looks like Thumb-2 and cortex-r4 are the cues to use these instructions, and, as you noted with gcc 4.7, cortex-a15 seems to have been added as another cue. In gcc 4.7.2, the thumb2.md file no longer has udiv/sdiv; judging from the grep output in Note1, the patterns appear to have moved to arm.md, guarded by TARGET_IDIV, although I am not 100% familiar with all of the machine description language. It also seems that cortex-a7, cortex-a15, and cortex-r5 may enable these instructions in 4.7.2 (see Note3).

This doesn't answer the question directly, but it does give some information and a path toward the answer. You can compile the module with -mcpu=cortex-r4, although this may produce linker issues. There is also int my_idiv(int a, int b) __attribute__ ((__target__ ("arch=cortex-r4")));, which lets you specify, on a per-function basis, the machine description used by the code generator. I haven't used any of these myself; they are just possibilities to try. Generally you don't want to specify the wrong machine, as it could generate sub-optimal (and possibly illegal) op-codes. You will have to experiment and maybe then provide the real answer.

Note1: This is for a stock gcc 4.6.2 and 4.7.2. I don't know if your Android compiler has patches.

gcc-4.6.2/gcc/config/arm$ grep [ius]div *.md
arm.md: "...,sdiv,udiv,other"
cortex-r4.md:;; We guess that division of A/B using sdiv or udiv, on average, 
cortex-r4.md:;; This gives a latency of nine for udiv and ten for sdiv.
cortex-r4.md:(define_insn_reservation "cortex_r4_udiv" 9
cortex-r4.md:       (eq_attr "insn" "udiv"))
cortex-r4.md:(define_insn_reservation "cortex_r4_sdiv" 10
cortex-r4.md:       (eq_attr "insn" "sdiv"))
thumb2.md:  "sdiv%?\t%0, %1, %2"
thumb2.md:   (set_attr "insn" "sdiv")]
thumb2.md:(define_insn "udivsi3"
thumb2.md:      (udiv:SI (match_operand:SI 1 "s_register_operand"  "r")
thumb2.md:  "udiv%?\t%0, %1, %2"
thumb2.md:   (set_attr "insn" "udiv")]
gcc-4.7.2/gcc/config/arm$ grep -i [ius]div *.md
arm.md:  "...,sdiv,udiv,other"
arm.md:  "TARGET_IDIV"
arm.md:  "sdiv%?\t%0, %1, %2"
arm.md:   (set_attr "insn" "sdiv")]
arm.md:(define_insn "udivsi3"
arm.md: (udiv:SI (match_operand:SI 1 "s_register_operand"  "r")
arm.md:  "TARGET_IDIV"
arm.md:  "udiv%?\t%0, %1, %2"
arm.md:   (set_attr "insn" "udiv")]
cortex-a15.md:(define_insn_reservation "cortex_a15_udiv" 9
cortex-a15.md:       (eq_attr "insn" "udiv"))
cortex-a15.md:(define_insn_reservation "cortex_a15_sdiv" 10
cortex-a15.md:       (eq_attr "insn" "sdiv"))
cortex-r4.md:;; We guess that division of A/B using sdiv or udiv, on average, 
cortex-r4.md:;; This gives a latency of nine for udiv and ten for sdiv.
cortex-r4.md:(define_insn_reservation "cortex_r4_udiv" 9
cortex-r4.md:       (eq_attr "insn" "udiv"))
cortex-r4.md:(define_insn_reservation "cortex_r4_sdiv" 10
cortex-r4.md:       (eq_attr "insn" "sdiv"))

Note2: See *pre-processor as assembler* if gcc passes options to gas that prevent use of the udiv/sdiv instructions. For example, you can use asm(" .long <opcode>\n"); where <opcode> is the output of a macro that encodes the registers into the instruction word, built with token pasting and stringification. You can also annotate your assembly to change the target machine (for example with a .cpu directive), so you can temporarily lie and say you have a cortex-r4, etc.
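The raw-opcode approach from Note2 can be sketched as follows, assuming ARM (A32) state; Thumb-2 uses a different encoding, which this sketch does not cover. The bit layout is checked against the word e710f110 that objdump prints for sdiv r0, r0, r1 in the question. a32_sdiv, a32_udiv and SDIV_WORD are hypothetical names, and the string macro assumes gas evaluates the | and << expression after .long.

```c
#include <stdint.h>

/* A32 layout for SDIV/UDIV:
 *   cond | 0111 0001 (SDIV) or 0111 0011 (UDIV) | Rd | 1111 | Rm | 0001 | Rn
 * e.g. sdiv r0, r0, r1 == 0xe710f110, as in the objdump output above. */
#define A32_COND_AL 0xEu   /* "always" condition */

static uint32_t a32_div(uint32_t op, unsigned rd, unsigned rn, unsigned rm) {
    return (A32_COND_AL << 28) | (op << 20) | ((rd & 0xFu) << 16) |
           (0xFu << 12) | ((rm & 0xFu) << 8) | (0x1u << 4) | (rn & 0xFu);
}

static uint32_t a32_sdiv(unsigned rd, unsigned rn, unsigned rm) {
    return a32_div(0x71u, rd, rn, rm);
}

static uint32_t a32_udiv(unsigned rd, unsigned rn, unsigned rm) {
    return a32_div(0x73u, rd, rn, rm);
}

/* Compile-time variant for inline asm: stringify literal register
 * numbers into a gas expression instead of computing the word in C. */
#define SDIV_WORD(rd, rn, rm) \
    ".long 0xe710f010 | (" #rd " << 16) | (" #rm " << 8) | " #rn "\n"
```

Here a32_sdiv(0, 0, 1) yields 0xe710f110 and a32_udiv(0, 0, 1) yields 0xe730f110. To use SDIV_WORD inside asm(), you would also have to pin the operands to those registers yourself (for example with local register variables such as register int x asm("r0")), because gcc knows nothing about the raw word's inputs and outputs; treat that part as untested.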

Note3:

gcc-4.7.2/gcc/config/arm$ grep -E 'TARGET_IDIV|arm_arch_arm_hwdiv|FL_ARM_DIV' *
arm.c:#define FL_ARM_DIV    (1 << 23)         /* Hardware divide (ARM mode).  */
arm.c:int arm_arch_arm_hwdiv;
arm.c:  arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
arm-cores.def:ARM_CORE("cortex-a7",  cortexa7,  7A, ... FL_ARM_DIV
arm-cores.def:ARM_CORE("cortex-a15", cortexa15, 7A, ... FL_ARM_DIV
arm-cores.def:ARM_CORE("cortex-r5",  cortexr5,  7R, ... FL_ARM_DIV
arm.h:  if (TARGET_IDIV)                                \
arm.h:#define TARGET_IDIV               ((TARGET_ARM && arm_arch_arm_hwdiv) \
arm.h:extern int arm_arch_arm_hwdiv;
arm.md:  "TARGET_IDIV"
arm.md:  "TARGET_IDIV"
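The Note3 grep output can be read as a small decision procedure. The following is a simplified, hypothetical model of it, not gcc code: a core enables ARM-state sdiv/udiv only if its arm-cores.def entry carries FL_ARM_DIV, which matches the observation in the question that only specific -mcpu values produce sdiv. Only the FL_ARM_DIV bit is modelled, cortex-a9 is listed without it per the comments below, and because the continuation of the TARGET_IDIV macro is cut off in the grep output, its Thumb-2 half is not modelled.

```c
#include <string.h>

/* Value taken from the arm.c grep line above. */
#define FL_ARM_DIV (1u << 23)

/* Hypothetical stand-in for the arm-cores.def entries; all other flags
 * are omitted. */
struct core_entry {
    const char *name;
    unsigned flags;
};

static const struct core_entry cores[] = {
    { "cortex-a7",  FL_ARM_DIV },
    { "cortex-a15", FL_ARM_DIV },
    { "cortex-r5",  FL_ARM_DIV },
    { "cortex-a9",  0 },          /* no hardware divide in gcc 4.7 */
};

static unsigned core_flags(const char *cpu) {
    for (size_t i = 0; i < sizeof cores / sizeof cores[0]; i++)
        if (strcmp(cores[i].name, cpu) == 0)
            return cores[i].flags;
    return 0;                     /* unknown core: assume no divide */
}

/* Mirrors arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0 and the
 * visible ARM half of TARGET_IDIV: TARGET_ARM && arm_arch_arm_hwdiv. */
static int target_idiv(const char *cpu, int target_arm) {
    int arm_arch_arm_hwdiv = (core_flags(cpu) & FL_ARM_DIV) != 0;
    return target_arm && arm_arch_arm_hwdiv;
}
```

For example, target_idiv("cortex-a15", 1) returns 1 and target_idiv("cortex-a9", 1) returns 0, which is the distinction discussed in the comments below.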
artless noise
  • (+1) Thanks for giving me some courage with machine definitions; I think I can define a new machine type like *armv7-ad* by copying armv7-a. I don't think you can use inline assembler; the assembler will still complain about the mismatched instruction. About march, mcpu and mtune: integer divide is optional within the architecture, so I would expect either an march variant or some extra flag, like the one for the FPU. I expect mcpu/mtune to give the optimizer instruction scheduling information; that's why I hesitate to use cortex-a15. – auselen Apr 03 '13 at 17:44
  • @auselen Did you try `-mcpu=cortex-a7`? Or that is not for your machine? It looks like it gives the same *FL_ARM_DIV* permissions as the `-mcpu=cortex-a15`. – artless noise Apr 03 '13 at 18:05
  • Nope. Let's assume I use a cortex-a9 with idiv support. Maybe I can use "-mcpu=cortex-a15 -mtune=cortex-a9"; I'll test that later. – auselen Apr 03 '13 at 18:41
  • Ok, the cortex-a9 doesn't support it in gcc-4.7. You can change the [*arm-cores.def*](http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/arm/arm-cores.def?revision=197346&view=markup) to make a new *core* like *cortex-a9-div* and then use the `FL_ARM_DIV` flag to say that it supports *idiv* (copying the existing *cortex-a9*). Then you could use `-mcpu=cortex-a9-div`. Probably no `gcc` supports this yet. It looks like `-mcpu=cortex-a7` is an alternative pick. Cortex a7, a9 and a15 all have different schedulers. – artless noise Apr 03 '13 at 19:04
  • @auselen Hmm, but only the `cortex-a15` has the schedule stuff for *idiv*. Even the `cortex-a7` is missing it. Maybe better to post to a `gcc` mailing list, to get some part of the scheduler working. – artless noise Apr 03 '13 at 19:14
  • It is essentially the same question: imagine you need to build an app that can run on a cortex-a15 or a cortex-a7. There should be a generic way to say I want idiv instructions on ARMv7. I'm just saying this to state the intent of my question. – auselen Apr 04 '13 at 10:01
  • @auselen There are three things the compiler considers for code generation: whether the `op-code` is *allowed*, its cost (number of cycles), and its scheduling. For a non-pipelined arch/CPU, there is no scheduler. The scheduler just models load/store, multiple execute, MAC unit, etc. and tries to sequence instructions so each stage is *humming*. Selecting *cortex-a15* or *cortex-a7* will both give permission to use `idiv`. I expect the costing (cycles per idiv) is the same. I think your idea of `-mtune=cortex-a9` gives you the right scheduler with permission to use `idiv`; but `idiv` has no scheduling model. – artless noise Apr 04 '13 at 14:25
  • @auselen ..has no scheduling model. So the compiler may mis-place `idiv` if you have *multiplies* in the same code for instance. However, it is still more efficient than calling `__aeabi_idiv` which is 100-1000s of instructions. – artless noise Apr 04 '13 at 14:31
  • @auselen To answer your last question, the `cortex-a7` and `cortex-a15` are [identical in arm-cores.def](http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/arm/arm-cores.def?revision=197346&view=markup) except for the scheduler. Scroll to the bottom; the code should be compatible, and depending on the one you pick, it will run faster on one or the other. If you then go and use `-mtune=cortex-a9`, I don't think it matters which one you pick. – artless noise Apr 05 '13 at 01:49
  • mtune doesn't affect idiv emission. So far, idiv is controlled by mcpu. – auselen Apr 12 '13 at 12:29
  • @auselen Yes, that is what I expect. `mtune` affects instruction scheduling. The only issue is that the scheduler doesn't know anything about `idiv`. Otherwise, I believe your setup is optimal. – artless noise Apr 12 '13 at 14:05