1

What is the most efficient way to implement

d=u / v
r=u mod v 

For the ARM V7M instruction set where u is unsigned 64 bit, and v is unsigned 32 bit?

I'm particularly interested in the special case that v is "normalized" so that its high bit is set.

I've seen various options in Knuth "The Art of Computer Programming (Vol 2)", but am having difficulty seeing the best way to implement this using the available V7M instructions UMULL etc.

cnicutar
  • 178,505
  • 25
  • 365
  • 392
PaulArmitt
  • 21
  • 2
  • I have no idea what you mean by "normalized" or what relevance the high-bit being has. Can you elaborate? – Clifford Jan 05 '15 at 21:58
  • To normalize the fastest will probably be look up tables, not sure that there is a find first bit instruction. I agree with Clifford, let the compiler deal with the divide. – old_timer Jan 05 '15 at 22:08
  • If you want to know if the multiply accumulate instructions are used, compile your [tag:C] code to the arm assembly, and look at it. The optimal algorithm shouldn't change just because you have these extra instructions, but you might be able to eke out some extra performance by adding in these instructions. – Degustaf Jan 05 '15 at 22:34
  • BTW, are you really interested in integer division, or fixed point division? – Degustaf Jan 05 '15 at 22:35

2 Answers2

2

(This is similar to other answer, just from another angle)

ARM 32 bit toolchains require a function implementation called __aeabi_uldivmod to off-load this operation which probably you can find various implementations and one particular is from clang udivmoddi4.c which points to Figure 3-40 of The PowerPC Compiler Writer's Guide (Section 3.2.3.7)

auselen
  • 27,577
  • 7
  • 73
  • 114
  • this is correct - ARM does not do division on the chip - it does multiply and didn't even use to do that since it prefers to do single cycle operations only (it is RISC after all). Division is done off chip in a routine coded with multiple ARM instructions. – user30803 Jan 07 '15 at 09:09
  • @user1444886 a few cores, v7r and 64 bit variants have division capabilities but I guess that's not relevant to this question. – auselen Jan 07 '15 at 09:22
  • @auselen Not a "few cores" - even the low-end Cortex-M3 has integer division (but not 64 bit). – Clifford Jan 07 '15 at 10:19
  • @Clifford :), I was thinking like number of core types with div cap. / all core types. Cortex-M3 being only a single core type. – auselen Jan 07 '15 at 10:50
  • @auselen : In new designs I would expect the later cores with div to be used; but I take your point - div is not a given for all ARM. – Clifford Jan 07 '15 at 16:38
0

Since your compiler almost certainly supports 64 bit data types, what is wrong with letting the compiler generate suitable code? The compiler contains a great deal of target specific knowledge and will probably produce an optimal result.

Given:

uint64_t u = x ;
uint32_t v = y ;

Then:

uint64_t d = u / v ;
uint32_t r = u % v ; 
Clifford
  • 88,407
  • 13
  • 85
  • 165
  • He's specifying assembly tag – phuclv Jan 06 '15 at 04:03
  • @LưuVĩnhPhúc : It is also tagged C, so the answer is legitimate. One could always inspect the assembly the compiler generates, but I see little purpose. – Clifford Jan 06 '15 at 10:07
  • if you inspected the assembly you would see a call to an off-chip division routine that must be provided in the implementation – user30803 Jan 07 '15 at 09:11
  • @user1444886 : Did you mean "off-chip", or merely a library or compiler "built-in" call? Off-chip would imply some sort of integer math co-processor. Library calls can be disassembled or stepped-into in your debugger, and are available as source in open source compilers. – Clifford Jan 07 '15 at 10:12
  • @user1444886 : It looks like by "off-chip" you simply mean "not in hardware" - i.e. a software implementation? I would not refer to that as "off-chip", but I see what you mean. – Clifford Jan 07 '15 at 10:21