I am trying to divide a result by 9 in ARM assembly, much as in "ARM DIVISION HOW TO DO IT?", except for two things:

  1. I'm dividing a 16-bit number (halfword)
  2. It is signed

At the moment I use the following to divide the value in r8 by 9 and store the result at [r1]. It matches the C++ implementation except when the 16th bit (bit 15, the halfword's sign bit) is set:

    LDR         r7, =0x1C72     ; 2**16 * (1/9) + 1
    MUL         r9, r8, r7
    LSR         r9, #16
    STRH        r9, [r1], #2

Please let me know if you understand why. (PS: I also tried SMULBB, but it wasn't any better.)

VictorC
  • [did you try this?](https://gcc.godbolt.org/#compilers:!\(\(compiler:armhfg482,options:'-O2+-Wall+',source:'short+div9\(short+x\)%0A%7B%0A++return+x/9%3B%0A%7D'\)\),filterAsm:\(commentOnly:!t,directives:!t,intel:!t,labels:!t\),version:3) – phuclv Apr 26 '16 at 15:56
  • If it's "a result" and not "a pair of results", what's with the `sadd16`? Admittedly I'm just skimming and not going through it in detail, but mixing packed-halfword SIMD operations with 32-bit operations seems like a very good way to accidentally lose sign bits and make things go awry. – Notlikethat Apr 26 '16 at 16:16
  • @Notlikethat I left the [sadd16] in to make it clear that [r8] was a halfword; it's not critical to the code. I'll change it. – VictorC Apr 27 '16 at 07:17
  • Unless you're doing loads, stores, or SIMD then there are no halfwords, there are only 32-bit register values. Yes, it's not critical to the division code itself, but my hunch is that it probably _is_ critical to the misbehaviour you see, as it implies that the value passed into the `mul` isn't properly sign-extended. Since the question is missing any detail of the actual problem (i.e. what the output values are and how they differ from expectations) we can only guess... – Notlikethat Apr 27 '16 at 08:28
  • I am indeed using LDRSH, because I am trying to translate C++ code that currently uses int16_t as input and output. However, since there are additions in the code, the value to divide can go outside the halfword range, which C++ handles fine but my ARM code does not, so I may just have a sort of translation from signed halfword to signed full word on load and store – VictorC Apr 27 '16 at 09:20
  • @Notlikethat Okay, I just realized I'm really stupid: LDRSH sign-extends the loaded halfword, so working over the full word is fine. Sorry for wasting your time – VictorC Apr 27 '16 at 09:35

1 Answer

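Not sure if anyone cares, but I have found a sort of solution. After looking at the results, I noticed that for negative inputs my ARM technique yielded a number one less than C++. The multiply-and-shift rounds toward negative infinity, whereas C++ integer division truncates toward zero.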

Hence the modification which makes it work:

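    TST         r8, #32768      ; test bit 15, the halfword sign bit (sets NE if negative)
    SMULBB      r8, r8, r7      ; signed 16x16 multiply, r7 still holds 0x1C72
    ASR         r8, #16         ; arithmetic shift keeps the sign (rounds down)
    ADDNE       r8, #1          ; negative input: add 1 to round toward zero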
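
To see why the extra 1 is needed, and to check the fix over the whole halfword range, here is a minimal C++ model of the instruction sequence. It is only a sketch: the function names (`asr16`, `div9_floor`, `div9_fixed`, `count_mismatches16`, `check_examples`) are made up for illustration, and it assumes the input has already been sign-extended, as LDRSH would do.

```cpp
#include <cassert>
#include <cstdint>

// Portable model of ASR #16. Right-shifting a negative signed value is
// implementation-defined before C++20, so negative inputs are shifted
// through their non-negative bitwise complement instead.
inline int32_t asr16(int32_t v) {
    return v >= 0 ? (v >> 16) : ~((~v) >> 16);
}

// Multiply by 0x1C72 and shift, without any correction.
// This computes floor(x * 0x1C72 / 65536), i.e. it rounds down.
inline int32_t div9_floor(int16_t x) {
    return asr16(static_cast<int32_t>(x) * 0x1C72);
}

// Model of TST / SMULBB / ASR / ADDNE: add 1 when the sign bit is set,
// which turns rounding down into truncation toward zero.
inline int32_t div9_fixed(int16_t x) {
    return div9_floor(x) + (x < 0 ? 1 : 0);
}

// Counts the int16_t inputs for which the corrected model disagrees
// with C++'s own x / 9.
inline int count_mismatches16() {
    int bad = 0;
    for (int32_t x = INT16_MIN; x <= INT16_MAX; ++x) {
        if (div9_fixed(static_cast<int16_t>(x)) != x / 9) ++bad;
    }
    return bad;
}

// A few spot checks showing the off-by-one and its correction.
inline void check_examples() {
    assert(div9_floor(-9) == -2);   // rounds down: one less than C++
    assert(div9_fixed(-9) == -1);   // matches -9 / 9
    assert(div9_fixed(-1) == 0);    // matches -1 / 9
}
```

Calling `count_mismatches16()` from a test is a cheap way to confirm the constant and the correction together, since the halfword range is small enough to check exhaustively.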

The other problem I have now is that the division occurs after nine additions. When the result of those additions is outside the halfword range, C++ still manages to output the correct result, whereas the ARM result seems to get saturated in a way, presumably because SMULBB only reads the bottom halfword of r8.

I'm going to have to modify the code to widen the halfwords to full words, and hence will have to change the multiplication to 32-bit.

The SMULBB code above should work as long as your starting value is in the signed halfword range (-32768 to 32767).
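
For the widened case, where the sum of nine halfwords no longer fits in 16 bits, the same idea can be sketched in C++ with a 32-bit input and a 64-bit product. This is an assumption about how the 32-bit version might look, not the code I ended up with: the constant 0x38E38E39 is 2**33 / 9 rounded up, and the names `asr33`, `div9_wide` and `count_mismatches_sum9` are hypothetical. On ARM this would presumably map to SMULL followed by an ASR #1 of the high word, plus the same sign correction.

```cpp
#include <cstdint>

// Portable arithmetic shift right by 33 of a 64-bit value (rounds down).
// Negative inputs are shifted through their non-negative complement,
// because >> on negative values is implementation-defined before C++20.
inline int64_t asr33(int64_t v) {
    return v >= 0 ? (v >> 33) : ~((~v) >> 33);
}

// x / 9 for a 32-bit x: multiply by ceil(2**33 / 9) = 0x38E38E39,
// drop the low 33 bits, then add 1 for negative x to truncate toward zero.
inline int32_t div9_wide(int32_t x) {
    const int64_t product = static_cast<int64_t>(x) * 0x38E38E39LL;
    return static_cast<int32_t>(asr33(product)) + (x < 0 ? 1 : 0);
}

// Counts mismatches against C++'s x / 9 over every value that a sum of
// nine int16_t inputs can take (9 * -32768 .. 9 * 32767).
inline int count_mismatches_sum9() {
    int bad = 0;
    for (int32_t x = 9 * INT16_MIN; x <= 9 * INT16_MAX; ++x) {
        if (div9_wide(x) != x / 9) ++bad;
    }
    return bad;
}
```

The sketch only checks the range that nine halfword additions can produce, which is the case described here.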

VictorC