
I have been implementing a Vector Math library in C++, using Assembly at some points to improve computational performance.

One of the larger gains comes from implementing square root in assembly, as described in http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

However, that implementation only works on x86, as was already pointed out in a past question about this very same code: How to get this sqrt inline assembly working for iOS

Now, my question is: is there any workaround, or any way to implement something similar to the following inline assembly that also works on iOS, considering that fsqrt is x86-only?

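// MSVC, 32-bit x86 only: a naked function gets no compiler-generated prologue or epilogue.
double inline __declspec (naked) __fastcall sqrt14(double n)
{
    _asm fld qword ptr [esp+4]  // push n (passed on the stack) onto the x87 register stack
    _asm fsqrt                  // replace ST(0) with its square root
    _asm ret 8                  // return and pop the 8-byte argument; the result stays in ST(0)
}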

Thanks in advance for your time.

EDIT: I am using Visual Studio 2013 to code and compile.

MAnd
  • So you mean you want an implementation for ARM, rather than x86? – mindriot Jan 12 '16 at 10:53
  • If using this on x86 gives you significant gains, one has to wonder what the default implementation looks like - it's hard to imagine how it could be a lot slower than this. – 500 - Internal Server Error Jan 12 '16 at 13:05
  • This website seems to think there is an fsqrt for ARM. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/FSQRT_float.html – LawfulEvil Jan 12 '16 at 13:05
  • @mindriot Yes, that is what I had in mind in order to make the implementation work on iOS. – MAnd Jan 12 '16 at 13:20
  • @500-InternalServerError Take a look at the comparison between 14 implementations of square root that I've linked to: http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi I have run my own tests and found similar results: the inline assembly implementation from that link, whose code I included in my question, is almost always 7x faster than the standard sqrt function in C++. – MAnd Jan 12 '16 at 13:22
  • My suggestion would have been the same as LawfulEvil's. Check, for example, https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf – mindriot Jan 12 '16 at 14:23
  • Just specify `-fno-math-errno` when compiling (or whatever the equivalent is on your compiler), and gcc already produces code at least as good as the inline asm you show for the standard sqrt function. In other words, don't go the asm route, that's a waste of your time. – Marc Glisse Jan 12 '16 at 14:43
  • @MarcGlisse Thanks for the suggestion. I'm using Visual Studio. Would its equivalent be `/fp:precise`? – MAnd Jan 12 '16 at 20:50
  • @MAnd I have no idea. You can compile `#include <math.h>` followed by `double f(double x){return sqrt(x);}` and see what code is generated. If there is a function call, try other flags. If there is an instruction with sqrt in its name and just 1 or 2 instructions on the side to load/store data, be happy. – Marc Glisse Jan 12 '16 at 21:04
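The check described in the last comment can be sketched as a tiny test file. This is a minimal sketch: the function name `f` comes from the comment, while the use of `<cmath>` and `std::sqrt` is an assumption about which header was meant.

```cpp
#include <cmath>

// Minimal function to inspect in the compiler's assembly listing.
// If the listing shows a call to a library sqrt routine, try other flags.
// If it shows a single square-root instruction plus a load/store or two,
// the compiler already does what the hand-written assembly does.
double f(double x) {
    return std::sqrt(x);
}
```

With GCC or Clang, something like `-O2 -fno-math-errno -S` should write the assembly listing to a `.s` file; with Visual Studio, `/O2 /FA` should produce a comparable listing. Which floating-point flags matter on MSVC is left open here, as it is in the comments.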

1 Answer


You would have to write your own assembly version of that code, using the appropriate ARM instruction set. Have a look at the ARM Reference Manual for more details.
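As a rough illustration of what such a port might look like, here is a minimal sketch for 64-bit ARM. It assumes a GCC- or Clang-style compiler (iOS builds use Clang, and Visual Studio's `_asm` syntax is, as far as I know, only available for 32-bit x86). The name `sqrt_arm` is hypothetical and not from the question.

```cpp
#include <cmath>

// Hypothetical name: sqrt_arm is illustrative, not part of the original code.
inline double sqrt_arm(double n) {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    double r;
    // AArch64 scalar double-precision square root: FSQRT Dd, Dn.
    // The "w" constraint requests a floating-point/SIMD register, and the
    // %d0 / %d1 operand modifiers print that register under its d<N> name.
    __asm__("fsqrt %d0, %d1" : "=w"(r) : "w"(n));
    return r;
#else
    // Any other target: fall back to the standard library.
    return std::sqrt(n);
#endif
}
```

On an optimizing build the compiler may well emit the same instruction for plain `std::sqrt`, so it is worth comparing the two before keeping the assembly.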

Keep in mind, though, that a port of your fast x86 solution is not guaranteed to be the fastest or most precise version on ARM. Essentially, if you really care about performance, you need to run your own comparisons on your ARM platform, similar to those on the CodeProject page.
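A comparison along those lines could start from something like the sketch below. Both helper names (`time_sqrt`, `max_abs_error`) are hypothetical, and the choice of `std::sqrt` as the precision reference is an assumption. This is not the methodology of the CodeProject article.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs one candidate over all inputs and returns the
// elapsed time in seconds. The accumulated sum is handed back through `sink`
// so the caller can consume it and the work is not trivially dead code.
template <typename Sqrt>
double time_sqrt(Sqrt candidate, const std::vector<double>& inputs, double& sink) {
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (double x : inputs) {
        sum += candidate(x);
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = sum;
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: largest absolute deviation of a candidate from
// std::sqrt over the given inputs (std::sqrt is the assumed reference).
template <typename Sqrt>
double max_abs_error(Sqrt candidate, const std::vector<double>& inputs) {
    double worst = 0.0;
    for (double x : inputs) {
        const double err = std::fabs(candidate(x) - std::sqrt(x));
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

Run each candidate on the actual device with a large input set and several repetitions; a single short run is dominated by timer resolution and cache warm-up.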

mindriot
  • That is precisely what I am trying to do. I agree the ARM implementation won't necessarily be as much of an improvement as it is on x86. Hence my attempt to translate that code to work on iOS. – MAnd Jan 12 '16 at 21:11