
I am trying to follow another SO post and implement sqrt14 within my iOS app:

double inline __declspec (naked) __fastcall sqrt14(double n)
{
    _asm fld qword ptr [esp+4]
    _asm fsqrt
    _asm ret 8
}

I have modified this to the following in my code:

double inline __declspec (naked) sqrt14(double n)
{
    __asm__("fld qword ptr [esp+4]");
    __asm__("fsqrt");
    __asm__("ret 8");
}

Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:

Unexpected token in argument list

Invalid instruction

Invalid instruction

I have attempted to read through a few inline ASM guides and other posts on how to do this, but I am generally just unfamiliar with the language. I know MIPS quite well, but these commands/registers seem to be very different. For example, I don't understand why the original author never uses the passed in "n" value anywhere in the assembly code.

Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.

If it matters: I'm building the app to ideally be compatible with any current iOS device that can run iOS 7.1. Again, many thanks for any help.

Matthew Herbst

1 Answer


The compiler is perfectly capable of generating the fsqrt instruction itself; you don't need inline asm for that. You might get some extra speed if you use -ffast-math.

For completeness' sake, here is the inline asm version:

__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));

The fsqrt instruction has no explicit operands; it implicitly uses the top of the stack. The =t constraint tells the compiler to expect the output on the top of the FPU stack, and the 0 constraint instructs the compiler to place the input in the same place as output #0 (i.e. the top of the FPU stack again).

Note that fsqrt is of course x86-only, meaning it won't work on ARM CPUs, for example.
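To make the x86-only caveat concrete, here is a hedged sketch of how the inline asm could be wrapped so that it is only compiled for x86 targets with a GCC-compatible compiler, falling back to the standard sqrt everywhere else (including ARM). The wrapper name sqrt_x87_or_libm is made up for this example, and the preprocessor guard is just one way to detect the architecture.

```c
#include <math.h>

/* Hypothetical wrapper (the name is an assumption for this sketch). */
static inline double sqrt_x87_or_libm(double n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "=t": the result is read from the top of the x87 stack, st(0).
       "0":  the input is placed in the same location as output #0. */
    __asm__ ("fsqrt" : "=t" (n) : "0" (n));
    return n;
#else
    /* Non-x86 targets such as ARM have no fsqrt, so use the library call. */
    return sqrt(n);
#endif
}
```

On an ARM build the asm branch is never seen by the compiler, so the '=t' constraint cannot trigger an error there; in practice this wrapper is unlikely to be faster than calling sqrt directly.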

Jester
  • Oh, so if it's x86-only then it won't work at all on iOS, correct? Also, when you say -ffast-math, do you mean as a compiler flag? – Matthew Herbst Apr 25 '14 at 19:44
  • Yes it only works on x86. And yes, `-ffast-math` is a compiler flag, it turns off error checking, among other things. The flag does work on other architectures too. – Jester Apr 25 '14 at 19:54
  • Great, thanks! That flag did help speed up the program! When I put the code above into Xcode it gives the error: "Invalid output constraint '=t' in asm" – Matthew Herbst Apr 26 '14 at 16:02