Converting an inline-asm x87 fsqrt function from C++ to C for x86-64

Question

I was looking at different methods to compute the square root, and one in particular (sqrt14 from here) caught my attention. Unfortunately, it was written in C++ (although it only uses assembly), and it's difficult for me to translate it to C, if that's even possible.

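double inline __declspec (naked) __fastcall sqrt14(double n)
{
    _asm fld qword ptr [esp+4]   // load the double argument from the stack onto the x87 stack
    _asm fsqrt                   // st(0) = sqrt(st(0)); the result is returned in st(0)
    _asm ret 8                   // return and pop the 8-byte argument off the stack
}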

As can be seen here, inserting assembly in C++ is different from C.

Is it possible to write a C equivalent, and if so, could you show me how? In case it matters, my architecture is 64-bit.

I suspect the function declaration would look like this:

double inline __attribute__((fastcall, naked)) sqrt14(double n);

... but I don't know enough about assembly to do the rest...
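For reference, a direct translation of the naked function for GCC (version 8 or later) or Clang on x86-64 Linux might look like the sketch below. It assumes the System V x86-64 calling convention, where the double argument and the return value travel in xmm0 rather than on the stack. That is why `fastcall` has no meaning here and why the value has to be bounced through memory (the 128-byte red zone below rsp) to reach the x87 stack. The `#else` fallback to `sqrt()` is an addition so the file still builds on other targets.

```c
#include <math.h>

#if defined(__x86_64__) && !defined(_WIN32)
/* Naked: the compiler emits no prologue/epilogue, so the asm body must
 * follow the calling convention itself and end with ret. Only basic asm
 * (no operands) is allowed inside a naked function. */
__attribute__((naked)) double sqrt14(double n)
{
    __asm__(
        "movsd %xmm0, -8(%rsp)\n\t"  /* spill the argument into the red zone */
        "fldl  -8(%rsp)\n\t"         /* push it onto the x87 stack */
        "fsqrt\n\t"                  /* st(0) = sqrt(st(0)) */
        "fstpl -8(%rsp)\n\t"         /* round to double, store, and pop */
        "movsd -8(%rsp), %xmm0\n\t"  /* the return value goes back in xmm0 */
        "ret");
}
#else
/* Fallback for other targets: the portable standard-library call. */
double sqrt14(double n)
{
    return sqrt(n);
}
#endif
```

All the extra moves through memory are overhead that a plain `sqrt()` call, compiled with optimizations, does not need.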

Most of the code in the question is a compiler extension, unrelated to either C or C++. FWIW there is nothing C++ specific in the original. — Eugene Sh., Jan 25 '21 at 16:44
The answer you linked to gives different answers for gcc and Visual C++. Visual C++ is, despite its name, also a C compiler, and gcc is, despite its name, also a C++ compiler. — molbdnilo, Jan 25 '21 at 16:48
You should check the assembly language output from your compiler (in release mode with optimizations set high). The assembly language you provided only executes the processor's floating-point square root instruction. The compiler may generate that instruction automatically when optimization settings are high. — Thomas Matthews, Jan 25 '21 at 17:33
@fuz In fact, I have to run a simulation, and it takes much longer with the "normal" functions sqrt and sqrtf. For example, it runs 3 times faster using the fast inverse square root (from Quake III). So I was looking for other "miracle" functions that were either faster or more accurate, without spending too much time on it. — TheBigBadBoy, Jan 26 '21 at 09:22
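For comparison, the Quake III fast inverse square root mentioned above can be written in portable C roughly as follows. This is a sketch of the well-known algorithm, not code from this thread: it uses `uint32_t` and `memcpy` for the bit reinterpretation (the original pointer cast is undefined behaviour in standard C), the classic magic constant 0x5f3759df, and one Newton-Raphson step, which leaves a worst-case relative error of roughly 0.2%. The helper names `fast_rsqrt` and `fast_sqrt` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite, normal x. */
static float fast_rsqrt(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);          /* reinterpret the float's bits */
    i = 0x5f3759dfu - (i >> 1);        /* initial guess via the magic constant */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson refinement */
    return y;
}

/* sqrt(x) = x * (1/sqrt(x)); inherits the same approximate error. */
static float fast_sqrt(float x)
{
    return x * fast_rsqrt(x);
}
```

Whether this still beats `sqrtf()` depends on the compiler flags: with `-fno-math-errno`, `sqrtf()` typically becomes a single `sqrtss` instruction, so it is worth benchmarking both.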
Also, one thing I don't understand (please explain it to me): why was the question downvoted? I thought it was on-topic, since https://stackoverflow.com/help/on-topic lists a "specific programming problem" as on-topic. Perhaps my question was not clear enough? — TheBigBadBoy, Jan 26 '21 at 09:33
@TheBigBadBoy Which architecture and toolchain are you programming for? — fuz, Jan 26 '21 at 11:10
@fuz I am using a 64-bit PC with an Intel i5-3210M, running Ubuntu with gcc 9.3.0. For now I'm only programming on my PC, but things may change in the future... — TheBigBadBoy, Jan 27 '21 at 13:39
@TheBigBadBoy Consider using the option `-fno-math-errno` and the standard library functions. This will cause gcc to generate the appropriate SSE instructions for square roots instead of library calls. Also make sure to compile with optimisations enabled. If you make a new question with more details about the specific code you try to optimise, I might be able to provide more specific answers. — fuz, Jan 27 '21 at 13:45
@TheBigBadBoy As for the code in question, you can replace it with `static inline double sqrt14(double x) { asm("fsqrt" : "+t"(x)); return (x); }`. Note however that it's likely to be slower than calling `sqrt()` from `math.h`. — fuz, Jan 27 '21 at 13:47
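Expanded into a self-contained sketch, that one-liner looks like the following. It assumes GCC or Clang targeting x86 (32- or 64-bit): the `"+t"` constraint tells the compiler that `x` is both read from and written to the top of the x87 register stack, so the compiler itself generates the loads and stores around `fsqrt`, and no naked function or hand-written `ret` is needed. The `#else` branch falling back to `sqrt()` is an added assumption so the file still builds on other architectures.

```c
#include <math.h>

static inline double sqrt14(double x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* "+t": x is in st(0), the top of the x87 stack, on input and output.
     * On x86-64 the compiler must move x from xmm0 through memory to get it
     * there, which is part of why this tends to be slower than sqrt(). */
    __asm__("fsqrt" : "+t"(x));
    return x;
#else
    return sqrt(x); /* non-x86 fallback */
#endif
}
```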

score 3 · Accepted Answer · edited Jan 25 '21 at 19:09

3

The example you cite is very specific to one compiler...

__declspec (naked) is an implementation-specific feature (non-Standard)
__fastcall is an implementation-specific feature (non-Standard)

Even in your revised example:

__attribute__((fastcall, naked)) is an implementation-specific feature (non-Standard)

Even inline assembler is an implementation-specific feature (non-Standard), i.e. each compiler may do it in a slightly different way.

So, the long and short of it: the example code is fine for that compiler and target processor, but is completely non-portable to another toolchain or processor.

edited Jan 25 '21 at 19:09 by njuffa · answered Jan 25 '21 at 16:51 by Andrew

Ah, okay. Indeed, I prefer portability to speed. Thanks for the explanation. – TheBigBadBoy Jan 25 '21 at 16:58
2

@TheBigBadBoy: modern compilers will inline `sqrt()` as an SSE2 `sqrtsd` instruction, without any stupid function-call overhead or storing the arg to memory and reloading with legacy x87. You're not giving up speed by getting rid of this old inline asm, you're probably gaining speed. Especially if you use `gcc -O3 -ffast-math` (if you actually care about 32-bit code like this was, then also benchmark with `-mfpmath=sse`) – Peter Cordes Jan 25 '21 at 17:32

Converting an inline-asm x87 fsqrt function from C++ to C for x86-64

1 Answers1