23

I'm looking to take an IEEE double and remove any integer part of it in the most efficient manner possible.

I want

1035 ->0
1045.23->0.23
253e-23=253e-23

I do not care about properly handling denormals, infinities, or NaNs. I do not mind bit twiddling, as I know I am working with IEEE doubles, so it should work across machines.

Branchless code would be much preferred.

My first thought is (in pseudo code)

char exp=d.exponent;
(set the last bit of the exponent to 1)
d<<=exp*(exp>0);
(& mask the last 52 bits of d)
(shift d left until the last bit of the exponent is zero, decrementing exp each time)
d.exponent=exp;

But the problem is that I can't think of an efficient way to shift d left until the last bit of the exponent is zero, plus it seems it would need to output zero if all of the last bits weren't set. This seems to be related to the base 2 logarithm problem.

Help with this algorithm or any better ones would be much appreciated.

I should probably note that the reason I want branchless code is because I want it to efficiently vectorize.

Jeremy Salwen
  • 8,061
  • 5
  • 50
  • 73

7 Answers7

42

How about something simple?

double fraction = whole - ((long)whole);

This just subtracts the integer portion of the double from the value itself, the remainder should be the fractional component. It's possible, of course, this could have some representation issues.

Mark Elliot
  • 75,278
  • 22
  • 140
  • 160
  • Not sure how the speed compares, but you can also do double fraction = whole%1; – jberg Apr 08 '11 at 01:16
  • @jberg: you can do `fmod` (`%` is integer only), and I'm almost sure it's slower since `%` will involve a division. – Mark Elliot Apr 08 '11 at 01:16
  • The problem is that it won't necessarily fit into a float. My upper bound. I'm looking to handle two different cases, one with numbers |x|<2^60, which this will work for, but others 2^60<|x|<2^128. A long can only hold up to 2^65. So perhaps I will use this one for the first case, but I still need something efficient for the second. – Jeremy Salwen Apr 08 '11 at 01:17
  • @Jeremy: in that case `fmod` is the way to go. – Mark Elliot Apr 08 '11 at 01:21
  • @Mark, I'm not sure that's a vectorizable call, but I'll look into it. – Jeremy Salwen Apr 08 '11 at 01:31
  • 3
    Actually, duh, this is the answer. If it's too big for a long, I can detect that and just set the fractional part to zero. Accepted. – Jeremy Salwen Apr 08 '11 at 01:59
  • 9
    Indeed, any value that big cannot have a fractional part. But you should use `int64_t`, not `long`. `long` might be only 32 bits, in which case the values you need won't fit. – R.. GitHub STOP HELPING ICE Apr 08 '11 at 03:30
  • 3
    @R..: `floor` would be a better choice than a cast to integer here, I think, at least for modern CPUs (x86 with SSE4 `roundpd`). Implementing this way requires the compiler to make code that will fail if the double is outside the range of representable long values. A round-trip from FP to integer and back can be slower than rounding. I tried both versions on [the Godbolt compiler explorer](https://godbolt.org/g/n56CkL). `floor` didn't inline with just `-fno-math-errno`, unfortunately, so I used `-ffast-math`. If you can't do that, and can't assume SSE4, casting looks good. – Peter Cordes Jul 23 '16 at 03:36
  • 1
    A conversion from `double` to `long` has UB for values outside the range of `long`, so the compiler should be able to optimize it assuming the value is in range (and thereby avoiding any conversion to integer type). – R.. GitHub STOP HELPING ICE Jul 23 '16 at 04:16
14
#include <math.h>
double fraction = fmod(d, 1.0);
user541686
  • 205,094
  • 128
  • 528
  • 886
  • 1
    I'm specifically asking because this is *not* efficient enough for my purposes. Obviously I could strip the NaN, infinity handling, and integer part calculation from some implementation of fmod, but it still has more branching in it than I like. – Jeremy Salwen Apr 08 '11 at 01:22
  • 1
    @Jeremy: Ah, I see... how about subtracting the `floor`? That shouldn't require any branching. – user541686 Apr 08 '11 at 01:24
  • That's a good point. Floor may be vectorizable. I will look into that. – Jeremy Salwen Apr 08 '11 at 01:29
  • 1
    @Jeremy: `_mm_floor_pd` might help. – user541686 Apr 08 '11 at 01:32
  • Hmm... It seems GCC won't vectorize it. – Jeremy Salwen Apr 08 '11 at 01:35
  • @Jeremy: It's an SSE4 instruction unfortunately, so your CPU might not support it... did you import `smmintrin.h` and compile with the SSE flag? – user541686 Apr 08 '11 at 01:36
  • I compiled with gcc -march=native -mtune=native -ftree-vectorizer-verbose=4 -O3 -std=c99 -lm main.c. I think none of my computers support SSE4, although one does support SSE4a. Is there a way to do something similar without SSE4? – Jeremy Salwen Apr 08 '11 at 01:49
  • @Jeremy: Compile with `-msse4`. But if your CPU doesn't support it then it won't work; in that case, you can't really use SIMD to vectorize this particular calculation. :\ – user541686 Apr 08 '11 at 02:05
12

The optimal implementation depends entirely on the target architecture.

On recent Intel processors, this can be achieved with two instructions: roundsd and subsd, but that can't be expressed in portable C code.

On some processors, the fastest way to do this is with integer operations on the floating point representation. Early Atom and many ARM CPUs come to mind.

On some other processors, the fastest thing is to cast to integer and back, then subtract, branching to protect large values.

If you're going to be handling lots of values, you can set the rounding mode to round-to-zero, then add and subtract +/-2^52 to the number truncated to integer, then subtract from the original value to get the fraction. If you don't have SSE4.1, but do have an otherwise modern Intel CPU and want to vectorize, this is typically the best you can do. It only makes sense if you have many values to process, however, because changing the rounding mode is somewhat expensive.

On other architectures, other implementations are optimal. In general, it doesn't make sense to talk about "efficiency" of C programs; only the efficiency of a specific implementation on a specific architecture.

Stephen Canon
  • 103,815
  • 19
  • 183
  • 269
  • Perhaps I should have been more specific about efficiency. I mean efficient portable C99 code which GCC will vectorize. In the sense of mean squared running time on processors weighted by popularity of PC model ownership. – Jeremy Salwen Apr 08 '11 at 09:01
  • I'm found of add/subtract 2^52 (with proper copysign) and round to zero mode, original and neat. – aka.nice Aug 15 '12 at 20:57
  • 1
    Can't you express the desired semantics with `floor()` and the `-` operator? Then a compiler targeting SSE4 can use `roundsd` / `roundpd` to implement those semantics. (But I guess you might need `-fno-math-errno` or maybe the full `-ffast-math` to let it actually vectorize `floor()` to `roundpd`.) – Peter Cordes Jul 23 '16 at 03:22
10

Proposal

The function remainder computes the remainder, but not the integer part like modf does:

#include <math.h>

double fracpart(double input)
{
    return remainder(input, 1.);
}

This is the most efficient (and portable) way, as it doesn't compute unnecessary values to do the job (cf. modf, (long), fmod, etc.)


Benchmark

As Mattew suggested in the comments, I wrote some benchmark code to compare this solution to all the other ones offered on this page.

Please find below the time measurements for 65536 computations (compiled with Clang with optimizations turned off):

method 1 took 0.002389 seconds (using remainder)
method 2 took 0.000193 seconds (casting to long)
method 3 took 0.000209 seconds (using floor)
method 4 took 0.000257 seconds (using modf)
method 5 took 0.010178 seconds (using fmod)

Again with Clang, this time using the -O3 flag:

method 1 took 0.002222 seconds (using remainder)
method 2 took 0.000000 seconds (casting to long)
method 3 took 0.000000 seconds (using floor)
method 4 took 0.000223 seconds (using modf)
method 5 took 0.010131 seconds (using fmod)

Turns out the simplest solution seems to give the best results on most platforms, and the specific methods to perform that task (fmod, modf, remainder) are actually super-slow!

Mathieu Rodic
  • 6,637
  • 2
  • 43
  • 49
  • 1
    Won't this be a problem since remainder() rounds to nearest integral value? fmod() would be ok though. – shadow_map May 12 '15 at 12:29
  • I'd be interested to see how this efficiency compares to others. – Matthew D. Scholefield Jul 04 '16 at 03:45
  • @MatthewD.Scholefield excellent question, I'm benchmarking right now :) – Mathieu Rodic Jul 22 '16 at 09:13
  • 3
    `compiled with Clang with optimizations turned off`. Then your results are meaningless. Optimization doesn't speed everything up by the same percent. See my answer on [this question about an assignment where they had to optimize for `-O0`](http://stackoverflow.com/questions/32000917/c-loop-optimization-help-for-final-assignment/32001196#32001196). – Peter Cordes Jul 23 '16 at 03:25
  • The results are extremely similar with optimizations turned on (see edited answer). – Mathieu Rodic Aug 03 '16 at 12:09
  • 1
    `0.000000 seconds`. Sounds like your benchmark optimized away. I do expect that `floor` or casting to `long` are going to be very efficient, based on the fact that they only take a couple of fast asm instructions on x86 (vs. slow FP division), but this doesn't demonstrate anything. – Peter Cordes Aug 03 '16 at 15:11
  • 1
    If there was a case for merging answers this would be it. – Xofo Dec 20 '18 at 19:31
  • 1
    I updated your benchmark and added some checking of answers; the checks make it impossible for the compiler to completely optimize-away any of the methods, and also show that some of the solutions are buggy, like floor, remainder, and casting to long. See https://ideone.com/TAE8PD – jorgbrown Dec 24 '21 at 03:38
3

Standard library function modf solves this problem quite neatly.

#include <math.h>
/*...*/
double somenumber;
double integralPart;
double fractionalPart = modf(somenumber, &integralPart);

This should do what you have asked, is portable, and reasonably efficient.

An undocumented detail is whether the second argument could be NULL and then avoid the integral part temporary, which would be desirable in uses such as that you have described.

Unfortunately it seams many implementations do not support NULL for the second argument, so will have to use a temporary whether or not you use this value.

3

Some profiling and experimentation using C++ in Microsoft Visual Studio 2015 indicates that the best method for positive numbers is:

double n;
// ...
double fractional_part = n - floor(n);

It's faster than modf, and, as has already been mentioned, the remainder function rounds to the nearest integer, and therefore isn't of use.

Graham Asher
  • 1,648
  • 1
  • 24
  • 34
2

What you want is:

inline double min(double d1, double d2) { return d2 < d1 ? d2 : d1; }
inline double max(double d1, double d2) { return d2 > d1 ? d2 : d1; }

double fract_fast(double d) {
  // Clamp d to values that are representable 64-bit integers.
  // Any double-precision floating-point number outside of this range
  // has no fractional part, so these calls to min / max will have
  // no effect on the return value of this function.
  d = min((double)(((int64_t)1)<<62), d);
  d = max(-(double)(((int64_t)1)<<62), d);

  // C / C++ define casts to integer as always being the round-toward-zero
  // of the floating point number, regardless of rounding mode.
    int64_t integral_part = (int64_t)d;
    return d - (double)integral_part;
}

x86 has min / max instructions for floating-point, as well as single instructions for truncating to signed integers, and converting back from integers to floating-point. So the routine above is implementable with 5 instructions: min, max, convert-to-int, convert-from-int, and subtract. And indeed, at least one compiler (clang) delivers a 5-instruction implementation, as you can see at https://godbolt.org/z/7oqbPKMhc

( Or see the C++ version at https://godbolt.org/z/6sqGxzYsq )

jorgbrown
  • 1,993
  • 16
  • 23
  • And here's proof that it vectorizes well: – jorgbrown Dec 24 '21 at 00:02
  • The question is tagged C, not C++. – Andrew Henle Dec 24 '21 at 00:13
  • `std::min` is not `fmin` / `std::fmin` - they have different semantics for NaN and signed zero. See [What is the instruction that gives branchless FP min and max on x86?](https://stackoverflow.com/q/40196817) So you should probably tweak that comment to say min/max, not fmin/fmax. (And continue using `std::min/max` so it can inline efficiently to minps/maxps without extra work for NaN propagation.) – Peter Cordes Dec 24 '21 at 01:34
  • Thanks for catching that Andrew; I've switched the code to C now. – jorgbrown Dec 24 '21 at 03:07
  • Peter: Yeah, oddly, clang does the right thing with either max or fmax.... but gcc only does the right thing when max is used. So I switched the code and all but one comment to use max rather than fmax, and you caught the one place that I hadn't changed.... but now it's fixed. – jorgbrown Dec 24 '21 at 03:09
  • Also, I updated Matthieu's benchmark and ran it; see https://ideone.com/TAE8PD (Summary: There are only 3 correct answers here: fmod, modf, and this answer. And this answer is faster than any other answer other than cast-to-long, but cast-to-long gives the wrong answer when the input floating-point number is larger than "long" can represent. – jorgbrown Dec 24 '21 at 03:44