Numpy: Check if float array contains whole numbers

Question

In Python, it is possible to check if a float contains an integer value using n.is_integer(), based on this QA: How to check if a float value is a whole number.

Does numpy have a similar operation that can be applied to arrays? Something that would allow the following:

>>> x = np.array([1.0, 2.1, 3.0, 3.9])
>>> mask = np.is_integer(x)
>>> mask
array([True, False, True, False], dtype=bool)

It is possible to do something like

>>> mask = (x == np.floor(x))

or

>>> mask = (x == np.round(x))

but they involve calling extra methods and creating a bunch of temp arrays that could be potentially avoided.

Does numpy have a vectorized function that checks for fractional parts of floats in a way similar to Python's float.is_integer?

score 11 · Accepted Answer · answered Jan 27 '16 at 16:28

From what I can tell, there is no such function that returns a boolean array indicating whether floats have a fractional part or not. The closest I can find is np.modf which returns the fractional and integer parts, but that creates two float arrays (at least temporarily), so it might not be best memory-wise.
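For reference, a minimal sketch of the np.modf route mentioned above, using the example array from the question. It is only an illustration of the idea, and it does allocate the two temporary float arrays described:

```python
import numpy as np

x = np.array([1.0, 2.1, 3.0, 3.9])

# np.modf returns (fractional_part, integral_part) as two float arrays.
frac, whole = np.modf(x)

# An element is whole exactly when its fractional part is zero.
mask = frac == 0
print(mask)
```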

If you're happy working in place, you can try something like:

>>> np.mod(x, 1, out=x)
>>> mask = (x == 0)

This should save memory versus using round or floor (where you have to keep x around), but of course you lose the original x.
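If x has to survive, one possible middle ground is to write the remainder into a reusable scratch buffer instead of into x itself. The helper name whole_number_mask and its parameters are hypothetical, chosen for this sketch only:

```python
import numpy as np

def whole_number_mask(x, scratch=None, mask=None):
    """Hypothetical helper: mask of whole-valued entries, reusing buffers."""
    if scratch is None:
        scratch = np.empty_like(x)
    if mask is None:
        mask = np.empty(x.shape, dtype=bool)
    np.mod(x, 1, out=scratch)       # remainder goes into the scratch buffer
    np.equal(scratch, 0, out=mask)  # comparison written into the mask buffer
    return mask

x = np.array([1.0, 2.1, 3.0, 3.9])
scratch = np.empty_like(x)          # allocate once, reuse across calls
mask = whole_number_mask(x, scratch)
print(mask)
```

This still needs one float buffer the size of x, but the buffer can be shared across many calls and x is left untouched.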

The other option is to ask for it to be implemented in Numpy, or implement it yourself.

hunse


Funny you should mention that. I was asking because I am thinking about doing just that. – Mad Physicist Jan 27 '16 at 16:43
Wouldn't bitwise_and be faster here? EDIT: No, it wouldn't. Oh, right, bitwise operations inherently only apply to integers, not floats! Spent a bit too much time doing JS integer coercion lately... – Job Mar 29 '17 at 14:24

Job · Answer 2 · 2017-04-14T14:33:46.553

I needed an answer to this question for a slightly different reason: checking when I can convert an entire array of floating point numbers to integers without losing data.

Hunse's answer almost works for me, except that I obviously can't use the in-place trick, since I need to be able to undo the operation:

if np.all(np.mod(x, 1) == 0):
    x = x.astype(int)

From there, I thought of the following option which probably is faster in many situations:

x_int = x.astype(int)
if np.all((x - x_int) == 0):
    x = x_int

The reason is that the modulo operation is slower than subtraction. However, now we do the casting to integers up-front - I don't know how fast that operation is, relatively speaking. But if most of your arrays are integers (they are in my case), the latter version is almost certainly faster.

Another benefit is that you could replace the subtraction with something like np.isclose to check within a certain tolerance (of course you should be careful here, since truncation is not proper rounding!).

x_int = x.astype(int)
if np.all(np.isclose(x, x_int, rtol=0.0001)):
    x = x_int
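To illustrate the truncation caveat, here is a small sketch comparing astype(int) with np.rint as the reference for the tolerance check. The sample values and the absolute tolerance tol are assumptions made for this example:

```python
import numpy as np

x = np.array([1.0, 2.9999999, -3.0000001, 4.5])

# astype(int) truncates toward zero, so 2.9999999 becomes 2, not 3.
x_trunc = x.astype(int)

# np.rint rounds to the nearest integer first (the result is still float).
x_round = np.rint(x)

tol = 1e-4  # assumed absolute tolerance for this illustration
close_trunc = np.isclose(x, x_trunc, rtol=0, atol=tol)
close_round = np.isclose(x, x_round, rtol=0, atol=tol)

print(close_trunc)  # 2.9999999 is rejected because it was truncated to 2
print(close_round)  # 2.9999999 is accepted because it was rounded to 3
```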

EDIT: Slower, but perhaps worth it depending on your use-case, is also converting integers individually if present.

x_int = x.astype(int)
safe_conversion = (x - x_int) == 0
# if we can convert the whole array to integers, do that
if np.all(safe_conversion):
    x = x_int.tolist()
else:
    x = x.tolist()
    # if there are _some_ integers, convert them
    if np.any(safe_conversion):
        for i in range(len(x)):
            if safe_conversion[i]:
                x[i] = int(x[i])

As an example of where this matters: this works out for me, because I have sparse data (which means mostly zeros) which I then convert to JSON, once, and reuse later on a server. For floats, ujson converts those as [ ...,0.0,0.0,0.0,... ], and for ints that results in [...,0,0,0,...], saving up to half the number of characters in the string. This reduces overhead on both the server (shorter strings) and the client (shorter strings, presumably slightly faster JSON parsing).
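A rough sketch of the size difference, using the standard-library json module as a stand-in for ujson (an assumption, since the exact ujson output format is not reproduced here) and a made-up sparse array:

```python
import json
import numpy as np

x = np.array([0.0, 0.0, 1.5, 0.0, 2.0])

as_floats = x.tolist()
# Convert whole-valued entries to int, leave the rest as float.
mixed = [int(v) if v.is_integer() else v for v in as_floats]

# Compact separators so only the number formatting differs.
float_json = json.dumps(as_floats, separators=(",", ":"))
mixed_json = json.dumps(mixed, separators=(",", ":"))

print(float_json, len(float_json))
print(mixed_json, len(mixed_json))
```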

Just realized that this is my own question. Thanks for a nice answer. — Mad Physicist, Mar 30 '17 at 12:10
You're welcome! :) I just realised that if _some_ of the elements can be safely converted, the added overhead of doing so is worth it for me, so I added the code for how to do that too. — Job, Apr 14 '17 at 14:34
Just gave this a try and did some benchmarks. For me, subtraction was always faster than modulo (2-3x), regardless of the amount of integer valued floats. But I think there's an even more optimal answer: `def is_int_valued(x): return np.all(x == np.floor(x))` — ivirshup, Jun 30 '20 at 03:10
Ah, that avoids a float-to-integer conversion, and a conversion back for the comparison, so that makes sense! However, if you *expect* the conversion goes through then it might not be optimal, because then you also need to do the integer conversion again. Requires benchmarking per use-case I guess :) — Job, Jul 01 '20 at 11:05
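
For anyone who wants to repeat the comparison discussed in these comments, a minimal timing sketch follows. The array size, the mix of values, and the function names are assumptions for illustration, and the numbers will vary by machine and NumPy version:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Half whole-valued, half fractional floats.
x = rng.integers(0, 1000, size=100_000).astype(float)
x[::2] += 0.5

def by_modulo(a):
    return np.all(np.mod(a, 1) == 0)

def by_subtraction(a):
    return np.all((a - a.astype(int)) == 0)

def by_floor(a):
    return np.all(a == np.floor(a))

for fn in (by_modulo, by_subtraction, by_floor):
    seconds = timeit.timeit(lambda: fn(x), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```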

Mad Physicist · Answer 3 · 2023-07-24T22:28:31.250

While the accepted method of (x % 1) == 0 is quite adequate, it bothers me that there is no way to accomplish this natively in numpy, especially given the existence of float.is_integer in vanilla python.

I therefore did a bit of research on the floating point formats supported by numpy (float16, float32, float64, float128 (actually extended precision)), and on how to write a ufunc.

The result is that for IEEE754 floats small enough to fit into a corresponding unsigned integer type (pretty much everything up to float64 on a normal machine), you can do the checks with some simple bit twiddling. For example, here is a C99 function that very quickly tells you if your float32 contains an integer value:

#include <stdint.h>

int is_integer(float n)
{
    uint32_t k = ((union { float n; uint32_t k; }){n}).k;

    // Zero when everything except sign bit is zero
    if((k & 0x7FFFFFFF) == 0) return 1;

    uint32_t exponent = k & 0x7F800000;

    // NaN or Inf when the exponent bits are all ones
    // Guaranteed fraction when exponent < 0
    if(exponent == 0x7F800000 || exponent < 0x3F800000) return 0;
    // Guaranteed integer when exponent >= FLT_MANT_DIG - 1
    if(exponent >= 0x4B000000) return 1;
    // Otherwise, check that the significand bits past the exponent are zeros
    return (k & (0x7FFFFF >> ((exponent >> 23) - 0x7F))) == 0;
}

I went ahead and wrapped this function and its siblings in a ufunc, which can be found here: https://gitlab.com/madphysicist/is_integer_ufunc. One nice feature is that this ufunc returns True for all integer types instead of raising an error. Another is that it runs anywhere from 5x to 40x faster than (x % 1) == 0, depending on dtype and input size.

Based on the linked tutorial, you can install with python setup.py {build_ext --inplace, build, install}, depending on how badly you want it. Perhaps I should see if the numpy community is interested in including this ufunc.
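
For readers who want to try the bit test without compiling anything, here is a rough pure-NumPy port of the float32 C function above. This is a sketch, not the code from the linked repository: the name is_integer_f32 is hypothetical, and it will be slower than a real ufunc because it builds several temporary arrays.

```python
import numpy as np

def is_integer_f32(x):
    """Sketch: vectorized port of the float32 bit test (hypothetical helper)."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    k = x.view(np.uint32)                   # reinterpret the bits, no copy
    magnitude = k & np.uint32(0x7FFFFFFF)   # everything except the sign bit
    exponent = k & np.uint32(0x7F800000)    # biased exponent, left in place

    is_zero = magnitude == 0
    is_nonfinite = exponent == 0x7F800000   # NaN or Inf
    is_small = exponent < 0x3F800000        # |x| < 1, guaranteed fraction
    is_large = exponent >= 0x4B000000       # |x| >= 2**23, guaranteed integer

    # Unbiased exponent, clipped so the shift below is always in range.
    shift = np.clip((exponent >> np.uint32(23)).astype(np.int64) - 0x7F, 0, 23)
    # Significand bits that represent the fractional part.
    frac_mask = np.uint32(0x7FFFFF) >> shift.astype(np.uint32)
    no_fraction = (k & frac_mask) == 0

    return is_zero | (~is_nonfinite & ~is_small & (is_large | no_fraction))

values = [0.0, -0.0, 1.0, 2.5, -3.0, 0.5, 1e10, np.inf, np.nan]
print(is_integer_f32(values))
```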

Steven C. Howell · Answer 4 · score 2 · 2020-07-24T02:56:09.697

You can also just use the Python method in a list comprehension.

>>> x = np.array([1.0, 2.1, 3.0, 3.9])
>>> mask = np.array([val.is_integer() for val in x])
>>> mask
array([ True, False,  True, False])

Compared to the answer using mod 1, this was slightly faster for the given example with 4 values (5.66 us vs 8.03 us) and over 3x faster for an array of 1000 values.
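
A small variation on the same idea, offered as a sketch: going through ndarray.tolist() first yields built-in Python floats, so float.is_integer is available whatever the array's float dtype is. The float32 dtype here is just an assumption to show that point:

```python
import numpy as np

x = np.array([1.0, 2.1, 3.0, 3.9], dtype=np.float32)

# tolist() converts each element to a built-in Python float.
mask = np.fromiter(
    (v.is_integer() for v in x.tolist()),
    dtype=bool,
    count=x.size,
)
print(mask)
```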


score 0 · Answer 5 · answered Jul 18 '21 at 00:46

Inspired by the accepted answer, here's a non-inplace version using the % operator:

modulus = x % 1
mask = modulus == 0

or more succinctly

mask = (x % 1) == 0
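
As a quick sanity check of how this expression treats edge cases, here is a sketch with a few assumed sample values (negative numbers, NaN, infinity, and a very large float). The remainder of infinity is NaN and may emit an "invalid value" warning, which is silenced locally here:

```python
import numpy as np

x = np.array([-2.0, -2.5, np.nan, np.inf, 1e300])

# NaN and inf produce a NaN remainder, which never compares equal to 0.
with np.errstate(invalid="ignore"):
    mask = (x % 1) == 0

print(mask)
```

Negative whole numbers and very large floats are reported as whole, while NaN and infinity are not, which matches what float.is_integer does for those values.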

Jasha

