11

I have the following array:

In [1]: x = array(['1.2', '2.3', '1.2.3'])

I want to test whether each element in the array can be converted to a numerical value. That is, a function is_numeric(x) should return a True/False array like this:

In [2]: is_numeric(x)
Out[2]: array([True, True, False])

How can I do this?

Wei Li
  • Possible duplicate of [How do I check if a string is a number (float) in Python?](http://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-is-a-number-float-in-python) – far Jun 23 '16 at 16:01
  • 4
    @farhan3: Not a duplicate. The appropriate methods for working with a NumPy array are almost always quite different from the appropriate methods for working with individual ordinary Python objects. – user2357112 Jun 23 '16 at 16:07
  • 1
    Actually that other question is quite useful. No one has come up with a way of bypassing the iterative application of a single string test. – hpaulj Jun 23 '16 at 20:39

5 Answers

6
import numpy as np

def is_float(val):
    try:
        float(val)
    except ValueError:
        return False
    else:
        return True

a = np.array(['1.2', '2.3', '1.2.3'])

# In Python 3, map() returns a lazy iterator, so wrap it in list() first
is_numeric_1 = lambda x: list(map(is_float, x))            # return python list
is_numeric_2 = lambda x: np.array(list(map(is_float, x)))  # return numpy array
is_numeric_3 = np.vectorize(is_float, otypes=[bool])       # return numpy array

Depending on the size of the array and the type of the returned value, these functions run at different speeds.

In [26]: %timeit is_numeric_1(a)
100000 loops, best of 3: 2.34 µs per loop

In [27]: %timeit is_numeric_2(a)
100000 loops, best of 3: 3.13 µs per loop

In [28]: %timeit is_numeric_3(a)
100000 loops, best of 3: 6.7 µs per loop

In [29]: a = np.array(['1.2', '2.3', '1.2.3']*1000)

In [30]: %timeit is_numeric_1(a)
1000 loops, best of 3: 1.53 ms per loop

In [31]: %timeit is_numeric_2(a)
1000 loops, best of 3: 1.6 ms per loop

In [32]: %timeit is_numeric_3(a)
1000 loops, best of 3: 1.58 ms per loop

If a list is okay, use is_numeric_1.

If you want a numpy array and the array is small, use is_numeric_2.

Otherwise, use is_numeric_3.

dragon2fly
  • 1
    Thanks! The is_numeric_3 function speeds up the computation by ~10% in my test. I am wondering if the is_float function can be written in C, and use Cython to further speed up the computation? – Wei Li Jun 23 '16 at 17:43
  • 1
    http://stackoverflow.com/a/25299619/901925 claims to have a fast `isfloat` function. – hpaulj Jun 23 '16 at 20:46
2
In [23]: x = np.array(['1.2', '2.3', '1.2.3', '1.2', 'foo'])

Trying to convert the whole array to float results in an error if one or more strings can't be converted:

In [24]: x.astype(float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-a68fda2cafea> in <module>()
----> 1 x.astype(float)

ValueError: could not convert string to float: '1.2.3'

In [25]: x[:2].astype(float)
Out[25]: array([ 1.2,  2.3])

But to find which ones can be converted, and which can't, we probably have to apply a test to each element. That requires some sort of iteration, and some sort of test.

Most of these answers have wrapped float in a try/except block. But look at How do I check if a string is a number (float) in Python? for alternatives. One answer found that the float wrap was fast for valid inputs, but a regex test was faster for invalid ones (https://stackoverflow.com/a/25299619/901925).
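For comparison, here is a minimal sketch of what such a regex test might look like. The pattern and the name looks_like_float are hypothetical, not taken from the linked answer, and the pattern is deliberately narrower than float(): it rejects inputs such as 'inf', 'nan' or '1_000' that float() accepts.

```python
import re
import numpy as np

# Hypothetical pattern (an assumption, not the linked answer's regex):
# optional sign, digits with an optional fraction (or a bare fraction),
# and an optional exponent.
_FLOAT_RE = re.compile(r'[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?')

def looks_like_float(s):
    # fullmatch requires the whole (stripped) string to fit the pattern
    return _FLOAT_RE.fullmatch(s.strip()) is not None

x = np.array(['1.2', '2.3', '1.2.3', '1.2', 'foo'])
mask = np.array([looks_like_float(s) for s in x])
```

This still iterates over the elements one by one; it only swaps the per-element test.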

In [30]: def isnumeric(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

In [31]: [isnumeric(s) for s in x]
Out[31]: [True, True, False, True, False]

In [32]: np.array([isnumeric(s) for s in x])  # for array
Out[32]: array([ True,  True, False,  True, False], dtype=bool)

I like list comprehension because it is common and clear (and preferred in Py3). For speed I have found that frompyfunc has a modest advantage over other iterators (and handles multidimensional arrays):

In [34]: np.frompyfunc(isnumeric, 1,1)(x)
Out[34]: array([True, True, False, True, False], dtype=object)

In [35]: np.frompyfunc(isnumeric, 1,1)(x).astype(bool)
Out[35]: array([ True,  True, False,  True, False], dtype=bool)

It requires a bit more boilerplate than vectorize, but is usually faster. But if the array or list is small, list comprehension is usually faster (avoiding numpy overhead).
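As a small illustration of the multidimensional point, the sketch below applies the same try/except test to a 2-D array. The names is_numeric_ufunc and grid are made up for this example.

```python
import numpy as np

def isnumeric(s):
    # Same try/except test as above
    try:
        float(s)
        return True
    except ValueError:
        return False

# Build the wrapper once and reuse it: 1 input, 1 output
is_numeric_ufunc = np.frompyfunc(isnumeric, 1, 1)

grid = np.array([['1.2', 'foo'],
                 ['1.2.3', '4e5']])

# frompyfunc returns an object array; cast to bool for masking
mask = is_numeric_ufunc(grid).astype(bool)
```

The result keeps the shape of the input, so mask can be used directly to index grid.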

======================

(edited) np.char has a set of functions that apply string methods to the elements of an array. But the closest function is np.char.isnumeric which just tests for numeric characters, not a full float conversion.
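A quick sketch of how the two tests disagree, using sample strings chosen for illustration:

```python
import numpy as np

def can_float(s):
    # Full conversion test, as used elsewhere in this answer
    try:
        float(s)
        return True
    except ValueError:
        return False

samples = np.array(['12', '1.2', '-3', '½'])

# True only if every character in the string is a numeric character
char_test = np.char.isnumeric(samples)

# True if the whole string parses as a float
float_test = np.array([can_float(s) for s in samples])
```

Here '1.2' and '-3' fail the character test because '.' and '-' are not numeric characters, while '½' passes it even though float() rejects it.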

hpaulj
  • 1
    `isnumeric` does something completely different from testing whether a string can be interpreted as a number. – user2357112 Jun 23 '16 at 20:28
1

I find the following works well for my purpose.

First, save the isNumeric function from https://rosettacode.org/wiki/Determine_if_a_string_is_numeric#C in a file called ctest.h, then create a .pyx file as follows:

from numpy cimport ndarray, uint8_t
import numpy as np
cimport numpy as np

cdef extern from "ctest.h":
     int isNumeric(const char * s)

def is_numeric_elementwise(ndarray x):
    cdef Py_ssize_t i
    cdef ndarray[uint8_t, mode='c', cast=True] y = np.empty_like(x, dtype=np.uint8)

    for i in range(x.size):
        y[i] = isNumeric(x[i])

    return y > 0

The above Cython function runs quite fast.

In [4]: is_numeric_elementwise(array(['1.2', '2.3', '1.2.3']))
Out[4]: array([ True,  True, False], dtype=bool)

In [5]: %timeit is_numeric_elementwise(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 695 ms per loop

Compared with the is_numeric_3 method in https://stackoverflow.com/a/37997673/4909242, it is about 5 times faster.

In [6]: %timeit is_numeric_3(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 3.45 s per loop

There may still be room for improvement, I guess.

Wei Li
0
import numpy

# check whether a string can be parsed as a float
def is_numeric(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

# return a list of booleans that indicate whether each string can be parsed into a number
def is_numeric_array(arr):
    return_array = []
    # ndenumerate yields (index, value) pairs; only the value is tested
    for _, val in numpy.ndenumerate(arr):
        return_array.append(is_numeric(val))
    return return_array
Kairat
0

This also relies on the try/except method to get the per-element result, but because fromiter is given a count, it can preallocate the boolean result array:

from numpy import array, fromiter

def is_numeric(x):

    def try_float(xx):
        try:
            float(xx)
        except ValueError:
            return False
        else:
            return True

    # x.flat iterates over every element regardless of the array's shape
    return fromiter((try_float(xx) for xx in x.flat),
                    dtype=bool, count=x.size)

x = array(['1.2', '2.3', '1.2.3'])
print(is_numeric(x))

Gives:

[ True  True False]
sebastian