How to sort a numpy array with key as isnan?

Question

I have a numpy array like

np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
       [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
       [2.0, 5.0, np.nan, 3, False, False, True, np.nan]], dtype=object)

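Now I want to sort each row using `isnan` as the key, so that the NaNs move to the end of the row and the other values keep their original order. How can I do that? I would like to end up with this array: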

np.array([[1.0, 5.0, 1, True, True, True, np.nan, np.nan],
       [4.0, 7.0, 2, True, False, True, np.nan, np.nan],
       [2.0, 5.0, 3, False, False, True, np.nan, np.nan]], dtype=object)

`np.sort()` didn't work. The same result can be achieved in pandas by applying Python's `sorted` to each row with `key=pd.isnull`, but I'm looking for a NumPy answer for speed.

In pandas:

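import numpy as np
import pandas as pd

# Columns listed alphabetically so data.values matches the array above on any pandas version
data = pd.DataFrame({'ID_1': [1, np.nan, 2], 'ID_2': [np.nan, 4, 5], 'ID_3': [5, 7, np.nan],
                     'Key': [1, 2, 3], 'Var': [True, True, False],
                     'Var_1': [True, np.nan, False], 'Var_2': [np.nan, False, True],
                     'Var_3': [True, True, np.nan]})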

data.apply(lambda x : sorted(x,key=pd.isnull),1).values

Output:

array([[1.0, 5.0, 1, True, True, True, nan, nan],
       [4.0, 7.0, 2, True, False, True, nan, nan],
       [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)
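For comparison, the same stable reordering can be reproduced without pandas by running Python's `sorted` on each row of the object array. This is a minimal sketch, not a NumPy-speed solution; the helper name `is_nan` is hypothetical, and it assumes NaNs only ever appear as float values (`np.nan` is a Python float).

```python
import math
import numpy as np

def is_nan(v):
    # Hypothetical helper: True only for float NaNs, False for ints, bools, etc.
    return isinstance(v, float) and math.isnan(v)

a = np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
              [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
              [2.0, 5.0, np.nan, 3, False, False, True, np.nan]], dtype=object)

# sorted() is stable: key False (non-NaN) sorts before key True (NaN),
# and values with equal keys keep their original relative order.
out = np.array([sorted(row, key=is_nan) for row in a], dtype=object)
print(out)
```

Stability is what keeps `1.0, 5.0, 1` in that order; only the NaN/non-NaN split moves anything.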

Just curious: how did you end up with an object-dtype array, given that the number of elements looks the same in every row? — Divakar, Sep 20 '17 at 15:30
Where do you want the nan's to end up in the ordering? In their own row/column, or at the end of one? — Edward Minnix, Sep 20 '17 at 15:32
@Bharathshetty. Not a link. Post your actual code that you ran. — Mad Physicist, Sep 20 '17 at 15:42
I'm not good with NumPy; all I did was `np.sort(data.values)`, which didn't work. Since `values` is a NumPy array, I just want to know whether this can be done with a NumPy function or a NumPy-based approach. — Bharath M Shetty, Sep 20 '17 at 15:45
Also, why did you not try to use the `axis` parameter to `np.sort`? — Mad Physicist, Sep 20 '17 at 15:52
You might want to edit the sample input array listed in the question, as it differs from `data.values`. — Divakar, Sep 20 '17 at 16:43
They are the same; I only wrote np.nan to make the data easy to copy and paste. — Bharath M Shetty, Sep 20 '17 at 16:44
@Bharathshetty Took the liberty to edit the sample input as `data.values`. Hope that's okay. Feel free to edit if that's not the case. — Divakar, Sep 23 '17 at 14:36

Divakar · Accepted Answer · score 5 · 2017-09-20T16:18:57.610

Approach #1

Here's a vectorized approach borrowing the concept of masking from a related post:

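import numpy as np

def mask_app(a):
    out = np.empty_like(a)
    # True where an element is NaN (bools and ints convert cleanly to float)
    mask = np.isnan(a.astype(float))
    # Per row, move all True (NaN) slots to the end; each row's count is unchanged
    mask_sorted = np.sort(mask,1)
    # Boolean indexing is row-major, so every value stays in its own row
    out[mask_sorted] = a[mask]
    out[~mask_sorted] = a[~mask]
    return out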
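To see why the two boolean assignments line up row by row, here is a small sketch that exposes the intermediate masks on a made-up two-row input. Boolean indexing reads and writes elements in row-major order, and because `np.sort(mask, 1)` keeps the same number of `True` values in each row, each row's NaNs and non-NaNs land back in that same row, in their original order.

```python
import numpy as np

# Made-up input with a different number of NaNs per row
a = np.array([[np.nan, 2.0, True, np.nan],
              [3.0, np.nan, False, 4]], dtype=object)

mask = np.isnan(a.astype(float))   # True where the element is NaN
mask_sorted = np.sort(mask, 1)     # per row: all False first, then all True
print(mask)
print(mask_sorted)

out = np.empty_like(a)
out[mask_sorted] = a[mask]         # NaNs fill the trailing slots of each row
out[~mask_sorted] = a[~mask]       # other values fill the leading slots, order kept
print(out)
```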

Sample run:

# Input dataframe
In [114]: data
Out[114]: 
   ID_1  ID_2  ID_3  Key    Var  Var_1  Var_2 Var_3
0   1.0   NaN   5.0    1   True   True    NaN  True
1   NaN   4.0   7.0    2   True    NaN  False  True
2   2.0   5.0   NaN    3  False  False   True   NaN

# Use pandas approach for verification    
In [115]: data.apply(lambda x : sorted(x,key=pd.isnull),1).values
Out[115]: 
array([[1.0, 5.0, 1, True, True, True, nan, nan],
       [4.0, 7.0, 2, True, False, True, nan, nan],
       [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)

# Use proposed approach and verify
In [116]: mask_app(data.values)
Out[116]: 
array([[1.0, 5.0, 1, True, True, True, nan, nan],
       [4.0, 7.0, 2, True, False, True, nan, nan],
       [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)
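A rough way to check Approach #1 beyond the sample is to compare it with a plain stable `sorted` on random object arrays. This is a sketch under assumptions: the random input (Python ints plus NaNs, with a varying NaN count per row) and the `reference` / `same_value` helper names are made up for illustration.

```python
import math
import numpy as np

def mask_app(a):
    # Approach #1 from above, repeated so this snippet runs on its own
    out = np.empty_like(a)
    mask = np.isnan(a.astype(float))
    mask_sorted = np.sort(mask, 1)
    out[mask_sorted] = a[mask]
    out[~mask_sorted] = a[~mask]
    return out

def reference(a):
    # Stable Python sort that only moves float NaNs to the end of each row
    key = lambda v: isinstance(v, float) and math.isnan(v)
    return np.array([sorted(row, key=key) for row in a], dtype=object)

def same_value(x, y):
    # NaN != NaN, so treat two float NaNs as a match
    both_nan = (isinstance(x, float) and isinstance(y, float)
                and math.isnan(x) and math.isnan(y))
    return both_nan or x == y

rng = np.random.default_rng(0)
a = rng.integers(0, 5, size=(50, 8)).astype(object)  # Python ints in an object array
a[rng.random((50, 8)) < 0.3] = np.nan                # NaNs, different count per row

got, want = mask_app(a), reference(a)
same = all(same_value(x, y) for x, y in zip(got.ravel(), want.ravel()))
print(same)
```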

Approach #2

With a few more modifications, here is a simplified version based on the idea from another post:

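import numpy as np

def mask_app2(a):
    # Start from an all-NaN array with the same (object) dtype
    out = np.full(a.shape,np.nan,dtype=a.dtype)
    # True for the non-NaN values that should be kept
    mask = ~np.isnan(a.astype(float))
    # Sorting puts False first; reversing each row puts the True slots first
    out[np.sort(mask,1)[:,::-1]] = a[mask]
    return out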
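The key step in Approach #2 is the reversed sorted mask. A small sketch on a made-up input shows that `np.sort(mask, 1)` gives `False` before `True`, and `[:, ::-1]` flips each row so the `True` (keep) slots come first; the kept values are then written there while the NaN fill stays at the end.

```python
import numpy as np

# Made-up object array with NaNs in different places per row
a = np.array([[1.0, np.nan, 2, True],
              [np.nan, 3.0, np.nan, False]], dtype=object)

mask = ~np.isnan(a.astype(float))   # True = keep (non-NaN)
asc = np.sort(mask, 1)              # per row: False first, then True
desc = asc[:, ::-1]                 # per row: True first, then False
print(desc)

out = np.full(a.shape, np.nan, dtype=object)
out[desc] = a[mask]                 # kept values fill the leading slots in order
print(out)
```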

edited Sep 20 '17 at 16:18

answered Sep 20 '17 at 16:03

I was waiting for you. :) – Bharath M Shetty Sep 20 '17 at 16:04
I would love to give bounty for this solution. This is beautiful. – Bharath M Shetty Sep 20 '17 at 16:08
Sir, a small question: how long have you been working on vectorizing solutions? – Bharath M Shetty Sep 20 '17 at 16:21
@Bharathshetty Vectorizing this particular solution, you mean? – Divakar Sep 20 '17 at 16:22
Sir, no no no, I mean your experience with NumPy and vectorization. You can vectorize any kind of for loop :) – Bharath M Shetty Sep 20 '17 at 16:23
@Bharathshetty It's been a while. I started off with MATLAB and loved vectorizing stuff in it. Then I heard about NumPy, jumped on it, and have been hooked ever since; it has its own unique and interesting capabilities. I still get to answer MATLAB questions sometimes these days. But yeah, I generally try to think about how to avoid loops, and I think that helps :) – Divakar Sep 20 '17 at 16:25
I should try to vectorize as many for loops as I can. I too don't like loops. Though vectorizing is very very tricky and my dumb brain is not ready for it yet. :) :) – Bharath M Shetty Sep 20 '17 at 16:27

Mad Physicist · Answer 2 · score 2 · 2017-09-20T16:03:23.073

Since you have an object array anyway, do the sorting in Python, then make your array. You can write a key that does something like this:

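from math import isnan

def key(x):
    if isnan(x):
        t = 3  # NaNs go last
        x = 0  # and all compare equal to each other
    elif isinstance(x, bool):
        t = 2  # bools after plain numbers
    else:
        t = 1  # plain numbers first
    return t, x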

This key returns a two-element tuple, where the first element gives the preliminary ordering by type. It considers all NaNs to be equal and greater than any other type.
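Because this key also orders by type group and then by value, it rearranges the non-NaN entries instead of keeping their original order, which differs from the `sorted(x, key=pd.isnull)` output shown in the question. A small sketch on the first row of the sample input, compared with a NaN-only key (`nan_only` is a hypothetical name used here for illustration):

```python
from math import isnan

def key(x):
    # Key from this answer: numbers first, then bools, then NaNs
    if isnan(x):
        t = 3
        x = 0
    elif isinstance(x, bool):
        t = 2
    else:
        t = 1
    return t, x

def nan_only(x):
    # Hypothetical key that only separates NaNs and leaves other order intact
    return isinstance(x, float) and isnan(x)

row = [1.0, float('nan'), 5.0, 1, True, True, float('nan'), True]
print(sorted(row, key=key)[:6])       # [1.0, 1, 5.0, True, True, True]
print(sorted(row, key=nan_only)[:6])  # [1.0, 5.0, 1, True, True, True]
```

The typed key is the better fit if grouping by type is wanted; the NaN-only key matches the question's expected output.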

Even if you start with data in a DataFrame, you can do something like:

values = [sorted(row, key=key) for row in data.values]
values = np.array(values, dtype=object)

You can replace the list comprehension with np.apply_along_axis if that suits your needs better:

values = np.apply_along_axis(lambda row: np.array(sorted(row, key=key), dtype=object),
                             axis=1, arr=data.values)
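One pitfall with `apply_along_axis` here: if each sorted row is rebuilt with a plain `np.array(...)`, NumPy promotes the mix of floats, ints, and bools to `float64`, and `apply_along_axis` builds its output from the result dtype, so `True`/`False` come back as `1.0`/`0.0`. Passing `dtype=object` keeps the original Python objects. A small illustration on a made-up input, with a simple NaN-only key standing in for the one above:

```python
import numpy as np

row = [1.0, 5.0, 1, True, float('nan')]
print(np.array(row).dtype)                # float64: bools become 1.0 / 0.0
print(np.array(row, dtype=object).dtype)  # object: values kept as-is

a = np.array([[2.0, np.nan, False],
              [np.nan, 3, True]], dtype=object)
# NaN-only key used for illustration
key = lambda v: isinstance(v, float) and np.isnan(v)

lossy = np.apply_along_axis(lambda r: np.array(sorted(r, key=key)), 1, a)
kept = np.apply_along_axis(lambda r: np.array(sorted(r, key=key), dtype=object), 1, a)
print(lossy.dtype, kept.dtype)
```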

edited Sep 20 '17 at 16:03

answered Sep 20 '17 at 15:51

Can this be done with something like `apply_along_axis`? – Bharath M Shetty Sep 20 '17 at 15:55
@Bharathshetty. You can replace the list comprehension with `apply_along_axis`. I will show an example, but I doubt it will speed things up any. You will still be using the Python `sorted` function and a Python key. – Mad Physicist Sep 20 '17 at 15:59
The problem is that I am not aware of any way to specify a custom key to the numpy machinery. There may be one, but I have looked at this in *a lot* of detail. – Mad Physicist Sep 20 '17 at 16:02

score 0 · Answer 3 · answered Sep 20 '17 at 15:41

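You can't do this with an object array and nan. You would need to find a numeric type that everything fits into. In a float array, np.sort handles nan specially and places it at the end, but in an object array it falls back to Python comparisons, and nan returns False for <, >, and == against everything, including itself.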
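A quick illustration: ordinary comparisons with NaN all return False, whereas `np.sort` on a float array has dedicated NaN handling and places NaN at the end.

```python
import numpy as np

nan = float('nan')
# Every ordered or equality comparison with NaN is False, even against itself
print(nan < 1.0, nan > 1.0, nan == nan)  # False False False

# Float arrays: np.sort handles NaN specially and places it last
s = np.sort(np.array([3.0, nan, 1.0]))
print(s)
```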

Additionally, True and False compare equal to 1 and 0 respectively, so sorting by value would mix them in with the numbers; I don't think there is any value-based sort that gives your expected result.

You would have to see if converting the dtype to float would give you proper results for your use case.
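A different sketch, not taken from any of the answers here: use the float conversion only to detect NaNs, then let a stable argsort of that boolean mask decide the new order of the original objects. Since no values are compared with each other, neither the NaN comparison problem nor True/False comparing equal to 1/0 gets in the way. It assumes every element converts to float, and relies on `np.argsort(..., kind='stable')` and `np.take_along_axis` (both available since NumPy 1.15); `nan_last` is a hypothetical name.

```python
import numpy as np

def nan_last(a):
    # Float view is used only to find NaNs; the original objects get reordered
    is_nan = np.isnan(a.astype(float))
    # Stable sort of the mask: False (keep) before True (NaN),
    # original order preserved inside each group
    order = np.argsort(is_nan, axis=1, kind='stable')
    return np.take_along_axis(a, order, axis=1)

a = np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
              [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
              [2.0, 5.0, np.nan, 3, False, False, True, np.nan]], dtype=object)
out = nan_last(a)
print(out)
```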