1

How to sort the pandas dataframe sort all the column values in a row in ascending order and keep the NaN values at first

Example Data Frame is

                       2018-07-01  2018-07-02  2018-07-03  2018-07-04  
cell_name                                                                                                                                            
1002_NUc_Marathalli_7        0.734       0.550       NaN         0.481             
1002_NUc_Marathalli_8        1.338       1.220       0.911       0.601       
1002_NUc_Marathalli_9        0.330       1.180       0.754       0.631       
1003_IU2_Munnekolalu_7       0.628       0.479       0.988       0.694       
1003_IU2_Munnekolalu_8       5.327       6.831       8.387       9.428       

The output should be in

1002_NUc_Marathalli_7 NaN 0.481 0.550 0.734

I can create another dataframe using:

df1 = pd.DataFrame(np.sort(df.values,axis=1), index=df.index, columns=df.columns)

But here the nan values are coming at the end of the row. I want the nan values at the first

Bucket
  • 7,415
  • 9
  • 35
  • 45
sriman narayana
  • 359
  • 2
  • 20

2 Answers2

1

You can use fillna(float('-inf')) and replace:

pd.DataFrame(
    np.sort(df.fillna(float('-inf')).values, axis=1), 
    index=df.index, 
    columns=df.columns
).replace(float('-inf'), np.nan)
user3471881
  • 2,614
  • 3
  • 18
  • 34
1

You can sort the dataframe rows using a key argument to keep NaNs first:

l = df.apply(sorted, key = lambda s: (~np.isnan(s), s), axis = 1)
pd.DataFrame(l.values.tolist(), columns=df.columns)

      2018-07-01  2018-07-02  2018-07-03  2018-07-04
0         NaN       0.481       0.550       0.734
1       0.601       0.911       1.220       1.338
2       0.330       0.631       0.754       1.180
3       0.479       0.628       0.694       0.988
4       5.327       6.831       8.387       9.428

Explanation

To see the way sorted is working in this case, lets take for example:

l = [0.734, 0.481, np.nan, 0.550]

Sorted accepts a key argument, which can be used to define a function with which the data collection can be transformed before the actual sorting takes place.

So what would make sense in this case? Ideally to have some criteria under which we can identify if there is a np.nan in the sequence, and make those cases come first. How can the data be transformed so that later on sorted can achieve this?

What can be done is add an additional field, for each item in the list, that will also be taken into account to sort the list. Say we instead have:

lt = [(1, 0.734), (1, 0.481), (0, np.nan), (1, 0.550)]

So essentially an identifier of whether on not each element is a NaN. Now when we do:

sorted(lt)
[(0, nan), (1, 0.481), (1, 0.55), (1, 0.734)]

What sorted has done hear is to sort using both items in each tuple, so prioritising the first element (hence tuples starting with a 0 come first), and for the rest it sorted by the second term as the first is 1. So what could be a way to achieve so?

If the lambda expression above is rewritten as a list comprehension, it would be equivalent to:

sl = [(~np.isnan(s), s) for s in l]
print(sl)
[(True, 0.734), (True, 0.481), (False, nan), (True, 0.55)]

Note that tese boolean values are enough, as they are interpreted as True==1 and False==0, which will give the desired order in this case. If we do:

sorted(sl)
[(False, nan), (True, 0.481), (True, 0.55), (True, 0.734)].

This can be implemented in the key argument as a lambda or anonymous function, as:

sorted(l, key = lambda s: (~np.isnan(s), s))

Which for the example above, would give:

[nan, 0.481, 0.55, 0.734]
yatu
  • 86,083
  • 12
  • 84
  • 139