
I have the following DataFrame:

     a    b    c    d    e
0  NaN  2.0  NaN  4.0  5.0
1  NaN  2.0  3.0  NaN  5.0
2  1.0  NaN  3.0  4.0  NaN
3  1.0  2.0  NaN  4.0  NaN
4  NaN  2.0  NaN  4.0  5.0

What I am trying to do is generate a new DataFrame without the NaN values. Every row always contains the same number of NaN values.

The final Dataframe should look like this:

   x  y  z
0  2  4  5
1  2  3  5
2  1  3  4
3  1  2  4
4  2  4  5

Does someone know an easy way to do this? Any help is appreciated.

j. DOE

4 Answers


Using array indexing:

pd.DataFrame(df.values[df.notnull().values].reshape(df.shape[0],3),
             columns=list('xyz'),dtype=int)

    x   y   z
0   2   4   5
1   2   3   5
2   1   3   4
3   1   2   4
4   2   4   5
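
The one-liner above relies on two things: boolean indexing flattens the values in row-major order, and every row keeps the same number of values so the reshape is valid. A minimal self-contained sketch of the same idea, assuming the sample frame from the question (the names `mask`, `per_row` and `result` are only illustrative, and the explicit check is an addition, not part of the answer):

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt column by column.
df = pd.DataFrame({
    'a': [np.nan, np.nan, 1.0, 1.0, np.nan],
    'b': [2.0, 2.0, np.nan, 2.0, 2.0],
    'c': [np.nan, 3.0, 3.0, np.nan, np.nan],
    'd': [4.0, np.nan, 4.0, 4.0, 4.0],
    'e': [5.0, 5.0, np.nan, np.nan, 5.0],
})

mask = df.notna().to_numpy()   # True where a value is present
per_row = mask.sum(axis=1)     # number of non-NaN values in each row

# The reshape only makes sense if every row keeps the same number of values.
if per_row.min() != per_row.max():
    raise ValueError("rows contain different numbers of non-NaN values")

# Boolean indexing returns a flat array in row-major order,
# so the values of each row stay next to each other.
compact = df.to_numpy()[mask].reshape(len(df), per_row[0])
result = pd.DataFrame(compact.astype(int), columns=list('xyz'), index=df.index)
print(result)
```

If the check fails, the reshape would otherwise either raise or silently shift values between rows, which is why the ragged case needs a different approach.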

If the number of non-NaN values differs between rows, for example 4 values in the first row and 3 in each of the following rows, then this will do:

    a   b   c   d   e   g
0   NaN 2.0 NaN 4.0 5.0 6.0
1   NaN 2.0 3.0 NaN 5.0 NaN
2   1.0 NaN 3.0 4.0 NaN NaN
3   1.0 2.0 NaN 4.0 NaN NaN
4   NaN 2.0 NaN 4.0 5.0 NaN

pd.DataFrame(df.apply(lambda x: x.values[x.notnull()],axis=1).tolist())

    0   1   2   3
0   2.0 4.0 5.0 6.0
1   2.0 3.0 5.0 NaN
2   1.0 3.0 4.0 NaN
3   1.0 2.0 4.0 NaN
4   2.0 4.0 5.0 NaN

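Here the NaNs in the last column cannot be removed, because the rows contain different numbers of values and the shorter rows have to be padded.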
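
The padding can be seen in isolation with a small sketch. It is an alternative formulation, not the code from the answer: it uses a list comprehension over the NumPy rows instead of `apply`, and the names `rows` and `ragged` are illustrative. It assumes the six-column sample frame shown above.

```python
import numpy as np
import pandas as pd

# Six-column sample: the first row has 4 values, the others have 3.
df = pd.DataFrame({
    'a': [np.nan, np.nan, 1.0, 1.0, np.nan],
    'b': [2.0, 2.0, np.nan, 2.0, 2.0],
    'c': [np.nan, 3.0, 3.0, np.nan, np.nan],
    'd': [4.0, np.nan, 4.0, 4.0, 4.0],
    'e': [5.0, 5.0, np.nan, np.nan, 5.0],
    'g': [6.0, np.nan, np.nan, np.nan, np.nan],
})

# Keep the non-NaN values of each row in their original left-to-right order.
rows = [row[~np.isnan(row)].tolist() for row in df.to_numpy()]

# A DataFrame is rectangular, so shorter rows are padded with NaN on the right.
ragged = pd.DataFrame(rows, index=df.index)
print(ragged)
```

The lists in `rows` have lengths 4, 3, 3, 3 and 3; only the rebuilt frame contains the trailing NaN values.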

Space Impact

Use a justify function (a custom helper that is not part of pandas or NumPy, so it has to be defined first) and select the first 3 columns:

df = pd.DataFrame(justify(df.values, invalid_val=np.nan)[:, :3].astype(int),
                  columns=list('xyz'),
                  index=df.index)
print(df)
   x  y  z
0  2  4  5
1  2  3  5
2  1  3  4
3  1  2  4
4  2  4  5
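
The answer does not show `justify` itself. What follows is a hypothetical sketch of such a helper, with a signature inferred only from the call `justify(df.values, invalid_val=np.nan)`; the `axis` and `side` parameters are assumptions, and the helper the answer actually relies on may differ.

```python
import numpy as np
import pandas as pd

def justify(a, invalid_val=np.nan, axis=1, side='left'):
    """Push the valid entries of a 2D array to one side along `axis`.

    Hypothetical helper: not part of pandas or NumPy.
    """
    # NaN never compares equal to itself, so it needs isnan instead of !=.
    if isinstance(invalid_val, float) and np.isnan(invalid_val):
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting booleans puts False first; flip so True comes first for left/up.
    justified_mask = np.sort(mask, axis=axis)
    if side in ('left', 'up'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

# Sample data from the question.
df = pd.DataFrame({
    'a': [np.nan, np.nan, 1.0, 1.0, np.nan],
    'b': [2.0, 2.0, np.nan, 2.0, 2.0],
    'c': [np.nan, 3.0, 3.0, np.nan, np.nan],
    'd': [4.0, np.nan, 4.0, 4.0, 4.0],
    'e': [5.0, 5.0, np.nan, np.nan, 5.0],
})

left = justify(df.to_numpy(), invalid_val=np.nan)
result = pd.DataFrame(left[:, :3].astype(int), columns=list('xyz'), index=df.index)
print(result)
```

Unlike sorting, this keeps the values of each row in their original order and only moves the NaN entries to the right.
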
jezrael

If, as in your example, values increase across columns, you can sort over axis=1:

res = pd.DataFrame(np.sort(df.values, 1)[:, :3],
                   columns=list('xyz'), dtype=int)

print(res)

   x  y  z
0  2  4  5
1  2  3  5
2  1  3  4
3  1  2  4
4  2  4  5
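
This works because `np.sort` places NaN after all numbers, so the values end up in the leading columns. A small sketch of that behaviour and of the caveat mentioned above, using two made-up rows (the variable names are illustrative):

```python
import numpy as np

increasing = np.array([[np.nan, 2.0, np.nan, 4.0, 5.0]])
not_increasing = np.array([[np.nan, 5.0, np.nan, 4.0, 2.0]])

# NaN is sorted to the end, so the first 3 columns hold the real values.
kept = np.sort(increasing, axis=1)[:, :3]

# The same call reorders a row whose values were not already increasing,
# so the original left-to-right order is lost in that case.
reordered = np.sort(not_increasing, axis=1)[:, :3]

print(kept)
print(reordered)
```

Both results contain the values 2, 4 and 5 in ascending order, even though the second row started as 5, 4, 2.
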
jpp

You can use the pandas DataFrame method df.fillna().

This method replaces NaN/NA values with the value you pass in.

df.fillna(value)  # value is what each NaN is replaced with

import numpy as np
import pandas as pd

data = {
    'A': [np.nan, 2.0, np.nan, 4.0, 5.0],
    'B': [np.nan, 2.0, 3.0, np.nan, 5.0],
    'C': [1.0, np.nan, 3.0, 4.0, np.nan],
    'D': [1.0, 2.0, np.nan, 4.0, np.nan],
    'E': [np.nan, 2.0, np.nan, 4.0, 5.0],
}
df = pd.DataFrame(data)
print(df)

     A    B    C    D    E
0  NaN  NaN  1.0  1.0  NaN
1  2.0  2.0  NaN  2.0  2.0
2  NaN  3.0  3.0  NaN  NaN
3  4.0  NaN  4.0  4.0  4.0
4  5.0  5.0  NaN  NaN  5.0

df = df.fillna(0)  # Applying the method with parameter 0
print(df)

     A    B    C    D    E
0  0.0  0.0  1.0  1.0  0.0
1  2.0  2.0  0.0  2.0  2.0
2  0.0  3.0  3.0  0.0  0.0
3  4.0  0.0  4.0  4.0  4.0
4  5.0  5.0  0.0  0.0  5.0

If you want to apply this method to one particular column only, the syntax is:

df[column_name] = df[column_name].fillna(param)

# Starting again from the original df (before fillna(0) was applied to every column)
df['A'] = df['A'].fillna(0)
print(df)

     A    B    C    D    E
0  0.0  NaN  1.0  1.0  NaN
1  2.0  2.0  NaN  2.0  2.0
2  0.0  3.0  3.0  NaN  NaN
3  4.0  NaN  4.0  4.0  4.0
4  5.0  5.0  NaN  NaN  5.0

You can also use the pandas replace() method to replace np.nan:

df = df.replace(np.nan, 0)
print(df)

     A    B    C    D    E
0  0.0  0.0  1.0  1.0  0.0
1  2.0  2.0  0.0  2.0  2.0
2  0.0  3.0  3.0  0.0  0.0
3  4.0  0.0  4.0  4.0  4.0
4  5.0  5.0  0.0  0.0  5.0

# Starting again from the original df: replacing NaN only in column A
df['A'] = df['A'].replace(np.nan, 0)

print(df)

     A    B    C    D    E
0  0.0  NaN  1.0  1.0  NaN
1  2.0  2.0  NaN  2.0  2.0
2  0.0  3.0  3.0  NaN  NaN
3  4.0  NaN  4.0  4.0  4.0
4  5.0  5.0  NaN  NaN  5.0
Tonechas