Generate New DataFrame without NaN Values

Question

I have the following DataFrame:

     a    b    c    d    e
0  NaN  2.0  NaN  4.0  5.0
1  NaN  2.0  3.0  NaN  5.0
2  1.0  NaN  3.0  4.0  NaN
3  1.0  2.0  NaN  4.0  NaN
4  NaN  2.0  NaN  4.0  5.0

What I am trying to do is generate a new DataFrame without the NaN values. Every row always contains the same number of NaN values.

The final DataFrame should look like this:

   x  y  z
0  2  4  5
1  2  3  5
2  1  3  4
3  1  2  4
4  2  4  5

Does someone know an easy way to do this? Any help is appreciated.

Space Impact · Answer 1 · 2018-08-31T11:09:06.407

Using array indexing:

pd.DataFrame(df.values[df.notnull().values].reshape(df.shape[0],3),
             columns=list('xyz'),dtype=int)

    x   y   z
0   2   4   5
1   2   3   5
2   1   3   4
3   1   2   4
4   2   4   5
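
For reference, the same idea as a self-contained sketch. It rebuilds the frame from the question, and the explicit check on the per-row counts is an addition of this sketch (not part of the one-liner above), since the reshape only works when every row has the same number of non-NaN values.

```python
import numpy as np
import pandas as pd

# Frame from the question.
df = pd.DataFrame({
    'a': [np.nan, np.nan, 1.0, 1.0, np.nan],
    'b': [2.0, 2.0, np.nan, 2.0, 2.0],
    'c': [np.nan, 3.0, 3.0, np.nan, np.nan],
    'd': [4.0, np.nan, 4.0, 4.0, 4.0],
    'e': [5.0, 5.0, np.nan, np.nan, 5.0],
})

mask = df.notna().to_numpy()   # True where a value is present
counts = mask.sum(axis=1)      # number of valid values per row

# The reshape below is only valid if every row keeps the same number of values.
if not (counts == counts[0]).all():
    raise ValueError("rows contain different numbers of non-NaN values")

# Boolean indexing flattens in row-major order, so each row's values stay together.
values = df.to_numpy()[mask].reshape(len(df), counts[0])
result = pd.DataFrame(values.astype(int), columns=list('xyz'), index=df.index)
print(result)
```

The column names x, y, z are hard-coded for three remaining values per row, as in the answer.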

If the rows hold different numbers of non-NaN values, for example four in the first row and three in each of the others, then this will do:

    a   b   c   d   e   g
0   NaN 2.0 NaN 4.0 5.0 6.0
1   NaN 2.0 3.0 NaN 5.0 NaN
2   1.0 NaN 3.0 4.0 NaN NaN
3   1.0 2.0 NaN 4.0 NaN NaN
4   NaN 2.0 NaN 4.0 5.0 NaN

pd.DataFrame(df.apply(lambda x: x.values[x.notnull()],axis=1).tolist())

    0   1   2   3
0   2.0 4.0 5.0 6.0
1   2.0 3.0 5.0 NaN
2   1.0 3.0 4.0 NaN
3   1.0 2.0 4.0 NaN
4   2.0 4.0 5.0 NaN

Here the NaNs in the last column cannot be removed, because the shorter rows have to be padded to the length of the longest one.
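
A runnable sketch of the ragged case, assuming the six-column frame shown above. It uses a plain list comprehension over the NumPy rows instead of `apply`; that substitution is a choice made for this sketch, not the answer's code.

```python
import numpy as np
import pandas as pd

# Six-column frame from the example: row 0 has four values, the rest have three.
df = pd.DataFrame({
    'a': [np.nan, np.nan, 1.0, 1.0, np.nan],
    'b': [2.0, 2.0, np.nan, 2.0, 2.0],
    'c': [np.nan, 3.0, 3.0, np.nan, np.nan],
    'd': [4.0, np.nan, 4.0, 4.0, 4.0],
    'e': [5.0, 5.0, np.nan, np.nan, 5.0],
    'g': [6.0, np.nan, np.nan, np.nan, np.nan],
})

# Keep only the valid values of each row; the rows end up with different lengths.
rows = [row[~np.isnan(row)].tolist() for row in df.to_numpy()]

# Building a DataFrame from lists of unequal length pads the short rows with NaN.
ragged = pd.DataFrame(rows, index=df.index)
print(ragged)
```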

score 1 · Answer 2 · answered Aug 31 '18 at 10:36

Use a justify helper function (user-defined; it is not part of pandas or NumPy) and select the first 3 columns:

df = pd.DataFrame(justify(df.values, invalid_val=np.nan)[:, :3].astype(int),
                  columns=list('xyz'),
                  index=df.index)
print(df)
   x  y  z
0  2  4  5
1  2  3  5
2  1  3  4
3  1  2  4
4  2  4  5
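
The answer does not include the definition of `justify`. The following is a minimal sketch of what such a helper could look like, restricted to pushing valid values to the left of each row of a 2-D float array. The body is an assumption made for illustration and may differ from the helper the answer had in mind.

```python
import numpy as np
import pandas as pd

def justify(a, invalid_val=np.nan):
    """Hypothetical helper: left-justify the valid entries of each row."""
    a = np.asarray(a, dtype=float)
    if np.isnan(invalid_val):
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting booleans puts False first; flipping along axis 1 moves True to the left.
    justified_mask = np.flip(np.sort(mask, axis=1), axis=1)
    out = np.full(a.shape, invalid_val, dtype=float)
    # Both masks hold the same number of True entries per row, so the values
    # are copied row by row in their original order.
    out[justified_mask] = a[mask]
    return out

df = pd.DataFrame({
    'a': [np.nan, np.nan, 1.0, 1.0, np.nan],
    'b': [2.0, 2.0, np.nan, 2.0, 2.0],
    'c': [np.nan, 3.0, 3.0, np.nan, np.nan],
    'd': [4.0, np.nan, 4.0, 4.0, 4.0],
    'e': [5.0, 5.0, np.nan, np.nan, 5.0],
})

res = pd.DataFrame(justify(df.to_numpy())[:, :3].astype(int),
                   columns=list('xyz'), index=df.index)
print(res)
```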

score 0 · Answer 3 · answered Aug 31 '18 at 10:48

If, as in your example, values increase across columns, you can sort over axis=1:

res = pd.DataFrame(np.sort(df.values, 1)[:, :3],
                   columns=list('xyz'), dtype=int)

print(res)

   x  y  z
0  2  4  5
1  2  3  5
2  1  3  4
3  1  2  4
4  2  4  5
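
The condition matters: `np.sort` places NaN at the end of each row, but it also reorders the remaining values. A small sketch with a made-up row whose values do not increase shows the difference from mask-based selection:

```python
import numpy as np

# Made-up row where the valid values are not in increasing order.
row = np.array([[np.nan, 5.0, np.nan, 2.0, 4.0]])

# Sorting moves NaN to the end but changes the order of the values.
sorted_vals = np.sort(row, axis=1)[:, :3]

# Boolean masking keeps the original left-to-right order.
masked_vals = row[~np.isnan(row)].reshape(1, 3)

print(sorted_vals)  # [[2. 4. 5.]]
print(masked_vals)  # [[5. 2. 4.]]
```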

score 0 · Answer 4 · edited Nov 17 '21 at 23:16

You can use the pandas DataFrame method df.fillna().

This method replaces NaN or NA values with the value you pass in.

df.fillna(param)  # param: the value that replaces NaN

import numpy as np
import pandas as pd

data = {
    'A': [np.nan, 2.0, np.nan, 4.0, 5.0],
    'B': [np.nan, 2.0, 3.0, np.nan, 5.0],
    'C': [1.0, np.nan, 3.0, 4.0, np.nan],
    'D': [1.0, 2.0, np.nan, 4.0, np.nan],
    'E': [np.nan, 2.0, np.nan, 4.0, 5.0],
}
df = pd.DataFrame(data)
print(df)

     A    B    C    D    E
0  NaN  NaN  1.0  1.0  NaN
1  2.0  2.0  NaN  2.0  2.0
2  NaN  3.0  3.0  NaN  NaN
3  4.0  NaN  4.0  4.0  4.0
4  5.0  5.0  NaN  NaN  5.0

df = df.fillna(0)  # Applying the method with parameter 0
print(df)

     A    B    C    D    E
0  0.0  0.0  1.0  1.0  0.0
1  2.0  2.0  0.0  2.0  2.0
2  0.0  3.0  3.0  0.0  0.0
3  4.0  0.0  4.0  4.0  4.0
4  5.0  5.0  0.0  0.0  5.0

If you want to apply this method to one particular column, the syntax is:

df[column_name] = df[column_name].fillna(param)

df['A'] = df['A'].fillna(0)
print(df)

     A    B    C    D    E
0  0.0  NaN  1.0  1.0  NaN
1  2.0  2.0  NaN  2.0  2.0
2  0.0  3.0  3.0  NaN  NaN
3  4.0  NaN  4.0  4.0  4.0
4  5.0  5.0  NaN  NaN  5.0
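
As a side note, `fillna` also accepts a dict that maps column names to fill values, which fills selected columns in one call. A small sketch, using a made-up two-column frame:

```python
import numpy as np
import pandas as pd

# Made-up frame for illustration.
df = pd.DataFrame({'A': [np.nan, 2.0], 'B': [np.nan, 2.0]})

# Only column A is filled; column B keeps its NaN.
filled = df.fillna({'A': 0})
print(filled)
```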

You can also use the pandas replace() method to replace np.nan:

df = df.replace(np.nan,0)
print(df)

     A    B    C    D    E
0  0.0  0.0  1.0  1.0  0.0
1  2.0  2.0  0.0  2.0  2.0
2  0.0  3.0  3.0  0.0  0.0
3  4.0  0.0  4.0  4.0  4.0
4  5.0  5.0  0.0  0.0  5.0

df['A'] = df['A'].replace(np.nan, 0)  # Replacing only column A

print(df)

     A    B    C    D    E
0  0.0  NaN  1.0  1.0  NaN
1  2.0  2.0  NaN  2.0  2.0
2  0.0  3.0  3.0  NaN  NaN
3  4.0  NaN  4.0  4.0  4.0
4  5.0  5.0  NaN  NaN  5.0
