Remove NaNs from Dataframe?

Question

I have the following code:

sample_data = OrderedDict((df.name, df['col'].sample(n=3)) for df in test_cases[1:])
sample = pd.DataFrame(sample_data)

Which gives the following dataframe:

col1   col2
A      NaN
P      NaN
NaN    E
NaN    R
U      NaN
NaN    Y

How do I get the following dataframe:

 col1   col2
 A      E
 P      R
 U      Y

What happens if you have unequal number of not null values per column? — Vaishali, Apr 25 '19 at 18:50
@Vaishali I should always have the same number of populated values per column due to the sample that is taken. — a1234, Apr 25 '19 at 18:53

Bitto · Accepted Answer · 2019-04-25T19:10:01.213

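Another option is to combine dropna(), reset_index() and concat(): drop the NaNs in each column, reset each column's index so the remaining values are numbered from 0, then concatenate the columns side by side.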

pd.concat([df[x].dropna().reset_index(drop=True) for x in df.columns], axis=1)

Code

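import numpy as np
import pandas as pd

# Rebuild the frame from the question: each row holds a value in only one column.
li = [['A', np.nan], ['P', np.nan], [np.nan, 'E'], [np.nan, 'R'], ['U', np.nan], [np.nan, 'Y']]
df = pd.DataFrame(li, columns=['col1', 'col2'])

# Drop the NaNs per column, renumber each column from 0, then join side by side.
df2 = pd.concat([df[x].dropna().reset_index(drop=True) for x in df.columns], axis=1)
print(df2)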

Output

  col1 col2
0    A    E
1    P    R
2    U    Y
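
On the question raised in the comments about unequal counts: here is a minimal sketch, assuming a hypothetical frame where col1 has three values and col2 only two. Because concat aligns on the reset index, the shorter column should be padded with NaN at the bottom rather than raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical input (not from the question): col1 has three values, col2 has two.
df = pd.DataFrame({
    "col1": ["A", "P", np.nan, np.nan, "U"],
    "col2": [np.nan, np.nan, "E", "R", np.nan],
})

# Same technique as above: drop NaNs per column, renumber, join side by side.
compact = pd.concat(
    [df[c].dropna().reset_index(drop=True) for c in df.columns],
    axis=1,
)
print(compact)
```

If trailing NaNs are not acceptable, calling dropna() on the result keeps only the rows where every column has a value.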

score 2 · Answer 2 · answered Apr 25 '19 at 19:00

You can use a list comprehension to collect the non-null values of each column and rebuild the dataframe from them:

pd.DataFrame([df.loc[df[col].notna(), col].values for col in df.columns]).T


    0   1
0   A   E
1   P   R
2   U   Y

Or

a = np.array([df.loc[df[col].notna(), col].values for col in df.columns]).T

pd.DataFrame(a, columns = df.columns)

    col1    col2
0   A       E
1   P       R
2   U       Y

score 1 · Answer 3 · answered Apr 25 '19 at 19:02

If I understand correctly:

df.apply(lambda x : sorted(x,key=pd.isnull)).dropna()
Out[485]: 
  col1 col2
0    A    E
1    P    R
2    U    Y

If performance matters, check justify.
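
For larger frames, one vectorized way to get the same result is to stable-sort each column's null mask with NumPy so the non-null values move to the top in their original order. This is only a sketch of that idea, not the justify function itself, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    "col1": ["A", "P", np.nan, np.nan, "U", np.nan],
    "col2": [np.nan, np.nan, "E", "R", np.nan, "Y"],
})

# True where a cell is null.
mask = df.isna().to_numpy()

# A stable sort of the boolean mask puts False (non-null) rows first
# and keeps their original relative order within each column.
order = np.argsort(mask, axis=0, kind="stable")

# Reorder every column by its own permutation.
values = np.take_along_axis(df.to_numpy(), order, axis=0)

# Drop the rows that are now entirely NaN and renumber the index.
result = (
    pd.DataFrame(values, columns=df.columns)
    .dropna(how="all")
    .reset_index(drop=True)
)
print(result)
```

Unlike the sorted() version, this avoids a Python-level sort per column, which is where the speedup on wide or long frames would be expected to come from.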

BENY
