
I have a pandas dataframe that looks something like this:

   Column1  Column2  Column3
0        1      NaN      NaN
1        4      NaN      NaN
2      NaN        3      NaN
3      NaN       98      NaN
4      NaN      NaN      562
5      NaN      NaN      742
...

How would I go about removing all of the unnecessary NaNs to make it look like this?

   Column1  Column2  Column3
0        1        3      562
1        4       98      742
...
M. Chak
  • What is the issue, exactly? Have you tried anything, done any research? Where does that data even come from, is there no way of fixing it beforehand? – AMC Apr 07 '20 at 19:00

3 Answers


Run:

df.apply(lambda col: col.dropna().reset_index(drop=True).astype(int))

This applies to each column a function that drops the NaN values in that column and renumbers the remaining values from 0. Because of the NaN values, the columns are generally of float type, so I also cast them back to int.
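For reference, here is a self-contained version of this answer, reconstructing the sample frame from the question:

import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "Column1": [1, 4, np.nan, np.nan, np.nan, np.nan],
    "Column2": [np.nan, np.nan, 3, 98, np.nan, np.nan],
    "Column3": [np.nan, np.nan, np.nan, np.nan, 562, 742],
})

# Drop NaNs per column, renumber from 0, and cast back to int.
result = df.apply(lambda col: col.dropna().reset_index(drop=True).astype(int))
print(result)
#    Column1  Column2  Column3
# 0        1        3      562
# 1        4       98      742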

Note also that the other solutions work only as long as each column contains an equal number of non-NaN values.

To check it, add the following row:

6  NaN   NaN   999

to your 6 initial rows, so that Column3 now contains 3 non-NaN values, whereas the other columns contain only 2.

The solution by yatu drops this last row, whereas the solution by Quang fails with ValueError: arrays must all be same length.

My solution, however, also works in this case, leaving a trailing NaN in the "too short" columns, as the demo below shows.
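Continuing the demo above, append the extra row and rerun the same expression:

# Add a seventh row where only Column3 has a value.
df.loc[6] = [np.nan, np.nan, 999]

print(df.apply(lambda col: col.dropna().reset_index(drop=True).astype(int)))
#    Column1  Column2  Column3
# 0      1.0      3.0      562
# 1      4.0     98.0      742
# 2      NaN      NaN      999

Note that the trailing NaN padding upcasts the shorter columns back to float, since an int column cannot hold NaN.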

Valdi_Bo

You can just dropna each column and return the underlying array; returning .values instead of a Series discards the index, so pandas stacks the non-NaN values from the top rather than aligning them by their original row labels:

df.apply(lambda x: x.dropna().values)

Output:

   Column1  Column2  Column3
0      1.0      3.0    562.0
1      4.0     98.0    742.0
Quang Hoang

We can use justify here from the linked post to push all non-NaN values to the top of each column (a sketch of this helper follows the output below):

pd.DataFrame(justify(df.values, invalid_val=np.nan, side='up', axis=0), 
             columns=df.columns).dropna()

   Column1  Column2  Column3
0      1.0      3.0    562.0
1      4.0     98.0    742.0
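Note that justify is not part of pandas or NumPy; it comes from the post linked above. Below is a sketch along the lines of the widely circulated NumPy justify recipe (the exact implementation in the linked post may differ):

import numpy as np

def justify(a, invalid_val=0, axis=1, side='left'):
    # Push all valid values in `a` to one side along `axis`,
    # filling the remaining slots with `invalid_val`.
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting booleans moves True (valid) flags to the high-index side;
    # flip them back for 'up'/'left' justification.
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

With side='up' and axis=0, the non-NaN values in each column are shifted to the top, and the trailing all-NaN rows are then removed by dropna().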
yatu