How to resize dataframe by dropping NaN in individual cells?

Question

I have a df that looks like this:

column1   column2    column3   column4
1            2       nan          4
1            2        3           nan
1            2       nan          nan
1            2       nan          nan

How do I reshape the dataframe, for every cell with NaN drop it, and if the column has the same value only take one instance of it?

New df should look like this;

column1   column2    column3   column4
1            2        3          4

I have roughly 500 columns with spotty data like this.

Edit:

I used this line of code to move the values with the spotty data into one row.

df = df.apply(lambda x: pd.Series(x.dropna().values))

new df looks like this:

column1   column2    column3   column4
1            2        3            4
1            2       nan          nan
1            2       nan          nan
1            2       nan          nan

Then I drop the duplicates:

df = df.drop_duplicates()

df looks like this now:

 column1   column2    column3   column4
    1            2        3            4
    1            2       nan          nan

Not sure why the Nan are not dropping after this point with but the rows are dropping:

pivoted_df = pivoted_df.dropna()

score 3 · Accepted Answer · answered Sep 27 '18 at 12:34

3

You can forward fill missing values by ffill, remove NaNs rows and remove duplicates:

df = df.ffill().dropna().drop_duplicates()
print (df)
   column1  column2  column3  column4
1        1        2      3.0      4.0

Or if need first non missing values per groups specified by some column(s):

df = df.groupby(['column1','column2'], as_index=False).first()
print (df)

   column1  column2  column3  column4
0        1        2      3.0      4.0

answered Sep 27 '18 at 12:34

jezrael

822,522
95
1,334
1,252

`df = df.ffill().dropna().drop_duplicates()` is dropping all my rows. I am being left with only columns – RustyShackleford Sep 27 '18 at 12:39
@RustyShackleford - It seems in data is at least one only NaNs columns, so try remove this columns by `df.dropna(how='all', axis=1).dropna().ffill().drop_duplicates()`. – jezrael Sep 27 '18 at 12:42
But hard to answer without [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve), and I guess create it is complicated :( – jezrael Sep 27 '18 at 12:47
@jezrael The question is dupe of (https://stackoverflow.com/questions/43119503/how-to-remove-blanks-nas-from-dataframe-and-shift-the-values-up) – Space Impact Sep 27 '18 at 12:49
@SandeepKadapa - Not sure, if `500` columns is possible use it. – jezrael Sep 27 '18 at 12:52
@RustyShackleford - reason is `nan` are strings, try `df = df.replace('nan', np.nan)` in first step – jezrael Sep 27 '18 at 12:58

How to resize dataframe by dropping NaN in individual cells?

1 Answers1