2

I have a df that looks like this:

column1   column2    column3   column4
1            2       nan          4
1            2        3           nan
1            2       nan          nan
1            2       nan          nan

How do I reshape the dataframe, for every cell with NaN drop it, and if the column has the same value only take one instance of it?

New df should look like this;

column1   column2    column3   column4
1            2        3          4

I have roughly 500 columns with spotty data like this.

Edit:

I used this line of code to move the values with the spotty data into one row.

df = df.apply(lambda x: pd.Series(x.dropna().values))

new df looks like this:

column1   column2    column3   column4
1            2        3            4
1            2       nan          nan
1            2       nan          nan
1            2       nan          nan

Then I drop the duplicates:

df = df.drop_duplicates()

df looks like this now:

 column1   column2    column3   column4
    1            2        3            4
    1            2       nan          nan

Not sure why the Nan are not dropping after this point with but the rows are dropping:

pivoted_df = pivoted_df.dropna()
RustyShackleford
  • 3,462
  • 9
  • 40
  • 81

1 Answers1

3

You can forward fill missing values by ffill, remove NaNs rows and remove duplicates:

df = df.ffill().dropna().drop_duplicates()
print (df)
   column1  column2  column3  column4
1        1        2      3.0      4.0

Or if need first non missing values per groups specified by some column(s):

df = df.groupby(['column1','column2'], as_index=False).first()
print (df)

   column1  column2  column3  column4
0        1        2      3.0      4.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • `df = df.ffill().dropna().drop_duplicates()` is dropping all my rows. I am being left with only columns – RustyShackleford Sep 27 '18 at 12:39
  • @RustyShackleford - It seems in data is at least one only NaNs columns, so try remove this columns by `df.dropna(how='all', axis=1).dropna().ffill().drop_duplicates()`. – jezrael Sep 27 '18 at 12:42
  • But hard to answer without [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve), and I guess create it is complicated :( – jezrael Sep 27 '18 at 12:47
  • @jezrael The question is dupe of (https://stackoverflow.com/questions/43119503/how-to-remove-blanks-nas-from-dataframe-and-shift-the-values-up) – Space Impact Sep 27 '18 at 12:49
  • @SandeepKadapa - Not sure, if `500` columns is possible use it. – jezrael Sep 27 '18 at 12:52
  • @RustyShackleford - reason is `nan` are strings, try `df = df.replace('nan', np.nan)` in first step – jezrael Sep 27 '18 at 12:58