
I have a dataframe where each row contains 2 or 3 values (the rest are NaN) spread over 4 columns. How do I reshape it so that each row's values are shifted to the left and only 3 columns remain?

Example dataframe:

import numpy as np
import pandas as pd

d = {'col1': [1, 2, np.nan, 4],
     'col2': ['a', np.nan, 'd', 'f'],
     'col3': [np.nan, 'x', np.nan, 'v'],
     'col4': ['q', 'w', 'e', np.nan]}

df = pd.DataFrame(data=d)

   col1 col2 col3 col4
0   1.0    a  NaN    q
1   2.0  NaN    x    w
2   NaN    d  NaN    e
3   4.0    f    v  NaN
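A quick way to check how many values each row actually holds, and therefore how many columns the reshaped frame needs, is to count the non-missing entries per row. This is a minimal sketch using the question's `df`; the name `counts` is just for illustration:

```python
import numpy as np
import pandas as pd

# The example frame from the question
d = {'col1': [1, 2, np.nan, 4],
     'col2': ['a', np.nan, 'd', 'f'],
     'col3': [np.nan, 'x', np.nan, 'v'],
     'col4': ['q', 'w', 'e', np.nan]}
df = pd.DataFrame(data=d)

# Count the non-missing values in each row
counts = df.notna().sum(axis=1)
print(counts.tolist())  # [3, 3, 2, 3]
print(counts.max())     # 3 -> width of the reshaped frame
```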

What I want as output:

d2 = {'col1': [1, 2, 'd', 4],
      'col2': ['a', 'x', 'e', 'f'],
      'col3': ['q', 'w', np.nan, 'v']}

df2 = pd.DataFrame(data=d2)

  col1 col2 col3
0  1.0    a    q
1  2.0    x    w
2    d    e  NaN
3  4.0    f    v
mozway
fkg
  • Does this answer your question? [Python Pandas replace NaN in one column with value from corresponding row of second column](https://stackoverflow.com/questions/29177498/python-pandas-replace-nan-in-one-column-with-value-from-corresponding-row-of-sec) – Grekkq Apr 21 '22 at 13:24
  • @Grekkq I don't think this is what is expected – mozway Apr 21 '22 at 13:28



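You can apply dropna row by row, reset each row's index so that the remaining values are renumbered 0, 1, 2, and finally rename those positional columns back to the original column names. The result has as many columns as the fullest row, which is 3 here: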

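# Map positional labels 0, 1, 2, ... back to the original column names
d = dict(enumerate(df.columns))
df2 = (df.apply(lambda s: s.dropna()                 # drop the NaNs in this row
                           .reset_index(drop=True),  # renumber the remaining values 0, 1, 2, ...
                axis=1)                              # apply the function row by row
         .rename(columns=d)
      )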

output:

  col1 col2 col3
0  1.0    a    q
1  2.0    x    w
2    d    e  NaN
3  4.0    f    v
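For large frames a row-wise apply can be slow. A different technique, not part of the answer above, is to do the left shift in NumPy with boolean masks. This is a sketch assuming the data fits in a single object array; the helper name `justify_left` is hypothetical:

```python
import numpy as np
import pandas as pd

# The example frame from the question
d = {'col1': [1, 2, np.nan, 4],
     'col2': ['a', np.nan, 'd', 'f'],
     'col3': [np.nan, 'x', np.nan, 'v'],
     'col4': ['q', 'w', 'e', np.nan]}
df = pd.DataFrame(data=d)


def justify_left(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: shift each row's non-missing values left, pad with NaN."""
    values = frame.to_numpy(dtype=object)
    mask = frame.notna().to_numpy()           # True where a cell holds a value
    counts = mask.sum(axis=1)                 # number of values per row
    n_cols = int(counts.max())                # the fullest row sets the output width
    out = np.full((len(frame), n_cols), np.nan, dtype=object)
    # In each row, the first `counts[i]` slots receive that row's values
    target = np.arange(n_cols) < counts[:, None]
    out[target] = values[mask]                # both sides are read in row-major order
    return pd.DataFrame(out, index=frame.index, columns=frame.columns[:n_cols])


res = justify_left(df)
print(res)
```

Because `values[mask]` and `out[target]` both enumerate cells in row-major order, the i-th value of a row lands in the i-th slot of the same row, which is the same left shift the dropna/reset_index approach performs.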
mozway