1

I have a dataframe with the following setup.

   a foo  b foo   c foo    Set

 0 first  second  third  fourth

The goal is to merge all columns with foo in their header name and output the following:

   All_foo    Set

0   first    fourth

1   second 

2   third

I tried the following:

df2 = df.unstack().reset_index(drop=True).rename('All_foo').to_frame()

This gives me one single column with all the values merged. How can I make the unstacking process to be based on a condition so that I can get the output above?

JM9
  • 119
  • 1
  • 8

2 Answers2

2

Check what name of columns contains foo with Series.str.contains, then use DataFrame.melt where this condition is not met:

cols_melt = [*df.columns[~df.columns.str.contains('foo')]]
#cols_melt = df.columns[~df.columns.str.contains('foo')].tolist()
new_df = df.melt(cols_melt,value_name = 'All foo')[['All foo']+ cols_melt]
print(new_df)
  All foo     Set
0   first  fourth
1  second  fourth
2   third  fourth

Setting NaN values:

new_df[cols_melt] = df[cols_melt]
print(new_df)
  All foo     Set
0   first  fourth
1  second     NaN
2   third     NaN
ansev
  • 30,322
  • 5
  • 17
  • 31
  • 2
    such answers are fast to write but hard to understand – YOLO Jan 24 '20 at 18:36
  • I think a two-line python code with the documentation is self explanatory. @YOLO – ansev Jan 24 '20 at 18:48
  • 1
    awesome! can you tell me what * and ~ are doing? or if you can point me to some kind of documentation? – JM9 Jan 24 '20 at 18:55
  • 1
    this unpack the columns index , but if you want to learn more about see: https://stackoverflow.com/questions/3394835/use-of-args-and-kwargs/3394898#3394898 and https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters – ansev Jan 24 '20 at 18:57
0

I would use filter to slice out columns having foo assign it to df1. Process df1 to a series by using stack and reset_index. Finally, concat the result back to non-foo columns

df1 = df.filter(like='foo')
df_final = pd.concat([df1.stack().rename('all foo').reset_index(drop=True), 
                      df.drop(df1.columns,1)], axis=1)

Out[1321]:
  all foo     Set
0   first  fourth
1  second     NaN
2   third     NaN
Andy L.
  • 24,909
  • 4
  • 17
  • 29