how to unstack a dataframe based on a condition?

Question

I have a dataframe with the following setup.

   a foo  b foo   c foo    Set

 0 first  second  third  fourth

The goal is to merge all columns with foo in their header name and output the following:

   All_foo    Set

0   first    fourth

1   second 

2   third

I tried the following:

df2 = df.unstack().reset_index(drop=True).rename('All_foo').to_frame()

This gives me one single column with all the values merged. How can I make the unstacking process to be based on a condition so that I can get the output above?

`a foo`, `b foo` are they columns and `first` , `second` etc data? — anky, Jan 24 '20 at 18:27
ok, then why does only the first one gets `Set` as `fourth`? — anky, Jan 24 '20 at 18:44

ansev · Accepted Answer · 2020-01-24T18:54:02.070

2

Check what name of columns contains foo with Series.str.contains, then use DataFrame.melt where this condition is not met:

cols_melt = [*df.columns[~df.columns.str.contains('foo')]]
#cols_melt = df.columns[~df.columns.str.contains('foo')].tolist()
new_df = df.melt(cols_melt,value_name = 'All foo')[['All foo']+ cols_melt]
print(new_df)
  All foo     Set
0   first  fourth
1  second  fourth
2   third  fourth

Setting NaN values:

new_df[cols_melt] = df[cols_melt]
print(new_df)
  All foo     Set
0   first  fourth
1  second     NaN
2   third     NaN

edited Jan 24 '20 at 18:54

answered Jan 24 '20 at 18:36

ansev

30,322
5
17
31

2

such answers are fast to write but hard to understand – YOLO Jan 24 '20 at 18:36
I think a two-line python code with the documentation is self explanatory. @YOLO – ansev Jan 24 '20 at 18:48
1

awesome! can you tell me what * and ~ are doing? or if you can point me to some kind of documentation? – JM9 Jan 24 '20 at 18:55
1

this unpack the columns index , but if you want to learn more about see: https://stackoverflow.com/questions/3394835/use-of-args-and-kwargs/3394898#3394898 and https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters – ansev Jan 24 '20 at 18:57

score 0 · Answer 2 · answered Jan 24 '20 at 19:20

I would use filter to slice out columns having foo assign it to df1. Process df1 to a series by using stack and reset_index. Finally, concat the result back to non-foo columns

df1 = df.filter(like='foo')
df_final = pd.concat([df1.stack().rename('all foo').reset_index(drop=True), 
                      df.drop(df1.columns,1)], axis=1)

Out[1321]:
  all foo     Set
0   first  fourth
1  second     NaN
2   third     NaN

how to unstack a dataframe based on a condition?

2 Answers2