My question is similar to this one, but I have many (more than 10) columns. One answer says:

If you have many columns in a df it makes sense to use df.groupby(['foo']).agg(...), see here. The .agg() function allows you to choose what to do with the columns you don't want to apply operations on. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...})

Again, specifying that many columns isn't easy. My own solution uses merge, but I didn't see this simple solution in any related question, so I thought maybe I am missing something.

Is this solution correct, or does it have any problems?

df = df.merge(df.groupby(['prefix','input_text'],
                          as_index=False)['target'].agg('<br />'.join))
Ahmad
  • It is not unreasonable to merge after aggregating... It would be helpful to see what data you're starting with and what output you're trying to get for further clarity. With your shown code it would seem that `df['new col'] = df.groupby(['prefix','input_text'])['target'].transform('<br />'.join)` would work fine (?) – Henry Ecker Oct 23 '21 at 19:09
  • @HenryEcker, I now have other solutions with post-processing, but do you mean that if I omit `as_index=False` in the original one, the other columns are included? – Ahmad Oct 24 '21 at 15:29
  • I don't understand what you're saying. `transform` will produce a like-indexed Series, which can be assigned back like `df['new col'] = results`. You're just adding a new column to the DataFrame, not creating a new DataFrame like `groupby agg`. – Henry Ecker Oct 24 '21 at 15:37
  • @HenryEcker, thanks, now I got what you mean! I think it's better you add it as an answer for future readers. – Ahmad Oct 24 '21 at 16:54
  • @HenryEcker I haven't actually tried it, but groupby shrinks the size of the dataframe, so are you sure this new column can be added back to the original dataframe? – Ahmad Oct 24 '21 at 16:58
  • `transform` always produces a like indexed series. It is designed to be added back to the DataFrame. You can check out the [docs](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html) – Henry Ecker Oct 24 '21 at 16:59
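
To make the comparison in the comments above concrete, here is a minimal sketch that puts the merge-after-aggregate approach next to the `transform` approach. The sample data, the `extra` column, and the `target_joined` column name are made up for illustration; only `prefix`, `input_text`, `target`, and the `'<br />'` separator come from the question.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
df = pd.DataFrame({
    "prefix": ["a", "a", "b", "b", "b"],
    "input_text": ["x", "x", "y", "y", "z"],
    "target": ["t1", "t2", "t3", "t4", "t5"],
    "extra": [1, 2, 3, 4, 5],  # stands in for the many other columns
})

keys = ["prefix", "input_text"]

# Merge approach: aggregate per group, then join the result back on the keys.
# The aggregated column is renamed (an assumption of this sketch) so that it
# does not share a name with the original 'target' column.
joined = (
    df.groupby(keys, as_index=False)["target"]
      .agg("<br />".join)
      .rename(columns={"target": "target_joined"})
)
merged = df.merge(joined, on=keys, how="left")

# Transform approach: the result has the same index as df, so it can be
# assigned directly as a new column without shrinking the frame.
transformed = df.copy()
transformed["target_joined"] = df.groupby(keys)["target"].transform("<br />".join)

print(merged)
print(transformed)
```

Both frames keep every original row and every other column. One thing to watch in the merge version: when `on` is not given, `merge` joins on all column names the two frames share, so leaving the aggregated column named `target` would make it part of the join key. Renaming it and passing `on=keys` explicitly avoids that.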

0 Answers