2

Similar questions have been asked i.e Concatenate strings from multiple rows using Pandas groupby and remove duplicates from the comma separated cell

I would like to concatenate string values in a pandas groupby lambda function, however whilst maintaining the order of the strings, solutions use the set function which does not preserve the order when multiple values are passed in.

df = df.sort_values(
        ['id', 'order_column']
    ).groupby('id').agg(
        {
            'channel': lambda x: ' > '.join(set(x)),
            'value': np.sum
        }
    )

How do I do this whilst maintaining the order of values passed to the sequence? In my example, the data is

1             Email
2         Affiliate
3    Organic Search
4             Email
5    Branded Social
6            Direct
7    Branded Social
8            Direct
9    Branded Social
10            Email
11        Affiliate
12            Email
13           Direct
14            Email
15           Direct
16            Email
17   Branded Social
18           Direct
19   Branded Social

What I get: 'Affiliate > Email > Organic Search > Branded Social > Direct'

What I expect 'Email > Affiliate > Organic Search > Branded Social > Direct'

ZeroStack
  • 1,049
  • 1
  • 13
  • 25

1 Answers1

3

Use the sort=False parameter in groupby and drop_duplicates instead set:

df = df.sort_values(
        ['id', 'order_column']
    ).groupby('id', sort=False).agg(
        {
            'channel': lambda x: ' > '.join(x.drop_duplicates()),
            'value': np.sum
        }
    )
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
U13-Forward
  • 69,221
  • 14
  • 89
  • 114