2

Problem

I imported a table like this in a pandas dataframe:

Trace,Action
1,Create Fine
1,Send Fine
2,Create Fine
2,Payment
3,Create Fine
3,Send Fine
4,Insert Fine Notification
4,Add penalty
4,Payment
4,Payment
5,Create Fine
5,Payment
6,Create Fine
6,Send Fine
7,Insert Fine Notification
7,Add penalty
7,Send for Credit Collection

enter image description here

I am trying to group similar rows together in different lists based on their sequence of actions.

For example here I want to create 4 lists:

  1. List1 should be [1,3,6] because these traces have the same sequence of actions ('Create fine', 'Send fine')
  2. List2 should be [2,5] because these traces have the same sequence of actions ('Create fine', 'Payment')
  3. List3 should be [4] because 4 is the only trace with that sequence of actions
  4. List4 should be [7] because 7 is the only trace with that sequence of actions

Is it possible to do this in pandas?

What I tried

I managed to combine similar rows into one like this:

df.groupby('Trace').agg({'Action':', '.join}).reset_index()

and I get a dataframe like this:

ID  Trace   Action
0   1   Create Fine, Send Fine
1   2   Create Fine, Payment
2   3   Create Fine, Send Fine
3   4   Insert Fine Notification, Add penalty, Payment
4   5   Create Fine, Payment
5   6   Create Fine, Send Fine
6   7   Insert Fine Notification, Add penalty, Send for credit collection

But now how do I group them together in lists?

Ford1892
  • 741
  • 2
  • 9
  • 20
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on how to ask a good question may also be useful. – yatu Mar 18 '20 at 10:35
  • provide data in text not as image, so others can easily copy it – Sociopath Mar 18 '20 at 10:43

1 Answers1

1

I think your solution is first step, then groups by Action column with aggregate lists:

df1 = df.groupby('Trace')['Action'].agg(', '.join).reset_index()

df2 = df1.groupby('Action', sort=False)['Trace'].agg(list).reset_index(name='L')
print (df2)
                                              Action          L
0                             Create Fine, Send Fine  [1, 3, 6]
1                               Create Fine, Payment     [2, 5]
2  Insert Fine Notification, Add penalty, Payment...        [4]
3  Insert Fine Notification, Add penalty, Send fo...        [7]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252