Turn repeat entries associated with different values into one entry with list of those values?

Question

I wasn't sure how to title this.

Assume the following Pandas DataFrame:

    Student ID      Class   
1   John    99124   Biology
2   John    99124   History
3   John    99124   Geometry
4   Sarah   74323   Physics
5   Sarah   74323   Geography
6   Sarah   74323   Algebra
7   Alex    80045   Trigonometry
8   Alex    80045   Economics
9   Alex    80045   French

I'd like to reduce the number of rows in this DataFrame by creating a list of classes that each student is taking, and then putting that in the "class" column. Here's my desired output:

    Student ID      Class
1   John    99124   ["Biology","History","Geometry"]
2   Sarah   74323   ["Physics","Geography","Algebra"]
3   Alex    80045   ["Trigonometry","Economics","French"]

I am working with a large DataFrame that is not as nicely organized as this example. Any help is appreciated.

I'm not sure a list is really the ideal solution. Storing objects in a DataFrame just creates an obstacle for most future manipulations. For instance, it's no longer trivial to check which students have taken `Biology`, or at least not without a much less performant method. — ALollz, Aug 02 '19 at 19:29
It depends what kind of manipulations you need in the end. Having a long format where each row is a unique key (as in the above) can be useful and is flexible. It's not difficult to `pivot`, or get dummies from there, which might be more suitable for other things. — ALollz, Aug 02 '19 at 19:58

score 2 · Accepted Answer · answered Aug 02 '19 at 19:25

2

You need to groupby on Student and ID and then use agg.

df.groupby(['Student', 'ID'], as_index=False).agg({'Class': list})

Ouput:

  Student     ID                              Class
0    Alex  80045  [Trigonometry, Economics, French]
1    John  99124       [Biology, History, Geometry]
2   Sarah  74323      [Physics, Geography, Algebra]

answered Aug 02 '19 at 19:25

harvpan

8,571
2
18
36

1

Worked for me, thanks for the quick and accurate response. – superfeen Aug 02 '19 at 19:42

score 1 · Answer 2 · answered Aug 02 '19 at 19:25

1

df.groupby('ID')['Class'].apply(list)

answered Aug 02 '19 at 19:25

Carsten

2,765
1
13
28

score 1 · Answer 3 · answered Aug 02 '19 at 19:29

1

let's see, using some help Apply multiple functions to multiple groupby columns

you could write something like

df= df.groupby('student').agg({'id':'max', 'Class': lambda x: x.tolist()})

hope it helps, giulio

answered Aug 02 '19 at 19:29

giulio

157
8

score 1 · Answer 4 · answered Aug 02 '19 at 19:35

1

try like below

df.groupby(['Student', 'ID'],as_index=False).agg(lambda x:','.join('"'+x+'"'))

answered Aug 02 '19 at 19:35

Zaynul Abadin Tuhin

31,407
5
33
63

Turn repeat entries associated with different values into one entry with list of those values?

4 Answers4