I have the following dataset of students taking multiple SAT exams:
import datetime
import pandas as pd

df = pd.DataFrame({'student': 'A A A A A B B B'.split(),
                   'exam_date': [datetime.datetime(2013, 4, 1), datetime.datetime(2013, 6, 1),
                                 datetime.datetime(2013, 8, 1), datetime.datetime(2013, 10, 2),
                                 datetime.datetime(2014, 1, 1), datetime.datetime(2013, 11, 2),
                                 datetime.datetime(2014, 2, 2), datetime.datetime(2014, 5, 2)]})
print(df)
student exam_date
0 A 2013-04-01
1 A 2013-06-01
2 A 2013-08-01
3 A 2013-10-02
4 A 2014-01-01
5 B 2013-11-02
6 B 2014-02-02
7 B 2014-05-02
I want to make a dataset of each student with their first exam date, second exam date, and so on.
I am using groupby with min to get the first date, but I am not sure how to get the subsequent dates.
# Find earliest time
df.groupby('student')['exam_date'].agg('min').reset_index()
I also tried rank to get the desired result, but it seems like too much work.
# Rank
df['rank'] = df.groupby('student')['exam_date'].rank(ascending=True)
print(df)
student exam_date rank
0 A 2013-04-01 1.0
1 A 2013-06-01 2.0
2 A 2013-08-01 3.0
3 A 2013-10-02 4.0
4 A 2014-01-01 5.0
5 B 2013-11-02 1.0
6 B 2014-02-02 2.0
7 B 2014-05-02 3.0
Is there any better way of getting the desired output? Any suggestions would be appreciated. Thanks!
Desired Output:
student exam_01 exam_02 exam_03 exam_04
0 A 2013-04-01 2013-06-01 2013-08-01 2013-10-02
1 B 2013-11-02 2014-02-02 2014-05-02 NA
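For reference, here is a minimal sketch of one way the wide layout might be produced, assuming the goal is one row per student with exams numbered in date order. It uses groupby with cumcount instead of rank, then pivot. The names exam_no and wide are only illustrative and not part of the original data.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    'student': 'A A A A A B B B'.split(),
    'exam_date': [
        datetime.datetime(2013, 4, 1), datetime.datetime(2013, 6, 1),
        datetime.datetime(2013, 8, 1), datetime.datetime(2013, 10, 2),
        datetime.datetime(2014, 1, 1), datetime.datetime(2013, 11, 2),
        datetime.datetime(2014, 2, 2), datetime.datetime(2014, 5, 2),
    ],
})

# Sort so that the running count follows chronological order per student
df = df.sort_values(['student', 'exam_date'])

# exam_no (illustrative name): 1 for a student's first exam, 2 for the second, ...
df['exam_no'] = df.groupby('student').cumcount() + 1

# Reshape from long to wide: one row per student, one column per exam number
wide = df.pivot(index='student', columns='exam_no', values='exam_date')
wide.columns = [f'exam_{int(n):02d}' for n in wide.columns]
wide = wide.reset_index()

print(wide)
```

With the sample data this gives columns exam_01 through exam_05, because student A has five exams; exams a student did not take show up as NaT. If only the first four exams are wanted, as in the desired output, something like wide.iloc[:, :5] would keep the student column plus exam_01 to exam_04.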