Getting Subsequent Dates In Different Columns

Question

I have the following dataset of students taking multiple SAT exams:

df = pd.DataFrame({'student': 'A A A A A B B B'.split(),
                  'exam_date':[datetime.datetime(2013,4,1),datetime.datetime(2013,6,1),
                               datetime.datetime(2013,8,1),datetime.datetime(2013,10,2),
                               datetime.datetime(2014,1,1),datetime.datetime(2013,11,2),
                               datetime.datetime(2014,2,2),datetime.datetime(2014,5,2)]})

print(df)
  student  exam_date
0   A     2013-04-01
1   A     2013-06-01
2   A     2013-08-01
3   A     2013-10-02
4   A     2014-01-01
5   B     2013-11-02
6   B     2014-02-02
7   B     2014-05-02

I want to make a dataset of each student with their first exam date, second exam date, and so on.

I am trying groupby and min to get the 1st date, but not sure about the subsequent dates.

# Find earliest time
df.groupby('student')['exam_date'].agg('min').reset_index()

I tried rank to get the desired result, but it seems too much of work.

# Rank
df['rank'] = df.groupby('student')['exam_date'].rank(ascending=True)
print(df)
student exam_date   rank
0   A   2013-04-01  1.0
1   A   2013-06-01  2.0
2   A   2013-08-01  3.0
3   A   2013-10-02  4.0
4   A   2014-01-01  5.0
5   B   2013-11-02  1.0
6   B   2014-02-02  2.0
7   B   2014-05-02  3.0

Is there any better way of getting the desired output? Any suggestions would be appreciated. Thanks!

Desired Output:

 student  exam_01     exam_02     exam_03     exam_04
0   A   2013-04-01  2013-06-01  2013-08-01  2013-10-02
1   B   2013-11-02  2014-02-02  2013-05-02      NA

Question #10 in the [answer](https://stackoverflow.com/a/47152692/15497888) — Henry Ecker, Oct 25 '21 at 04:44

score 4 · Accepted Answer · edited May 30 '22 at 16:02

You can use groupby+cumcount to generate a helper column and pivot.

NB. This assumes the dates are sorted, if not use sort_values first.

(df.assign(id=df.groupby('student').cumcount().add(1))
   .pivot(index='student', columns='id', values='exam_date')
   .add_prefix('exam_')
 )

Output:

id          exam_1     exam_2     exam_3     exam_4     exam_5
student                                                       
A       2013-04-01 2013-06-01 2013-08-01 2013-10-02 2014-01-01
B       2013-11-02 2014-02-02 2014-05-02        NaT        NaT

Getting Subsequent Dates In Different Columns

1 Answers1