I have 2 pandas dataframes. One of them contains year information, the id's are always unique in this table:
_id start_date end_date name
0 abdgff374935hgkfeo549353 2018-09-02 00:00:00.000 2019-08-31 00:00:00.000 2018/2019
1 fsjket43538603463u9gregr 2019-09-01 00:00:00.000 2020-07-31 00:00:00.000 2019/2020
My second one contains group information which also has students information in, there are multiple lines with the same group id, as each line is linked to different students, a student id can also be the same on muliple lines as students enter new groups:
_id student_start_date student_end_date student_id
0 fjkgn4783u54 2018-10-01 00:00:00.000 2019-08-31 00:00:00.000 2gbger9tu9834
1 ngregreit495 2019-09-10 00:00:00.000 2020-07-31 00:00:00.000 ghuti43594353
1 ngregreit495 2019-11-05 00:00:00.000 2020-07-31 00:00:00.000 ghuti43594354
1 ngregreit495 2019-09-02 00:00:00.000 2020-07-31 00:00:00.000 ghuti43594357
What I am trying to recreate is the below table which adds the year name to the groups table:
_id student_start_date student_end_date student_id year_name
0 fjkgn4783u54 2018-10-01 00:00:00.000 2019-08-31 00:00:00.000 2gbger9tu9834 2018/2019
1 ngregreit495 2019-09-10 00:00:00.000 2020-07-31 00:00:00.000 ghuti43594353 2019/2020
Does anyone know how I can loop though both the groups table and the years table to find which groups data fits into which year?
This is what I have so far, but it's very slow:
for i in Years.index:
StartDate = Years.loc[i,'start_date']
EndDate = Years.loc[i,'end_date']
YearName = Years.loc[i,'name']
print(YearName)
for i in groups.index:
StudentStartDate = groups.loc[i,'student_start_date']
StudentEndDate = groups.loc[i,'student_end_date']
GroupName = groups.loc[i,'name']
try:
if (StartDate <= StudentStartDate <= EndDate):
groups.loc[i,'YearName'] = YearName
except TypeError:
pass
try:
if (StartDate <= StudentEndDate <= EndDate):
groups.loc[i,'YearName'] = YearName
except TypeError:
pass