I have a non-normalised .db file and need to create a dataframe, df_exams, from its 'Exams' column. The original file has Student ID and Exams columns like this:
Student ID | Exams |
---|---|
1 | exam7 (2017), exam9 (2018), exam3 (2018),... |
2 | exam2(2017), exam2(2017), exam8 (2018),... |
3 | exam7 (2017), exam9 (2018), exam3 (2018),... |
I need it to look like this:
Student ID | Exam | Year |
---|---|---|
1 | exam7 | 2017 |
1 | exam9 | 2018 |
1 | exam3 | 2018 |
and so on, with one row per exam. I am fairly new to Python and would appreciate any help.
I wrote this code:
df_exams[['Exams','Year']]= df_exams.Exams.str.extract('(.)\s\((.\d+)', expand=True)
This does not produce the desired output.
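For reference, here is a minimal sketch of the transformation being asked for. It is one possible approach, not necessarily the best one. Two things in the attempt above stand out: `(.)` captures only a single character, and `str.extract` returns only the first match per row, so it cannot produce one row per exam. The sketch uses `str.extractall` instead. The sample frame `df` and the regex named groups are assumptions made for illustration; the real data would be loaded from the .db file.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question; the real
# data would be read from the .db file instead.
df = pd.DataFrame({
    "Student ID": [1, 2],
    "Exams": [
        "exam7 (2017), exam9 (2018), exam3 (2018)",
        "exam2(2017), exam2(2017), exam8 (2018)",
    ],
})

# One match per "name (year)" entry; \s* tolerates a missing space
# before the parenthesis, as in "exam2(2017)".
pattern = r"(?P<Exam>\w+)\s*\((?P<Year>\d{4})\)"

# extractall returns one row per match, indexed by
# (original row label, match number).
matches = df["Exams"].str.extractall(pattern)

# Drop the match-number level so the index lines up with df again,
# then join to attach the Student ID to every extracted exam.
df_exams = (
    df[["Student ID"]]
    .join(matches.droplevel("match"))
    .reset_index(drop=True)
)
df_exams["Year"] = df_exams["Year"].astype(int)

print(df_exams)
```

The named groups in the pattern become the column names `Exam` and `Year`, so no renaming is needed afterwards. An alternative along the same lines would be to split on commas, call `explode`, and then use `str.extract` on each piece.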