Change columns to rows per student ID

Question

I have data in an Excel sheet that I am reading into a DataFrame:

ID	Grade	Course	Q1 Number	Q1 Letter	Q2 Number	Q2 Letter
1	9	English	73	B	69	C
1	9	Math	70	B	52	C
1	9	Science	69	C	80	A

Desired output:

ID	Grade	Course	Semester	Number Grade	Letter Grade
1	9	English	Q1	73	B
1	9	English	Q2	69	C
1	9	Math	Q1	70	B
1	9	Math	Q2	52	C
1	9	Science	Q1	69	C
1	9	Science	Q2	80	A

I'm trying to do df.melt, but it's not working. Any help is appreciated.

check out ```pd.melt``` --> ```df.melt(id_vars=['ID','Grade','Course'],var_name='Semester',value_name='Grade') ``` — sophocles, Jul 20 '22 at 16:32
See also https://stackoverflow.com/questions/27764378/how-to-reverse-a-2-dimensional-table-dataframe-into-a-1-dimensional-list-using — Neo, Jul 20 '22 at 16:35
@sophocles Sorry, my question was a bit unclear. please see the edited one. How can we do it for multiple value_names? — user19435923, Jul 21 '22 at 17:22
Use [`pd.wide_to_long`](https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html#pandas-wide-to-long) (great function for simultaneous "melting" situations). — Scott Boston, Jul 21 '22 at 17:27

Scott Boston · Answer 1 · 2022-07-22T20:16:41.840

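Update (for the shared data, where ID, GRADE and COURSE alone do not uniquely identify a row):

import pandas as pd

df = pd.read_excel('Downloads/grades_mock+data.xlsx')

# Move the first word of each value column name to the end,
# so the stub name comes first, as pd.wide_to_long expects.
dfm = df.set_index(['ID', 'GRADE', 'COURSE'])\
        .rename(columns=lambda x: ' '.join(x.split(' ', 1)[::-1]))\
        .reset_index()

# Eliminate duplicates: keep the first record per key combination.
dfm = dfm.groupby(['ID', 'GRADE', 'COURSE', 'DISCIPLINE COURSE'], as_index=False).first()

# Reshape: the part of each column name after the stub becomes 'Semester'.
df_out = pd.wide_to_long(dfm,
                         ['GRADE NUMERIC', 'GRADE LETTER'],
                         ['ID', 'GRADE', 'COURSE', 'DISCIPLINE COURSE'],
                         'Semester', sep=' ', suffix='.*')\
           .reset_index()

print(df_out)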
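The uniqueness requirement behind the de-duplication step can be reproduced on a small frame. This is only a sketch: the rows are made up, and the column names (Number Q1, Letter Q1, and so on) and the helper to_long are hypothetical, not taken from the shared file. It triggers the ValueError on duplicated id columns and then keeps the first record per key.

```python
import pandas as pd

# Hypothetical mock data: the English row appears twice, so
# (ID, Grade, Course) no longer identifies a row uniquely.
df = pd.DataFrame({
    "ID": [1, 1, 1],
    "Grade": [9, 9, 9],
    "Course": ["English", "English", "Math"],
    "Number Q1": [73, 73, 70],
    "Letter Q1": ["B", "B", "B"],
    "Number Q2": [69, 69, 52],
    "Letter Q2": ["C", "C", "C"],
})
id_cols = ["ID", "Grade", "Course"]


def to_long(frame):
    """Illustrative helper: reshape the stub columns into one row per semester."""
    return pd.wide_to_long(frame, ["Number", "Letter"], id_cols,
                           "Semester", sep=" ", suffix=".*").reset_index()


error_message = None
try:
    to_long(df)  # fails because the id columns contain duplicates
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# groupby(...).first() keeps the first non-null value of every column
# within each (ID, Grade, Course) group, leaving one row per key.
deduped = df.groupby(id_cols, as_index=False).first()
df_out = to_long(deduped)
print(df_out)
```

Whether first() is the right choice depends on the data; if the duplicated rows differ, an aggregation such as a mean, or an extra key column, may suit better.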

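Try this, using pd.wide_to_long, with the columns renamed so that the stub name (Number, Letter) comes first:

import pandas as pd

# Copy the table from the question, then read it from the clipboard.
df = pd.read_clipboard()

# 'Q1 Number' -> 'Number Q1', 'Q1 Letter' -> 'Letter Q1', etc.
dfm = df.set_index(['ID', 'Grade', 'Course'])\
        .rename(columns=lambda x: ' '.join(x.split(' ')[::-1]))\
        .reset_index()

# The suffix after each stub (Q1, Q2) becomes the 'Semester' column.
df_out = pd.wide_to_long(dfm,
                         ['Number', 'Letter'],
                         ['ID', 'Grade', 'Course'],
                         'Semester', sep=' ', suffix='.*')\
           .reset_index()

print(df_out)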

Output:

   ID  Grade   Course Semester  Number Letter
0   1      9  English       Q1      73      B
1   1      9  English       Q2      69      C
2   1      9     Math       Q1      70      B
3   1      9     Math       Q2      52      C
4   1      9  Science       Q1      69      C
5   1      9  Science       Q2      80      A
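For a run that does not depend on the clipboard, the same steps can be sketched with the question's sample rows built inline. The final sort and the rename to Number Grade and Letter Grade are additions made here to match the desired output in the question; they are not part of the answer's code.

```python
import pandas as pd

# Sample rows from the question, in wide format (one row per course).
df = pd.DataFrame({
    "ID": [1, 1, 1],
    "Grade": [9, 9, 9],
    "Course": ["English", "Math", "Science"],
    "Q1 Number": [73, 70, 69],
    "Q1 Letter": ["B", "B", "C"],
    "Q2 Number": [69, 52, 80],
    "Q2 Letter": ["C", "C", "A"],
})
id_cols = ["ID", "Grade", "Course"]

# Swap "Q1 Number" -> "Number Q1" so the stub name comes first.
dfm = (
    df.set_index(id_cols)
      .rename(columns=lambda c: " ".join(c.split(" ")[::-1]))
      .reset_index()
)

df_out = (
    pd.wide_to_long(dfm, stubnames=["Number", "Letter"], i=id_cols,
                    j="Semester", sep=" ", suffix=".*")
      .reset_index()
      # Added here: fix the row order and use the question's column names.
      .sort_values(id_cols + ["Semester"], ignore_index=True)
      .rename(columns={"Number": "Number Grade", "Letter": "Letter Grade"})
)
print(df_out)
```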

edited Jul 22 '22 at 20:16

answered Jul 21 '22 at 17:34

Scott Boston


It gives the following error: ValueError: the id variables need to uniquely identify each row – user19435923 Jul 21 '22 at 17:51
That means that you can have only one row per combination of ID, Grade and Course. Do you have some duplicates in your data for ID, Grade and Course? – Scott Boston Jul 21 '22 at 20:36
If so, you can aggregate them, use first or last record, or pick another column that makes it unique per row. – Scott Boston Jul 21 '22 at 20:38
There is no other column in the dataset that can make the combination unique. What do you mean by aggregating the columns? @Scott Boston – user19435923 Jul 21 '22 at 21:37
I tried to create a unique id column and used the same code. That didn't work either. – user19435923 Jul 21 '22 at 21:48
Is there a place you can share the data? Or can you create mock data that duplicates your error? – Scott Boston Jul 22 '22 at 00:07
I have messaged you on LinkedIn. Thank you @Scott Boston – user19435923 Jul 22 '22 at 16:48
This worked! Thank you so much. Unfortunately, Stack Overflow is not giving me permission to upvote your solution due to my account status – user19435923 Jul 22 '22 at 23:27
