1

I have data in excel sheet that I am reading into a dataframe:

ID Grade Course Q1 Number Q1 Letter Q2 Number Q2 Letter
1 9 English 73 B 69 C
1 9 Math 70 B 52 C
1 9 Science 69 C 80 A

desired output:

ID Grade Course Semester Number Grade Letter Grade
1 9 English Q1 73 B
1 9 English Q2 69 C
1 9 Math Q1 70 B
1 9 Math Q2 52 C
1 9 Science Q1 69 C
1 9 Science Q2 80 A

I'm trying to do df.melt, but it's not working. Any help is appreciated.

  • 2
    check out ```pd.melt``` --> ```df.melt(id_vars=['ID','Grade','Course'],var_name='Semester',value_name='Grade') ``` – sophocles Jul 20 '22 at 16:32
  • See also https://stackoverflow.com/questions/27764378/how-to-reverse-a-2-dimensional-table-dataframe-into-a-1-dimensional-list-using – Neo Jul 20 '22 at 16:35
  • @sophocles Sorry, my question was a bit unclear. please see the edited one. How can we do it for multiple value_names? – user19435923 Jul 21 '22 at 17:22
  • Use [`pd.wide_to_long`](https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html#pandas-wide-to-long) (great function for simultaneous "melting" situations). – Scott Boston Jul 21 '22 at 17:27

1 Answers1

2

Update:

df = pd.read_excel('Downloads/grades_mock+data.xlsx')

dfm = df.set_index(['ID', 'GRADE', 'COURSE'])\
        .rename(columns=lambda x: ' '.join(x.split(' ', 1)[::-1]))\
        .reset_index()

#Eliminating duplicates.
dfm = dfm.groupby(['ID', 'GRADE', 'COURSE', 'DISCIPLINE COURSE'], as_index=False).first()

df_out = pd.wide_to_long(dfm,
                         ['GRADE NUMERIC', 'GRADE LETTER'], 
                         ['ID', 'GRADE', 'COURSE', 'DISCIPLINE COURSE'],
                         'Semester', sep=' ', suffix='.*')\
           .reset_index()

print(df_out)

Try this, using pd.wide_to_long, with some column renaming to make it easier:

df = pd.read_clipboard()


dfm = df.set_index(['ID', 'Grade', 'Course'])\
        .rename(columns=lambda x: ' '.join(x.split(' ')[::-1]))\
        .reset_index()

df_out = pd.wide_to_long(dfm,
                         ['Number', 'Letter'], 
                         ['ID', 'Grade', 'Course'],
                         'Semester', sep=' ', suffix='.*')\
           .reset_index()

print(df_out)

Output:

   ID  Grade   Course Semester  Number Letter
0   1      9  English       Q1      73      B
1   1      9  English       Q2      69      C
2   1      9     Math       Q1      70      B
3   1      9     Math       Q2      52      C
4   1      9  Science       Q1      69      C
5   1      9  Science       Q2      80      A
Scott Boston
  • 147,308
  • 15
  • 139
  • 187