2

Given an excel file data.xlsx as follows:

enter image description here

I have read it with df = pd.read_excel('data.xlsx', header = [0, 1], index_col = [0, 1], sheet_name = 'Sheet1'),

Out:

district  2018        2019      
         price ratio price ratio
bj cy       12  0.01     6  0.02
sh hp        4  0.02     3  0.05

I wonder if it's possible to transform it to the following format? Thank you for your help.

enter image description here

ah bon
  • 9,293
  • 12
  • 65
  • 148

1 Answers1

2

Use DataFrame.stack with DataFrame.rename_axis and DataFrame.reset_index:

df = df.stack(0).rename_axis(('city','district','year')).reset_index()
print (df)
  city district  year  price  ratio
0   bj       cy  2018     12   0.01
1   bj       cy  2019      6   0.02
2   sh       hp  2018      4   0.02
3   sh       hp  2019      3   0.05
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you. Did I read excel correctly? Seems `city` is missing. – ah bon Apr 23 '20 at 07:25
  • You could check output of `pd.read_excel('data.xlsx', header = [0, 1], index_col = [0, 1], sheet_name = 'Sheet1')` from question, no column name `city`. – ah bon Apr 23 '20 at 07:27
  • 1
    @ahbon - I understand, it is only name of column like metadata. So if not in input file, cannot be parsed. So in my solution used `rename_axis` for set it. – jezrael Apr 23 '20 at 07:29
  • Great, it works. Do you mind have a look this question? https://stackoverflow.com/questions/61314450/merge-based-on-multiple-columns-of-all-excel-files-from-a-directory-in-python – ah bon Apr 23 '20 at 07:38
  • 1
    @ahbon - How working [this](https://stackoverflow.com/a/30512931/2901002) ? – jezrael Apr 23 '20 at 07:40
  • 1
    Thank you, I'll test with it. – ah bon Apr 23 '20 at 07:54