2

I have a database like this:

Month power art unit energy_1 (2018) energy_2 (2018) energy_3 (2018) energy_4 (2018) energy_1 (2019) energy_2 (2019) energy_3 (2019) energy_4 (2019)
Jan Number of power feeding systems Number 100 100 100 100 102 102 102 102
Jan net rated power MW 100 100 100 100 102 102 102 102
Jan power feed KwH 100 100 100 100 102 102 102 102
Feb Number of power feeding systems Number 100 100 100 100 102 102 102 102
Feb net rated power MW 100 100 100 100 102 102 102 102
Feb power feed KwH 100 100 100 100 102 102 102 102

But I would get this:

Month power art unit Energy art energy_1 Value Year
Jan Number of power feeding systems Number energy_1 100 2018
Jan Number of power feeding systems Number energy_2 100 2018
Jan Number of power feeding systems Number energy_3 100 2018
Jan Number of power feeding systems Number energy_4 100 2018
Jan Number of power feeding systems Number energy_1 100 2019
Jan Number of power feeding systems Number energy_2 100 2019
Jan Number of power feeding systems Number energy_3 100 2019
Jan Number of power feeding systems Number energy_4 100 2019
Feb Number of power feeding systems Number energy_1 100 2018
Feb Number of power feeding systems Number energy_2 100 2018
Feb Number of power feeding systems Number energy_3 100 2018
Feb Number of power feeding systems Number energy_4 100 2018
Feb Number of power feeding systems Number energy_1 100 2019
Feb Number of power feeding systems Number energy_2 100 2019
Feb Number of power feeding systems Number energy_3 100 2019
Feb Number of power feeding systems Number energy_4 100 2019
  • I can add year column but what can I do with repeating columns in certain range? I try pd.melt but my dataset is repeating.

#pd.melt(df, id_vars=[('Month', 'power art ', 'unit', 'Energy art')], value_vars=[('energy_1', 'energy_2')])

erdem
  • 35
  • 5

1 Answers1

2

One way:

df1 = df.set_index(['Month', 'power art', 'unit'])
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract('(\S+) \((\d+)\)'), 
              names = ['Energy', 'Year'])
df_out = df1.melt(ignore_index = False).reset_index()
df_out

  Month                        power art    unit    Energy  Year  value
0   Jan  Number of power feeding systems  Number  energy_1  2018    100
1   Jan                  net rated power      MW  energy_1  2018    100
2   Jan                       power feed     KwH  energy_1  2018    100
3   Feb  Number of power feeding systems  Number  energy_1  2018    100
4   Feb                  net rated power      MW  energy_1  2018    100

Or you could do:

df2 = pd.wide_to_long(df, 'energy', df.columns[:3], 'group', '_',
               '\\d \(\\d+\)').reset_index()
df2[['group', 'Year']] = df2.group.str.extract('(\S+) \((\d+)\)')
df2.head()

  Month                        power art    unit group  energy  Year
0   Jan  Number of power feeding systems  Number     1     100  2018
1   Jan  Number of power feeding systems  Number     2     100  2018
2   Jan  Number of power feeding systems  Number     3     100  2018
3   Jan  Number of power feeding systems  Number     4     100  2018
4   Jan  Number of power feeding systems  Number     1     102  2019
Onyambu
  • 67,392
  • 3
  • 24
  • 53