0

I have a pandas dataframe which looks like as follows:

df = 
    COLUMN_NAME  YEAR1  YEAR2   VALUE
0   Column1       2013   2014   0.042835
1   Column1       2014   2015   0.033600
2   Column2       2013   2014   0.004406
3   Column2       2014   2015   0.016900
...

Where for each COLUMN_NAME, YEAR1 and YEAR2, a VALUE is calculated. I want to group the dataframe such that is it unique on COLUMN_NAME, where the columns look like the following:

df_desired = 
    COLUMN_NAME  Value_from_2013_2014   Value_from_2014_2015 ...
0   Column1      0.042835                  0.033600
1   Column2      0.004406                  0.016900
...

I can achieve sort of what I want with the code below, but it creates a MultiIndex columns, how can I achieve this? Thanks for the help.

pd.pivot_table(df, 'VALUE', 'COLUMN_NAME', ['YEAR1', 'YEAR2'])

YEAR1         2013      2014
YEAR2         2014      2015
COLUMN_NAME     
Column1       0.042835  0.0336
Column2       0.004406  0.0169

Solijoli
  • 464
  • 2
  • 8
  • 19
  • Does this answer your question? [Pandas - How to flatten a hierarchical index in columns](https://stackoverflow.com/questions/14507794/pandas-how-to-flatten-a-hierarchical-index-in-columns) – Mayank Porwal Nov 16 '20 at 18:46

3 Answers3

1

You can flatten the multiindex columns using to_flat_index, then map to str and add your prefix:

s.columns = ["Value_from_"+"_".join(map(str, i)) for i in s.columns.to_flat_index()]

print (s)

             Value_from_2013_2014  Value_from_2014_2015
COLUMN_NAME                                            
Column1                  0.042835                0.0336
Column2                  0.004406                0.0169
Henry Yik
  • 22,275
  • 4
  • 18
  • 40
0
df_agg = pd.pivot_table(df, 'VALUE', 'COLUMN_NAME', ['YEAR1', 'YEAR2'])
df_agg.columns = ['Value_from_' + str(df_agg.columns[i][0]) + '_' + str(df_agg.columns[i][1]) for i in range(len(df_agg.columns))]
Solijoli
  • 464
  • 2
  • 8
  • 19
  • 1
    Please use the edit link on your question to add additional information. The Post Answer button should be used only for complete answers to the question. - [From Review](/review/low-quality-posts/27632709) – Rohan Singh Nov 16 '20 at 19:59
  • @RohanSingh: While this would certainly benefit from additional explanation, it's worth noting that answers needn't be complete—at least assuming there isn't yet an accepted answer ([source](https://stackoverflow.com/help/how-to-answer)). – Jeremy Caney Nov 16 '20 at 20:25
-1

If I understanding correctly, you are getting the desired values, but not the desired labels. If so, you can change the column names with df_desired.columns = ['Value_from_2013_2014', 'Value_from_2014_2015']

Nabil Daoud
  • 221
  • 1
  • 10