1

using the example from Pandas sum by groupby, but exclude certain columns

Code   Country      Item_Code   Item    Ele_Code    Unit    Y1961    Y1962   Y1963
2      Afghanistan  15          Wheat   5312        Ha      10       20      30
2      Afghanistan  25          Maize   5312        Ha      10       20      30
4      Angola       15          Wheat   7312        Ha      30       40      50
4      Angola       25          Maize   7312        Ha      30       40      50

when we do the

df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()

the output will be

                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50

Now , here is my questions

when I do df.columns i will only get Y1961 Y1962 Y1963

But what if I want Country, Item_Code are included as columns like below

df.columns

Country, Item_Code ,Y1961  Y1962  Y1963
manda1238
  • 73
  • 1
  • 2
  • 9

2 Answers2

2

you need to specify the arg as_index=False

df.groupby(['Country', 'Item_Code'],as_index=False)[["Y1961", "Y1962", "Y1963"]].sum()


       Country  Item_Code  Y1961  Y1962  Y1963
0  Afghanistan         15     10     20     30
1  Afghanistan         25     10     20     30
2       Angola         15     30     40     50
3       Angola         25     30     40     50


df.columns

Index(['Code', 'Country', 'Item_Code', 'Item', 'Ele_Code', 'Unit', 'Y1961',
       'Y1962', 'Y1963'],
      dtype='object')

you could also do

df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum().reset_index()
Umar.H
  • 22,559
  • 7
  • 39
  • 74
1

Those are now index names. You can get that using df.index.names.

list(df.index.names) + df.columns.to_list()
['Country', 'Item_Code', 'Y1961', 'Y1962', 'Y1963']
Dishin H Goyani
  • 7,195
  • 3
  • 26
  • 37