Pandas Dataframe groupby statement output to 2 columns

Question

I have a dictionary of values:

{'Spanish Omlette': -0.20000000000000284,
 'Crumbed Chicken Salad': -1.2999999999999972,
 'Chocolate Bomb': 0.0,
 'Seed Nut Muesli': -3.8999999999999915,
 'Fruit': -1.2999999999999972,
 'Frikerdels Salad': -1.2000000000000028,
 'Seed Nut Cheese Biscuits': 0.4000000000000057,
 'Chorizo Pasta': -2.0,
 'No carbs Ice Cream': 0.4000000000000057,
 'Veg Stew': 0.4000000000000057,
 'Bulgar spinach Salad': 0.10000000000000853,
 'Mango Cheese': 0.10000000000000853,
 'Crumbed Calamari chips': 0.10000000000000853,
 'Slaw Salad': 0.20000000000000284,
 'Mango': -1.2000000000000028,
 'Rice & Fish': 0.20000000000000284,
 'Almonds Cheese': -0.09999999999999432,
 'Nectarine': -1.7000000000000028,
 'Banana Cheese': 0.7000000000000028,
 'Mediteranean Salad': 0.7000000000000028,
 'Almonds': -4.099999999999994}

I am trying to get the aggregated sum of the values of each food item from the dictionary using Pandas:

fooddata = pd.DataFrame(list(foodWeight.items()), columns=['food','weight']).groupby('food')['weight'].agg(['sum']).sort_values(by='sum', ascending=0)

The above code gives the the correct output:

                           sum
food                          
Banana Cheese              0.7
Mediteranean Salad         0.7
Seed Nut Cheese Biscuits   0.4
Veg Stew                   0.4
No carbs Ice Cream         0.4
Slaw Salad                 0.2
Rice & Fish                0.2
Almonds Mango              0.1
Bulgar spinach Salad       0.1
Crumbed Calamari chips     0.1
Frikkadels Salad           0.1
Mango Cheese               0.1
Chocolate Bomb             0.0
Burrito Salad              0.0
Fried Eggs Cheese Avocado  0.0
Burger and Chips          -0.1
Traditional Breakfast     -0.1
Almonds Cheese            -0.1

However, I need to get the output in 2 columns not one which Pandas is giving me above.

How do I get the output into a format that I can plot the data. I.E Label and Value as separate values

You can reset the index: `fooddata.reset_index()` or if more performant: `fooddata.reset_index (inplace=True)` — Alex, Mar 21 '17 at 09:46
@Mayeulsgc Yes, I want to plot the Food Label and the Weight Sum data — Grant, Mar 21 '17 at 09:52
I don't think you need your second column to plot the label and the value, just set one of the axis on the index — Mayeul sgc, Mar 21 '17 at 09:54

jezrael · Answer 1 · 2017-03-21T10:12:41.277

You can use parameter as_index=False in groupby and aggregate sum:

fooddata = pd.DataFrame(list(foodWeight.items()), columns=['food','weight'])

print (fooddata.groupby('food', as_index=False)['weight']
               .sum()
               .sort_values(by='weight', ascending=0))
                        food  weight
2              Banana Cheese     0.7
12        Mediteranean Salad     0.7
20                  Veg Stew     0.4
14        No carbs Ice Cream     0.4
16  Seed Nut Cheese Biscuits     0.4
18                Slaw Salad     0.2
15               Rice & Fish     0.2
3       Bulgar spinach Salad     0.1
6     Crumbed Calamari chips     0.1
11              Mango Cheese     0.1
4             Chocolate Bomb     0.0
1             Almonds Cheese    -0.1
19           Spanish Omlette    -0.2
10                     Mango    -1.2
8           Frikerdels Salad    -1.2
9                      Fruit    -1.3
7      Crumbed Chicken Salad    -1.3
13                 Nectarine    -1.7
5              Chorizo Pasta    -2.0
17           Seed Nut Muesli    -3.9
0                    Almonds    -4.1

Another solution is add reset_index:

print (fooddata.groupby('food')['weight']
               .sum()
               .sort_values(ascending=0)
               .reset_index(name='sum'))
                        food  sum
0              Banana Cheese  0.7
1         Mediteranean Salad  0.7
2                   Veg Stew  0.4
3   Seed Nut Cheese Biscuits  0.4
4         No carbs Ice Cream  0.4
5                 Slaw Salad  0.2
6                Rice & Fish  0.2
7     Crumbed Calamari chips  0.1
8               Mango Cheese  0.1
9       Bulgar spinach Salad  0.1
10            Chocolate Bomb  0.0
11            Almonds Cheese -0.1
12           Spanish Omlette -0.2
13                     Mango -1.2
14          Frikerdels Salad -1.2
15     Crumbed Chicken Salad -1.3
16                     Fruit -1.3
17                 Nectarine -1.7
18             Chorizo Pasta -2.0
19           Seed Nut Muesli -3.9
20                   Almonds -4.1

For plotting is better not reset index - then values of index create axis x - use plot:

fooddata.groupby('food')['weight'].sum().sort_values(ascending=0).plot()

Or if need plot barh:

fooddata.groupby('food')['weight'].sum().sort_values(ascending=0).plot.barh()

score 2 · Answer 2 · answered Mar 21 '17 at 09:54

2

Set as_index=False while calling group by

fooddata = pd.DataFrame(list(foodWeight.items()), columns=['food','weight']).groupby('food',as_index=False).agg({"weight":"sum"}).sort_values(by='weight', ascending=0)

answered Mar 21 '17 at 09:54

Himaprasoon

2,609
3
25
46

1

If you don't set as_index as false, pandas will set the column which is used to group as the index for the dataframe by default. In your case food was set as the index column – Himaprasoon Mar 21 '17 at 10:13

score 1 · Accepted Answer · edited May 23 '17 at 12:25

After the grouping you need to reset the index or use as_index=False when calling groupby. Paraphrasing this post, by default aggregation functions will not return the groups that you are aggregating over if they are named columns. Instead the grouped columns will be the indices of the returned object. Passing as_index=False or calling reset_index afterwards, will return the groups that you are aggregating over, if they are named columns.

See below my attempt to turn your results in a meaningful graph:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
df = fooddata.reset_index()
ax = df[['food','sum']].plot(kind='barh', title ="Total Sum per Food Item", figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("Sum per Food Item", fontsize=12)
ax.set_ylabel("Food Items", fontsize=12)
ax.set_yticklabels(df['food'])
plt.show()

This results in

All useful answers, thanks. I'm going to accept this as the answer as @Jaco was the first to provide a valuable insight — Grant, Mar 21 '17 at 10:20

Pandas Dataframe groupby statement output to 2 columns

3 Answers3