How to aggregate rows in a dataframe

Question

I have a dataframe that contains values by country (and by region in certain countries) and which looks like this:

For each country that appears more than once, I would like to sum the values across its regions, so that there is only one row per country, and obtain the following file:

How can I do this in Python? Since I'm really new to Python, I would rather have a long set of clear instructions than a single line of code that is compact but hard to understand.

Thanks for your help.

score 0 · Answer 1 · answered Apr 18 '20 at 09:10

You want to study the split-apply-combine paradigm of pandas DataFrame manipulation; you can do a lot with it. What you want to do is common and can be accomplished in one line: group the rows by the key column, then sum each group.

>>> import pandas as pd
>>> df = pd.DataFrame({"foo": ["a","b","a","b","c"], "bar": [6,5,4,3,2]})
>>> df 
  foo  bar
0   a    6
1   b    5
2   a    4
3   b    3
4   c    2
>>> df.groupby("foo").sum() 
     bar
foo     
a     10
b      8
c      2
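
Since the question asks for clear steps rather than a compact one-liner, here is a minimal sketch of the same idea spelled out stage by stage. The question's actual table is not shown, so the column names `country`, `region`, and `value` and all the numbers below are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical data: the real table from the question is not available,
# so these column names and values are invented for illustration only.
df = pd.DataFrame({
    "country": ["France", "France", "Spain", "Italy", "Italy"],
    "region": ["North", "South", None, "East", "West"],
    "value": [10, 20, 5, 7, 3],
})

# Split: collect the rows that share the same country into groups.
grouped = df.groupby("country")

# Apply and combine: sum the "value" column inside each group.
# The result is a Series with one entry per country.
totals = grouped["value"].sum()

print(totals)
```

Selecting `"value"` before summing leaves the `region` column out of the result, which is probably what you want once the regions have been merged into one row per country.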

John Ladasky

Thank you. Used together with `reset_index()`, it gives exactly what I wanted. – Andrew Apr 18 '20 at 10:41
If I gave you a correct answer, Stack Overflow (and I) would appreciate it if you would mark my answer as correct. Thanks! – John Ladasky Apr 19 '20 at 22:04
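
For readers wondering what the `reset_index()` step mentioned in the comments changes, here is a short sketch using the `foo`/`bar` frame from the answer. The variable names `result` and `same` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["a", "b", "a", "b", "c"], "bar": [6, 5, 4, 3, 2]})

# groupby(...).sum() puts the group keys ("foo") in the index;
# reset_index() moves them back into an ordinary column.
result = df.groupby("foo").sum().reset_index()

# Passing as_index=False to groupby keeps "foo" as a column from the start.
same = df.groupby("foo", as_index=False).sum()

print(result)
```

Either form gives a flat table with one row per key, which is usually the more convenient shape for writing the result to a file.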
