I have a dataframe that contains values by country (and by region in certain countries) and which looks like this:

[image: the original dataframe]

For each country that appears more than once, I would like to sum the values across its regions so that there is only one row per country, and obtain the following result:

[image: the desired result, one row per country]

How can I do this in Python? Since I'm really new to Python, I don't mind a long set of instructions, as long as the procedure is clear, rather than a single line of code that is compact but hard to understand.

Thanks for your help.

Andrew

1 Answer

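You want to study the split-apply-combine paradigm of pandas DataFrame manipulation: split the rows into groups, apply a function to each group, and combine the results into a new table. You can do a lot with it. What you want to do is common, and can be accomplished in one line with groupby, as in this small example: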

>>> import pandas as pd
>>> df = pd.DataFrame({"foo": ["a","b","a","b","c"], "bar": [6,5,4,3,2]})
>>> df 
  foo  bar
0   a    6
1   b    5
2   a    4
3   b    3
4   c    2
>>> df.groupby("foo").sum() 
     bar
foo     
a     10
b      8
c      2
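
Since the question asks for a longer, clearer procedure, here is the same idea broken into separate steps. This is only a sketch: the column names "country", "region" and "value" and all the numbers are made up for illustration, because the real data was only shown as an image. Adapt the names to your own dataframe.

```python
import pandas as pd

# Hypothetical data: column names and numbers are invented for illustration.
df = pd.DataFrame({
    "country": ["France", "France", "Spain", "Italy", "Italy"],
    "region": ["North", "South", None, "North", "South"],
    "value": [10, 20, 5, 7, 8],
})

# Step 1 (split): group together the rows that share the same country.
groups = df.groupby("country")

# Step 2 (apply): select the numeric column and sum it within each group.
totals = groups["value"].sum()

# Step 3 (combine): turn the result back into an ordinary DataFrame,
# with "country" as a regular column and one row per country.
result = totals.reset_index()

print(result)
```

With this made-up data, France ends up with 30, Italy with 15 and Spain with 5. Selecting the "value" column before summing keeps the text column "region" out of the calculation. If you have several numeric columns, you can select them with a list instead, for example groups[["value1", "value2"]].sum(), where those names are again placeholders.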
John Ladasky