1

I would like to sum multiple values to one in python. See the picture below of my data. I want to sum all the values of AGE for each year for each country.

Instead of having this:

country  TIME       AGE      Value
A        2017       20-60     200
A        2017       60-80     100
A        2016       20-60     200
A        2016       60-80     200
B        2017       20-60     300
B        2017       60-80     300
B        2016       20-60     400
B        2016       60-80     400

I would like to have this:

country  TIME             Value
A       2017               300       
A       2016               400
B       2017               600       
B       2016               800

The types of data:

df4types
AGE      object
Value    object
dtype: object

The data has a multi index by country and TIME.

If have tried this:

df=df.groupby(by=["TIME","GEO"])['Value'].sum()

and this:

df=df.groupby(by=["TIME","GEO"]).sum()['Value']

Both "worked" but result in an enormous value. Like it doesn't sum but paste the numbers behind each other. I have tried to change the variable type to numeric by using: by df.Value.astype(float) & df.Value.astype(int)

Unfortunately this didn't solve the problem. Does someone have an idea how to sum the values by group and time correctly? I have also uploaded a picture of the real dataset.

enter image description here

petezurich
  • 9,280
  • 9
  • 43
  • 57
Pat
  • 37
  • 1
  • 5

1 Answers1

0
  • The age column doesn't seem to play a role in the data you want.
  • The "Value" shouldn't be a dtype=object. If you try df.Value = df.Value.astype(int) or df.Value=pd.to_numeric(df.Value) and it doesn't work then I'm betting there is some data you will need to clean up in that column)
  • You shouldn't need to mess with the multi index

After you do the above then try this code.

import pandas as pd
df = pd.DataFrame(<your data here>)
result = df.groupby(by=['country','TIME']).sum() 
Back2Basics
  • 7,406
  • 2
  • 32
  • 45