Here's a "dumb" version, based on this answer - just sum every 2 rows:
In[0]
df.groupby(df.index // 2).sum()
Out[0]:
VALUE
0 30
1 70
I say "dumb" because this method doesn't factor in the age cut-offs; it just happens to align with them. If the age ranges vary in width, or if your data start at 5-9 instead of 0-4, the pairs of rows will no longer line up with the 10-year ranges. You also have to rename the index, since 0 and 1 say nothing about which ages each row covers.
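To make that failure mode concrete, here is a small sketch using made-up data that start at 5-9 (the `shifted` frame and its values are my own assumption for illustration, not the data from the question). Pairing rows by position now merges 5-9 with 10-14, which straddles the 10-year boundary:

```python
import pandas as pd

# Hypothetical data that start at 5-9 instead of 0-4
shifted = pd.DataFrame({'VALUE': [20, 30, 40],
                        'AGE': ['5-9', '10-14', '15-19']})

# Same positional grouping as above: rows (0, 1) form one group, row 2 the other
pairs = shifted.groupby(shifted.index // 2)['VALUE'].sum()

# Collect the age ranges that ended up in each positional group
members = shifted.groupby(shifted.index // 2)['AGE'].agg(list)

print(pairs)
print(members)
```

Group 0 sums 5-9 together with 10-14, so its total no longer corresponds to a single decade.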
A "smarter" version would be to actually create bins with pd.cut and use that to group the data, based on the ages for each row:
In[0]
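# Upper bound of each age range, e.g. '5-9' -> 9
df['MAX_AGE'] = df['AGE'].str.split('-').str[-1].astype(int)
bins = [0, 10, 20]
# Select VALUE explicitly so the string AGE column is not summed as well
out = df.groupby(pd.cut(df['MAX_AGE'], bins=bins, right=False))[['VALUE']].sum()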
Out[0]:
VALUE
MAX_AGE
[0, 10) 30
[10, 20) 70
Explanation:
- Use pandas.Series.str methods to get out the maximum age for each row, and store it in a column "MAX_AGE"
- Create bins at 10-year cut-offs
- Use pd.cut to assign the data to bins based on the max age of each row. Then use groupby on these bins and sum. Note that since we specify right=False, each bin includes its left edge and excludes its right edge, so the bins in the index mean 0-9 and 10-19.
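If you would rather have the index read 0-9 and 10-19 directly, one option is to pass the labels parameter of pd.cut and derive the bin edges from the data. This is only a sketch of the steps above: the names `width`, `edges` and `labels` are mine, and it assumes every AGE string has the form 'low-high' and that the bins start at 0.

```python
import pandas as pd

df = pd.DataFrame({'VALUE': [10, 20, 30, 40],
                   'AGE': ['0-4', '5-9', '10-14', '15-19']})

# Upper bound of each row's age range, e.g. '5-9' -> 9
max_age = df['AGE'].str.split('-').str[-1].astype(int)

# Bin edges at 10-year steps, wide enough to cover the largest age
width = 10
top = (int(max_age.max()) // width + 1) * width
edges = list(range(0, top + 1, width))

# One readable label per bin, e.g. '0-9' for the bin starting at 0
labels = [f'{lo}-{lo + width - 1}' for lo in edges[:-1]]

# right=False: each bin includes its left edge and excludes its right edge
groups = pd.cut(max_age, bins=edges, right=False, labels=labels)
out = df.groupby(groups, observed=False)['VALUE'].sum()
print(out)
```

Because the edges are computed from the largest age, the same code should keep working if more rows such as 20-24 are added, as long as the ranges do not cross a 10-year boundary.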
For reference, here is the data I was using:
import pandas as pd
VALUE = [10,20,30,40,]
AGE = ['0-4','5-9','10-14','15-19']
df = pd.DataFrame({'VALUE':VALUE,
'AGE':AGE})