Finding average age in sheet using pandas

Question

How can I find the average age for each row in this example. Each columns has its own age ranging from 0 to 90. I have read the csv with pandas.

Just lookup 'average' in pandas documentation, it's a simple function. — purpin, Oct 07 '22 at 10:52
Does this answer your question? [pandas get column average/mean](https://stackoverflow.com/questions/31037298/pandas-get-column-average-mean) — ErnestBidouille, Oct 07 '22 at 10:54
I does not work, ideally. The problem is that it only takes the average of the sum of each row, but it does not take into account how many occourances of each age. For example If there is 4 people that are aged 4. I need to mulitply 4 with 4 — Mads, Oct 07 '22 at 10:55
Does this work? Create new column with averages of each row using `df['mean'] = df.mean(axis=1)` — Redox, Oct 07 '22 at 11:04
No it does not work, i need to first sum up how many people has a certain age and then multiply it by the acctual age — Mads, Oct 07 '22 at 11:06

Mohammad · Answer 1 · 2022-10-07T11:21:56.783

0

I Assume all the age-related columns are named like Age_0, Age_1, Age_2, ...

import pandas as pd
import numpy as np

df = pd.read_csv("./data.csv")

ages = np.array([int(c.split("_")[-1]) for c in df.filter(regex="^Age_\d+$").columns])
df.loc[:, "avg_age"] = (df.filter(regex="^Age_\d+$").values * ages).sum(axis=1) / df.filter(regex="^Age_\d+$").values.sum(axis=1)

edited Oct 07 '22 at 11:21

answered Oct 07 '22 at 11:10

Mohammad

39
3

thank you, but i get this error message RuntimeWarning: invalid value encountered in true_divide copy.loc[:, "avg_age"] = (copy.filter(regex="^Age_\d+$").values * ages).sum(axis=1) / copy.filter(regex="^Age_\d+$").values.sum(axis=1) – Mads Oct 07 '22 at 11:24
it's a warning, not an error. the reason is that you have (at least) a row in which the sum of the number of observations is zero. For that row, the resulting `avg_age` would be NaN which is alright. if you want to get problematic row, try: df.loc[df.filter(regex="^Age_\d+$").sum(axis="columns") == 0, :] – Mohammad Oct 07 '22 at 11:29
what should i replace it with? – Mads Oct 07 '22 at 11:32
you don't need to replace the observations with anything. In the average column, you will have some NaNs which is ok. – Mohammad Oct 07 '22 at 11:51

score 0 · Answer 2 · answered Oct 07 '22 at 11:13

Assuming the values are the amount of people per age, here is one solution.

(Potentially not the fastest for large amount of data)

highest_age = 90

def row_age(x):
    sum_age = 0
    sum_people = 0
    for i in range(0, highest_age + 1):
        people = x[f"age_{i}"]
        sum_people += people
        sum_age += people * i
    if sum_people > 0:
        return sum_age / sum_people


df["average_age"] = df.apply(row_age, axis=1)

Finding average age in sheet using pandas

2 Answers2