Pandas DataFrame groupby aggregate set values on second dataframe

Question

Consider two DataFrame, A and B, where

A has columns name, ...,
B has columns name, aname, ...

One entry in A has any number (0, 1, or more than 1) of corresponding entries in B. These are those where b.aname = a.name for some a in A and b in B.

A could be houses and B people in each house. A house could be empty or some people living there.

I want to do some calculations on table B grouping them using groupby/aggregate and then setting back the results to table A.

Example: Each person has a body height. Calculate the average body height of inhabitants of houses.

B_grouped_agg = B.groupby('aname')['some_property'].agg(['count', 'sum', 'mean']) and then something like A['new_aggregated_property'] = B_grouped_agg['mean']

Two problems arise:

It could happen that the groupby/aggregate list is smaller than A when for some a in A there is no b in B so that b.aname = a . How to default for missing items, that should pass an empty set to agg.
How to correctly sort the list B_grouped_agg ? Does it match the order of A (in case the lengths match, so when case 1 does not occur)

You have to `merge` the aggregated B with A – mozway Oct 25 '22 at 12:30 — mozway, Oct 25 '22 at 12:30

Pandas DataFrame groupby aggregate set values on second dataframe

0 Answers0