0

Consider two DataFrame, A and B, where

  • A has columns name, ...,
  • B has columns name, aname, ...

One entry in A has any number (0, 1, or more than 1) of corresponding entries in B. These are those where b.aname = a.name for some a in A and b in B.

A could be houses and B people in each house. A house could be empty or some people living there.

I want to do some calculations on table B grouping them using groupby/aggregate and then setting back the results to table A.

Example: Each person has a body height. Calculate the average body height of inhabitants of houses.

B_grouped_agg = B.groupby('aname')['some_property'].agg(['count', 'sum', 'mean']) and then something like A['new_aggregated_property'] = B_grouped_agg['mean']

Two problems arise:

  1. It could happen that the groupby/aggregate list is smaller than A when for some a in A there is no b in B so that b.aname = a . How to default for missing items, that should pass an empty set to agg.
  2. How to correctly sort the list B_grouped_agg ? Does it match the order of A (in case the lengths match, so when case 1 does not occur)
pas-calc
  • 115
  • 9

0 Answers0