My DataFrame is

State|City|Year|Budget|Income
S1|C1|2000|1000|1
S1|C2|2000|1200|2
S2|C3|2000|5500|3

I need to get a new DataFrame with the columns State, Year, Count, Sum_Budget and Sum_Income, that is:

State|Year|Count|Sum_Budget|Sum_Income
S1|2000|2|2200|3
S2|2000|1|5500|3

In C# the code would be:

    dataframe
       .GroupBy(x => new { x.State, x.Year })
       .Select(x => new {
          x.Key.State,
          x.Key.Year,
          Count = x.Count(),
          Sum_Budget = x.Sum(y => y.Budget),
          Sum_Income = x.Sum(y => y.Income)
       })
       .ToArray();

How do I do this with pandas?

miradulo
StuffHappens

1 Answer

Use agg:

d = {'Income':'Sum_Income','Budget':'Sum_Budget','City':'Count'}
agg_d = {'Budget':'sum', 'Income':'sum', 'City':'size'}
df = df.groupby(['State', 'Year'], as_index=False).agg(agg_d).rename(columns=d)

print (df)
   State  Year  Sum_Income  Sum_Budget  Count
0    S1  2000           3        2200      2
1    S2  2000           3        5500      1
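
If you are on pandas 0.25 or later, the same grouping can probably be written with named aggregation, which sets the output column names inside `agg` and so avoids the separate `rename` step. This is only a sketch: the sample frame below is rebuilt by hand from the question, and `result` is just an illustrative variable name.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real frame)
df = pd.DataFrame({
    "State": ["S1", "S1", "S2"],
    "City": ["C1", "C2", "C3"],
    "Year": [2000, 2000, 2000],
    "Budget": [1000, 1200, 5500],
    "Income": [1, 2, 3],
})

# Named aggregation: output_column=(input_column, aggregation function)
result = df.groupby(["State", "Year"], as_index=False).agg(
    Count=("City", "size"),          # number of rows in each group
    Sum_Budget=("Budget", "sum"),
    Sum_Income=("Income", "sum"),
)

print(result)
```

With this form the output columns follow the order of the keyword arguments, so they come out as State, Year, Count, Sum_Budget, Sum_Income.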
jezrael