2

I have the following Pandas dataframe:

Name   | EventSignupNo | Attended | Points
Smith  | 0145          | Y        | 20.24
Smith  | 0174          | Y        | 29.14
Smith  | 0239          | N        | 0
Adams  | 0145          | N        | 0
Adams  | 0174          | Y        | 33.43
Morgan | 0239          | Y        | 31.23
Morgan | 0244          | Y        | 23.15

and what I'd like is a count of the number of events attended and not attended per person, and the sum of their points, per person. So I do a groupby: df.groupby([Name, Attended]).agg({"Attended": "count", "Points": "sum"}).rename(columns = {"Attended: "Count"}).reset_index()

which would give me something like:

Name   | Attended | Count | Points
Smith  | Y        | 2     | 49.38
Smith  | N        | 1     | 0
Adams  | Y        | 1     | 33.43
Adams  | N        | 1     | 0
Morgan | Y        | 2     | 54.38

but I'd want something like:

Name   | Attended | Count | Points
Smith  | Y        | 2     | 49.38
Smith  | N        | 1     | 0
Adams  | Y        | 1     | 33.43
Adams  | N        | 1     | 0
Morgan | Y        | 2     | 54.38
Morgan | N        | 0     | 0

I tried playing around with pd.MultiIndex to try to fill the missing zero count, but to no avail. I've read the other similar questions but I'm having trouble dealing with the continuous Points column using MultiIndex. Any idea how to do this?

L. Chu
  • 123
  • 3
  • 14

1 Answers1

3

You could do this with groupby + agg. For your exact output with Y and N at each level, you'd need reindex:

g = df.groupby(['Name', 'Attended'], sort=False).Points.agg(['count', 'sum'])

g
                 count    sum
Name   Attended              
Smith  Y             2  49.38
       N             1   0.00
Adams  N             1   0.00
       Y             1  33.43
Morgan Y             2  54.38
idx = pd.MultiIndex.from_product([g.index.levels[0], ['Y', 'N']])

idx
MultiIndex(levels=[['Adams', 'Morgan', 'Smith'], ['N', 'Y']],
           labels=[[2, 2, 0, 0, 1, 1], [1, 0, 1, 0, 1, 0]])


g.reindex(idx, fill_value=0)

          count    sum
Smith  Y      2  49.38
       N      1   0.00
Adams  Y      1  33.43
       N      1   0.00
Morgan Y      2  54.38
       N      0   0.00
cs95
  • 379,657
  • 97
  • 704
  • 746
  • @COLDSPEED follow up question: The above works and I noticed you changed the aggregation to " ... .Points.agg()" . Idiomatically to me it reads as aggregation on points. Since it sums the points, is it fair to say it "counts" the points by occurence w/ respect to the group ['Name', 'Attended']? The count part seems weird to me. – L. Chu Nov 19 '17 at 01:45
  • @L.Chu you are correct in your understanding. It counts the size of each group. – cs95 Nov 19 '17 at 02:25