I have a dataframe:

date        code     result  
2020-01-01  2069.0   Negative
2020-01-29  2069.0   Negative
2020-02-06  2069.0   Positive
2020-02-06  2070.0   Negative
2020-02-07  2070.0   Positive

Grouping by code, I want to count how many results are 'Positive', and how many results are either 'Positive' or 'Negative' (i.e. both outcomes combined). I'm quite new to pandas, so I'm not sure which of the available functions to use.

Thanks!

Jenny Char

1 Answer

You can try groupby.agg:

# Flag Positive rows as 1/0, then per code: sum = positives, count = all rows
d = {'sum': 'Positive', 'count': 'Both'}
(df['result'].eq('Positive').astype(int).groupby(df['code'])
 .agg(['sum', 'count']).rename(columns=d))

        Positive  Both
code                  
2069.0         1     3
2070.0         1     2
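
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's dataframe by hand (the column names and values are copied from the question; the variable names `is_positive` and `summary` are just illustrative) and uses `astype(int)` to turn the boolean flags into 1/0, which is an assumption on my part about what works across pandas versions.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-29", "2020-02-06", "2020-02-06", "2020-02-07"],
    "code": [2069.0, 2069.0, 2069.0, 2070.0, 2070.0],
    "result": ["Negative", "Negative", "Positive", "Negative", "Positive"],
})

# 1 where the result is Positive, 0 otherwise
is_positive = df["result"].eq("Positive").astype(int)

# Per code: sum of the flags = number of positives, count = number of rows
summary = (
    is_positive.groupby(df["code"])
    .agg(["sum", "count"])
    .rename(columns={"sum": "Positive", "count": "Both"})
)
print(summary)
```

The flags are grouped by `df["code"]` directly, so the original dataframe is left untouched.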
anky
  • Thanks! Would this also count NaN values in the Both column? – Jenny Char May 28 '20 at 16:02
  • @JennyChar No. For that, use `size` instead of `count`: [What is the difference between size and count in pandas?](https://stackoverflow.com/questions/33346591/what-is-the-difference-between-size-and-count-in-pandas) – anky May 28 '20 at 16:03
  • Oh no, that's great. I was just wondering because I get a higher count for 'Both' than when I use `df.groupby(['code', 'result']).count()`, so now I'm not sure which method is accurate. Do you know why that might be the case? – Jenny Char May 28 '20 at 16:08
  • When you group by `['code', 'result']`, it returns one row per unique combination of `code` and `result`, whereas grouping by `code` aggregates on the code column only. Your question says you want to group by the `code` column. – anky May 28 '20 at 16:10
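
A possible way to check the discrepancy discussed in the comments is to add a row with a missing result and compare the different counts side by side. The sketch below is only an illustration: the extra `NaN` row is hypothetical (it is not in the question's data), and the variable names are made up. As far as I can tell, `eq('Positive')` turns a missing result into `False`, so a `count` taken on the flags includes that row, while `count` taken on the `result` column itself skips it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's rows plus one extra row with a missing result
df = pd.DataFrame({
    "code": [2069.0, 2069.0, 2069.0, 2070.0, 2070.0, 2070.0],
    "result": ["Negative", "Negative", "Positive", "Negative", "Positive", np.nan],
})

# eq() maps NaN to False, so the flag series has no missing values and
# count() on it sees every row, including the one with a missing result
flags = df["result"].eq("Positive").astype(int)
both_from_flags = flags.groupby(df["code"]).count()

grouped = df.groupby("code")["result"]
non_null = grouped.count()  # skips rows where result is NaN
all_rows = grouped.size()   # counts every row in the group

# Grouping by both columns gives one entry per (code, result) pair;
# summing those per code gives the non-null total again
per_pair = df.groupby(["code", "result"]).size()
per_code_from_pairs = per_pair.groupby(level="code").sum()

print(pd.DataFrame({
    "count_on_flags": both_from_flags,
    "count_on_result": non_null,
    "size": all_rows,
    "sum_of_pairs": per_code_from_pairs,
}))
```

If the numbers differ only for codes that have missing results, that would suggest missing values are the cause of the higher 'Both' count.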