I have a dataset; a small subset of the relevant columns is shown below:

year  ID    type        result  
2003  1     new         closed  
2003  2     new         transferred  
2003  3     subsequent  closed  
2003  4     subsequent  diverted  
....  
2015  1000  new         closed

What I want to calculate is the fraction of subsequents, i.e. (no. of subsequents) / (no. of subsequents + no. of news), grouped by year and result, like so:

year result subsequent_frac  
2003 closed 0.10  
2003 transferred 0.05  
2003 ....  
....  
2015 closed 0.05  
2015 transferred 0.1  

I know I can do it in steps, with a group_by and summarise to get the counts and then handle each result separately, but I was wondering if there is a neater/faster way to do this.

Frank
UIyer
  • You can group_by(year, result) -- multiple columns. Not sure if that's your issue. – Frank Sep 01 '16 at 18:50
  • @Frank, I apologize if I wasn't very clear, I know that I can group by year and result and get the counts, but I also want to operate on those counts for every year and result. For example, if for year 2003, there are 44,711 counts of "new" for result "closed" and 3856 counts of "subsequent" for the same result I want to calculate a subsequent fraction =3856/(3856+44711). – UIyer Sep 01 '16 at 19:00
  • Ok. I think you might have better luck here by posting a more concrete example, with code that reproduces it. http://stackoverflow.com/a/28481250/ – Frank Sep 01 '16 at 19:05

1 Answer

Is this what you are looking for? Applying summarise drops the last level of grouping (here, type), which is why a second group_by may or may not be needed.

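library(dplyr)

# Count rows per (year, type); summarise() then drops the last grouping level (type),
# so the mutate() below divides each count by the total for its year.
dfSummarized <- group_by(df, year, type) %>%
  summarise(subsequent_frac = n()) %>%  # note: this column holds counts, not fractions
  #group_by(type) %>% # maybe you don't need this?
  mutate(freq = subsequent_frac / sum(subsequent_frac))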
Valter Beaković
  • Thanks @Valter. Just needed a little change, `dfSummarized <- group_by(df, year, type ,result) %>% summarise(subsequent_frac = n()) %>% group_by(year,result) %>% mutate(freq = subsequent_frac / sum(subsequent_frac))` – UIyer Sep 01 '16 at 21:16
  • @UIyer Glad it helped! – Valter Beaković Sep 02 '16 at 05:15
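
For reference, here is a minimal sketch of one way the fraction from the question could be computed in a single step. This is not the code from the answer or the comment: it swaps the count-then-divide approach for the mean of a logical vector. The toy data frame is made up for illustration, and the sketch assumes that type only takes the values "new" and "subsequent" and that a dplyr version supporting the `.groups` argument of `summarise()` is installed.

```r
library(dplyr)

# Hypothetical toy data with the question's columns (values are made up)
df <- data.frame(
  year   = c(2003, 2003, 2003, 2003, 2003, 2015, 2015),
  ID     = 1:7,
  type   = c("new", "new", "subsequent", "subsequent", "new", "new", "subsequent"),
  result = c("closed", "transferred", "closed", "diverted", "closed", "closed", "closed"),
  stringsAsFactors = FALSE
)

# One row per (year, result). The mean of a logical vector is the share of TRUE
# values, i.e. n_subsequent / (n_subsequent + n_new) when type has only two values.
subsequent_by_group <- df %>%
  group_by(year, result) %>%
  summarise(subsequent_frac = mean(type == "subsequent"), .groups = "drop")

print(subsequent_by_group)
```

Unlike the grouped-count version, this returns one row per year and result rather than one row per year, type and result, so no filtering on type is needed afterwards.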