-1

I have a dataset as below with five columns:Date, WeekNum, Year,Total Hours, Total Production

I want to sum my Total Hours by weeknum and year, also sum total production by weeknum and year, then use the first sum divided by second sum.

I know I can use aggregate function, for example:

sum1<-aggregate(list(TotalHours=input$TotalHours), by = list(Year=input$Year, WeekNum=input$WeekNum), FUN=sum)

sum2<-aggregate(list(TotalProduction=input$TotalProduction), by = list(Year=input$Year, WeekNum=input$WeekNum), FUN=sum)

then use sum1/sum2

However if there any easy way to simply this code or there is another function can help me to do that.

Thanks

Rowling
  • 213
  • 1
  • 8
  • 20
  • Can you provide a [minimum reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? Start with pasting the result of `dput(input)`. – hpesoj626 Mar 19 '18 at 14:40

1 Answers1

0

You can use dplyr, pass your weeknum and year variables into group_by, after that pass your TotalProduction and TotalHours to the summarise, just like I did with mpg and wt variables, and finally use mutate to compute new variable finalsum on top of variables which you computed in the summarise step. Here is an example code based on mtcars:

library(dplyr)
mtcars %>% 
  group_by(cyl, carb) %>%
  summarise(sum_mpg=sum(mpg), sum_wt=sum(wt)) %>%
  mutate(finalsum = sum_mpg/sum_wt)
  • dplyr approach has more elegant syntax (one does not have to use data$variable, just use the variable name), and it is faster, especially when it comes to larger data. I have simulated this two approaches on 8 million records data, and dplyr code had the job done in half a second, while aggregate approach was running 52 seconds in order to throw results. So when working with bigger datasets I would highly recommend switching to dplyr. – Paweł Kozielski-Romaneczko Mar 19 '18 at 15:56
  • Thank you so much! – Rowling Mar 19 '18 at 16:02