I've been using dplyr in my workflows for quite some time, but I'm coming to the realization that perhaps I don't understand the group_by function. Can someone explain whether there is a better approach to accomplishing my goal?
My initial understanding was that by introducing group_by() before operations such as mutate, the mutate function would operate separately within each group specified by group_by(), restarting its computation for each Condition.
This doesn't seem to be true, so I've had to resort to splitting my data into a list by the Condition I had previously passed to group_by(), applying my intended functions to each element with lapply, and then collapsing the list back into a single table.
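This is a minimal sketch of the split/apply/combine workaround described above. The tibble here is a hypothetical stand-in with the same column names (Condition, TVC) as my real data:

```r
library(dplyr)

# Stand-in for the real data (hypothetical values, same column names)
data <- tibble(
  Condition = c("1A", "1A", "1B", "1B"),
  TVC       = c(5, 7, 3, 4)
)

result <- data %>%
  split(.$Condition) %>%                                 # one data frame per Condition
  lapply(function(d) mutate(d, summation = cumsum(TVC))) %>%
  bind_rows()                                            # collapse the list back into one table

result$summation  # 5, 12, 3, 7 -- cumsum restarts for each Condition
```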
Example below. My intention was to perform a cumsum operation on column TVC for each Condition. However, you'll see that the summation column is a straightforward cumsum across the entire TVC column, without restarting at each group specified by the Condition column.
> (data %>% filter(`Elapsed Time (days)`<=8) %>%
+ arrange(Condition,`Elapsed Time (days)`) %>%
+ select(Condition, `Elapsed Time (days)`, TVC) %>%
+ filter(!is.na(TVC)) %>%
+ group_by(Condition) %>%
+ mutate(summation =cumsum(TVC)))
# A tibble: 94 x 4
# Groups: Condition [24]
Condition `Elapsed Time (days)` TVC summation
<chr> <drtn> <dbl> <dbl>
1 1A 0.000000 secs 15400921. 15400921.
2 1A 4.948611 secs 11877256. 27278177
3 1A 6.027778 secs 11669731. 38947908.
4 1A 6.949306 secs 11908853. 50856761.
5 1B 0.000000 secs 14514263. 65371024.
6 1B 4.948611 secs 8829356. 74200380.
7 1B 6.027778 secs 12068221. 86268601.
8 1B 6.949306 secs 10111424. 96380026.
9 1C 0.000000 secs 15400921. 111780946.
10 1C 4.948611 secs 8680060 120461006.