I have a data frame like the one below. I want to combine the duplicates in the 'activity' column, except for the rows called 'selection', and sum their values in the 'duration' column. I am working in R. I've tried using aggregate(), but I couldn't find a way to leave the 'selection' rows unaggregated.
# df - I used dput so you can have my df
test <- structure(list(activity = c("selection", "selection", "selection",
"other", "inspection", "assignment", "inspection", "inspection",
"inspection", "inspection"), workers = c("worker 1", "worker 1",
"worker 1", "worker 34", "worker 6", "worker 5", "worker 2",
"worker 2", "worker 2", "worker 2"), start_time = structure(c(1645396200,
1645396200, 1645396200, 1645394352, 1645394155, 1645394100, 1645390080,
1645476480, 1645562880, 1645649280), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), status = c("passed", "passed", "passed", "passed",
"passed", "passed", "passed", "passed", "passed", "passed"),
duration = c(8.98333333333333, 9.69027777777778, 9.20555555555556,
0.557222222222222, 2.24527777777778, 1.61666666666667, 2.12166666666667,
1.32638888888889, 2.59861111111111, 0.765555555555556)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
test
# A tibble: 10 x 5
   activity   workers   start_time          status duration
   <chr>      <chr>     <dttm>              <chr>     <dbl>
 1 selection  worker 1  2022-02-20 22:30:00 passed    8.98
 2 selection  worker 1  2022-02-20 22:30:00 passed    9.69
 3 selection  worker 1  2022-02-20 22:30:00 passed    9.21
 4 other      worker 34 2022-02-20 21:59:12 passed    0.557
 5 inspection worker 6  2022-02-20 21:55:55 passed    2.25
 6 assignment worker 5  2022-02-20 21:55:00 passed    1.62
 7 inspection worker 2  2022-02-20 20:48:00 passed    2.12
 8 inspection worker 2  2022-02-21 20:48:00 passed    1.33
 9 inspection worker 2  2022-02-22 20:48:00 passed    2.60
10 inspection worker 2  2022-02-23 20:48:00 passed    0.766
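
To make the goal concrete, here is a minimal base R sketch of the result I am after. It is only one possible reading of the problem: it assumes duplicates are matched on 'activity' alone, and it keeps only the 'activity' and 'duration' columns. The names demo, is_sel, summed and combined are hypothetical, and demo is a small stand-in for test with made-up durations.

```r
# Hypothetical stand-in for `test`, reduced to the two relevant columns.
demo <- data.frame(
  activity = c("selection", "selection", "other", "inspection", "inspection"),
  duration = c(9, 10, 0.5, 2, 1),
  stringsAsFactors = FALSE
)

# Flag the rows that must stay as they are.
is_sel <- demo$activity == "selection"

# Sum 'duration' per activity for everything that is not 'selection'.
summed <- aggregate(duration ~ activity, data = demo[!is_sel, ], FUN = sum)

# Put the untouched 'selection' rows back on top of the summed rows.
combined <- rbind(demo[is_sel, c("activity", "duration")], summed)
rownames(combined) <- NULL

print(combined)
```

In this sketch the two 'selection' rows stay separate, while the two 'inspection' rows collapse into one row with their durations added. The same steps should carry over to test, although 'workers', 'start_time' and 'status' would be dropped unless they are added to the grouping.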