I have data as follows:
```r
eg_data <- data.frame(
  id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
  date = c("11/1", "11/1", "11/2", "11/1", "11/5", "11/5", "11/4", "11/5", "11/4", "11/2", "11/4", "11/3", "11/3", "11/2", "11/3", "11/2", "11/1", "11/1", "11/2", "11/3"),
  sales = c(2,3,2,3,4,5,4,5,6,2,3,4,7,6,5,4,6,4,3,5),
  dupes = c(F,T,F,T,F,F,F,T,T,F,F,F,T,F,T,F,F,T,T,F),
  dupes2 = c(F,F,F,T,F,F,F,T,F,F,F,F,F,F,F,F,F,F,F,F))
```
`dupes` marks rows that are duplicates by date, and `dupes2` marks rows that are duplicates by date + sales (both within each id).
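For reference, here is one way those two columns could be reproduced in base R. This is a sketch that assumes duplicates are counted within each id and that only the second and later occurrences are marked TRUE, which is what `duplicated()` does by default; including `id` in the column set keeps the same date under different ids from counting as a duplicate.

```r
eg_data <- data.frame(
  id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
  date = c("11/1", "11/1", "11/2", "11/1", "11/5", "11/5", "11/4", "11/5", "11/4", "11/2",
           "11/4", "11/3", "11/3", "11/2", "11/3", "11/2", "11/1", "11/1", "11/2", "11/3"),
  sales = c(2,3,2,3,4,5,4,5,6,2,3,4,7,6,5,4,6,4,3,5),
  dupes = c(F,T,F,T,F,F,F,T,T,F,F,F,T,F,T,F,F,T,T,F),
  dupes2 = c(F,F,F,T,F,F,F,T,F,F,F,F,F,F,F,F,F,F,F,F))

# Assumption: duplicates are counted within each id (so id is part of the key),
# and the first occurrence of each key is FALSE, later repeats are TRUE.
dupes_check  <- duplicated(eg_data[, c("id", "date")])
dupes2_check <- duplicated(eg_data[, c("id", "date", "sales")])

all(dupes_check == eg_data$dupes)    # TRUE
all(dupes2_check == eg_data$dupes2)  # TRUE
```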
I need to flag any instances where dupes = TRUE and dupes2 = FALSE. I need this done at the ID level: if the condition holds for even one row with id = 1, every row with id = 1 should be flagged.
I have tried something like:
```r
eg_data <- eg_data %>% group_by(id, dupes=TRUE, dupes2=FALSE) %>% mutate(flag=1)
```
This obviously doesn't work, but that's the idea: for every id that has any row where dupes = TRUE and dupes2 = FALSE, flag all rows of that id with 1.
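A minimal dplyr sketch of that idea, assuming dplyr is installed. In `group_by()`, a named argument such as `dupes = TRUE` creates (or overwrites) a grouping column with that constant value rather than filtering on it, which is why the attempt above doesn't behave as intended. Grouping by `id` alone and moving the condition into `any()` gives one TRUE/FALSE per id, which `mutate()` then recycles to every row of that id.

```r
library(dplyr)

# Example data from the question
eg_data <- data.frame(
  id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
  date = c("11/1", "11/1", "11/2", "11/1", "11/5", "11/5", "11/4", "11/5", "11/4", "11/2",
           "11/4", "11/3", "11/3", "11/2", "11/3", "11/2", "11/1", "11/1", "11/2", "11/3"),
  sales = c(2,3,2,3,4,5,4,5,6,2,3,4,7,6,5,4,6,4,3,5),
  dupes = c(F,T,F,T,F,F,F,T,T,F,F,F,T,F,T,F,F,T,T,F),
  dupes2 = c(F,F,F,T,F,F,F,T,F,F,F,F,F,F,F,F,F,F,F,F))

eg_data <- eg_data %>%
  group_by(id) %>%                                     # group on id only
  mutate(flag = as.integer(any(dupes & !dupes2))) %>%  # one value per id, recycled to all its rows
  ungroup()                                            # drop grouping; result is a tibble
```

The result keeps all 20 rows and adds a single `flag` column; with this data every id gets 1.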
The end result would be the data above with a new column called flag equal to 1 on every row, because each of ids 1-4 has at least one row where dupes = TRUE and dupes2 = FALSE. I need to add a column to the existing dataset, not filter it down to a printed list or create a separate dataset.
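If dplyr isn't a requirement, the same "add a column in place" result can be sketched in base R. `add_flag` below is a hypothetical helper name used only for illustration: it finds the ids where the row-level condition occurs at least once and marks every row of those ids, returning the same rows with one extra column. The small `toy` data frame is made up to show that ids without a qualifying row get 0.

```r
# Hypothetical helper: adds an integer `flag` column that is 1 for every row
# whose id has at least one row with dupes == TRUE and dupes2 == FALSE, else 0.
add_flag <- function(df) {
  hit <- df$dupes & !df$dupes2               # row-level condition
  flag_ids <- unique(df$id[hit])             # ids where it occurs at least once
  df$flag <- as.integer(df$id %in% flag_ids) # mark every row of those ids
  df                                         # same rows, one extra column
}

eg_data <- data.frame(
  id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
  date = c("11/1", "11/1", "11/2", "11/1", "11/5", "11/5", "11/4", "11/5", "11/4", "11/2",
           "11/4", "11/3", "11/3", "11/2", "11/3", "11/2", "11/1", "11/1", "11/2", "11/3"),
  sales = c(2,3,2,3,4,5,4,5,6,2,3,4,7,6,5,4,6,4,3,5),
  dupes = c(F,T,F,T,F,F,F,T,T,F,F,F,T,F,T,F,F,T,T,F),
  dupes2 = c(F,F,F,T,F,F,F,T,F,F,F,F,F,F,F,F,F,F,F,F))

eg_data <- add_flag(eg_data)  # overwrite the original data frame with the flagged version

# Made-up toy data: id 2's only dupes == TRUE row also has dupes2 == TRUE
toy <- data.frame(id     = c(1, 1, 2, 2),
                  dupes  = c(FALSE, TRUE, FALSE, TRUE),
                  dupes2 = c(FALSE, FALSE, FALSE, TRUE))
add_flag(toy)$flag  # 1 1 0 0
```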
I have looked at two related questions, including "Grouping functions (tapply, by, aggregate) and the *apply family", but neither solved my problem.
Any help is appreciated.