
I have data as follows:

eg_data <- data.frame(
id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
date = c("11/1", "11/1", "11/2", "11/1", "11/5", "11/5", "11/4", "11/5", "11/4", "11/2", "11/4", "11/3", "11/3", "11/2", "11/3", "11/2", "11/1", "11/1", "11/2", "11/3"),
sales = c(2,3,2,3,4,5,4,5,6,2,3,4,7,6,5,4,6,4,3,5),
dupes = c(F,T,F,T,F,F,F,T,T,F,F,F,T,F,T,F,F,T,T,F),
dupes2 = c(F,F,F,T,F,F,F,T,F,F,F,F,F,F,F,F,F,F,F,F))

dupes marks rows that are duplicates by date; dupes2 marks rows that are duplicates by date and sales.

I need to flag any instances where dupes = TRUE and dupes2 = FALSE. This has to be done at the id level, i.e. if the condition holds for at least one row where id = 1, every row where id = 1 should be flagged.

I have tried something like:

eg_data <- eg_data %>% group_by(id, dupes=TRUE, dupes2=FALSE) %>% mutate(flag=1)

This obviously doesn't work, but that's the idea: for every id that has any row where dupes = T and dupes2 = F, flag all rows of that id with 1.

The end result would be the data above with a column called flag that equals 1 in every row, because each of ids 1-4 has at least one row where dupes = T and dupes2 = F. I need to add a column to the dataset, not filter it down to a list that prints and not create a separate dataset.

I have looked at

dplyr group_by logical values

and

Grouping functions (tapply, by, aggregate) and the *apply family

but neither did it for me.

Any help is appreciated.

Adam_S

1 Answer


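As per the OP's request, writing this up as an answer: group by id and use any(), which is TRUE for the whole group if at least one row meets the condition.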

library(dplyr)  # provides %>%, group_by() and mutate()
eg_data <- eg_data %>% group_by(id) %>% mutate(flag = any(dupes & !dupes2))  # flag is TRUE/FALSE per id
BENY
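
For anyone who wants to see the grouped any() idea on a case where not every id qualifies, here is a minimal sketch. The toy data frame and the names toy, flagged and flag_base are made up for illustration and are not from the question. It also wraps the result in as.integer() to get the 1/0 column the question asks for, and shows a base R variant with ave() as one possible alternative, not as the answer's method.

```r
library(dplyr)

# Hypothetical toy data: id 1 has a row with dupes = TRUE and dupes2 = FALSE,
# id 2 has no such row.
toy <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2),
  dupes  = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE),
  dupes2 = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# dplyr: any() is evaluated once per id and recycled to every row of that id.
flagged <- toy %>%
  group_by(id) %>%
  mutate(flag = as.integer(any(dupes & !dupes2))) %>%
  ungroup()

# Base R alternative: ave() applies max() within each id, so a single
# matching row (value 1) sets the whole group to 1.
toy$flag_base <- ave(as.integer(toy$dupes & !toy$dupes2), toy$id, FUN = max)

print(flagged)
print(toy)
```

Both versions add a column to the existing data rather than filtering it. The ungroup() call is optional, but it avoids later operations silently running per id.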