I want to filter out observations in my data by group, where the values to exclude for each group come from a separate table. I am trying to work exclusively with dplyr; I have done tasks like this with data.table before, but I'm not sure how to accomplish it in dplyr.

Here is some sample data to illustrate:

#Primary dataset
dat <- data.frame(account = c(1, 3, 3, 3, 5, 5, 7),
              ip = c("255.255.255", 
                     "255.255.255", "199.199.99", "255.255.255",
                     "75.75.75", "120.120.120",
                     "50.50.50"),
              value = c(50, 1000, 800, 2500, 3000, 500, 75))

From the dataset, I would like to filter based on a list of IPs per account, which is another table:

#Filtering reference table
exclude <- data.frame(account = c(3, 5),
                  ip = c("255.255.255", "120.120.120"))

The desired output of dat after filtering would be:

   account          ip value
 1       1 255.255.255    50
 2       3  199.199.99   800
 3       5    75.75.75  3000
 4       7    50.50.50    75

I am specifically unsure how to use the reference table in a group_by within a piped (%>%) series of dplyr verbs on dat. I may also be approaching the task incorrectly, since I am still getting familiar with the dplyr style of programming, so I am open to something other than the reference-table approach I am considering, as long as it stays within dplyr.

daRknight
  • I would look into dplyr's `_join` family functions – Mike Feb 05 '19 at 19:10
  • Indeed, the `anti_join` does precisely this with the command `dat %>% anti_join(exclude)` or explicitly with `dat %>% anti_join(exclude, by = c("account", "ip"))` – daRknight Feb 05 '19 at 19:39
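
For reference, the `anti_join` suggestion from the comments can be run against the sample data as in the sketch below. It assumes dplyr is installed; `stringsAsFactors = FALSE` and the name `result` are additions for illustration, not part of the original post.

```r
library(dplyr)

# Sample data from the question (stringsAsFactors = FALSE is an added
# assumption so that ip is compared as plain character in older R versions)
dat <- data.frame(account = c(1, 3, 3, 3, 5, 5, 7),
                  ip = c("255.255.255",
                         "255.255.255", "199.199.99", "255.255.255",
                         "75.75.75", "120.120.120",
                         "50.50.50"),
                  value = c(50, 1000, 800, 2500, 3000, 500, 75),
                  stringsAsFactors = FALSE)

exclude <- data.frame(account = c(3, 5),
                      ip = c("255.255.255", "120.120.120"),
                      stringsAsFactors = FALSE)

# Keep only the rows of dat that have no matching (account, ip) pair in exclude
result <- dat %>%
  anti_join(exclude, by = c("account", "ip"))

print(result)
```

Naming the `by` columns explicitly avoids the message dplyr prints when it has to guess the join keys.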

1 Answer

How about:

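# Build a combined key with a separator so that different (account, ip) pairs
# cannot collapse into the same string, then drop rows whose key is in exclude
dat %>%
  mutate(accountip = paste(account, ip, sep = "_")) %>%
  filter(!(accountip %in% paste(exclude$account, exclude$ip, sep = "_"))) %>%
  select(account, ip, value)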

rhozzy
  • This seems quite verbose compared to `anti_join(dat, exclude)` which is one option from the link Camille posted. – markus Feb 05 '19 at 19:32
  • Fully agree. This could be a useful approach in cases where the data frames you're filtering on don't have exactly the same structure/columns. – rhozzy Feb 05 '19 at 19:34
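
On the point about data frames with different columns: `anti_join` can also handle key columns that are named differently in the two tables, by passing a named vector to `by`. The sketch below is only an illustration; the table `exclude2` and its column names `acct` and `address` are hypothetical and do not appear in the original question.

```r
library(dplyr)

# Sample data from the question
dat <- data.frame(account = c(1, 3, 3, 3, 5, 5, 7),
                  ip = c("255.255.255",
                         "255.255.255", "199.199.99", "255.255.255",
                         "75.75.75", "120.120.120",
                         "50.50.50"),
                  value = c(50, 1000, 800, 2500, 3000, 500, 75),
                  stringsAsFactors = FALSE)

# Hypothetical reference table whose key columns are named differently
exclude2 <- data.frame(acct = c(3, 5),
                       address = c("255.255.255", "120.120.120"),
                       stringsAsFactors = FALSE)

# Map each column of dat to its counterpart in exclude2:
# names are columns in dat, values are columns in exclude2
result2 <- dat %>%
  anti_join(exclude2, by = c("account" = "acct", "ip" = "address"))

print(result2)
```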