
This is my data frame x:

ID       Name      Initials     AGE 
123      Mike        NA          18
124      John        NA          20
125      Lily        NA          21
126      Jasper      NA          24
127      Toby        NA          27 
128      Will        NA          19 
129      Oscar       NA          32

I also have a numeric vector y (num [1:3]) holding the IDs I want to remove from data frame x:

> print(y)
[1] 124 125 129

My goal is to remove every row of data frame x whose ID appears in y.

This is my desired output:

ID       Name      Initials     AGE 
123      Mike        NA          18
126      Jasper      NA          24
127      Toby        NA          27 
128      Will        NA          19 

I'm using the dplyr package and trying the following, but it's not working:

FinalData <- x %>% 
             select(everything()) %>%
             filter(ID != c(y))

Can anyone tell me what needs to be corrected?

RL_Pug
    Right now you're checking that ID is *not equal* to y, rather than that ID is *not contained within* y. Also see [this](https://stackoverflow.com/q/9350025/5325862), [this](https://stackoverflow.com/q/7494848/5325862), [this](https://stackoverflow.com/q/51107901/5325862), [this](https://stackoverflow.com/q/27067637/5325862), and [this](https://stackoverflow.com/q/11612235/5325862) – camille Feb 07 '20 at 20:29
    And [this](https://stackoverflow.com/q/5831794/5325862) one. Those should have you covered – camille Feb 07 '20 at 20:36
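To make the first comment's point concrete, here is a small base R sketch on plain vectors (outside dplyr) using the IDs from the question. The names neq and keep are only illustrative. It shows that != recycles y along ID instead of testing membership, while %in% does test membership.

```r
# IDs and the IDs to drop, taken from the question
ID <- 123:129
y <- c(124, 125, 129)

# `!=` is element-wise: y is recycled to length 7 as
# 124, 125, 129, 124, 125, 129, 124 (R warns because 7 is not a
# multiple of 3), so each ID is compared against one value only.
neq <- suppressWarnings(ID != y)
print(neq)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# Every comparison happens to be TRUE here, so no row would be dropped.

# `%in%` asks, for each ID, whether it appears anywhere in y;
# negating it gives the rows to keep.
keep <- !(ID %in% y)
print(ID[keep])
# [1] 123 126 127 128
```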


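We can use %in% and negate it with ! to keep only the IDs that are not in y. The != operator compares element by element (recycling y along ID), so it only behaves as intended when y has a single value. The select step is not needed, since select(everything()) just keeps all the columns.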

library(dplyr)
x %>%
    filter(!ID %in% y)
#   ID   Name Initials AGE
#1 123   Mike       NA  18
#2 126 Jasper       NA  24
#3 127   Toby       NA  27
#4 128   Will       NA  19
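Why !ID %in% y needs no extra parentheses: in R, %in% (like any %op% operator) binds more tightly than unary !, so the expression is parsed as !(ID %in% y). A quick base R check, with hypothetical names ids, a and b:

```r
y <- c(124, 125, 129)
ids <- 123:129

# `!ids %in% y` is parsed as `!(ids %in% y)`
a <- !ids %in% y
b <- !(ids %in% y)
identical(a, b)
# [1] TRUE

# Forcing the other grouping negates the numbers first (all non-zero,
# so all FALSE) and then tests FALSE against y: not what we want.
(!ids) %in% y
```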

Another option is anti_join(), which keeps only the rows of x that have no match in the second table:

x %>% 
    anti_join(tibble(ID = y), by = "ID")

In base R, subset() can be used:

subset(x, !ID %in% y)
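A sketch of the same filter with plain bracket indexing, assuming x is rebuilt as in the data section (written here with data.frame() rather than structure()). %in% is documented as match(x, table, nomatch = 0) > 0, so the mask can equally be built from match(); the names keep and res are illustrative.

```r
y <- c(124, 125, 129)
x <- data.frame(
  ID = 123:129,
  Name = c("Mike", "John", "Lily", "Jasper", "Toby", "Will", "Oscar"),
  Initials = NA,
  AGE = c(18L, 20L, 21L, 24L, 27L, 19L, 32L)
)

# TRUE where the ID has no match in y, i.e. the rows to keep
keep <- is.na(match(x$ID, y))

# Same rows as subset(x, !ID %in% y)
res <- x[keep, ]

# Both forms keep the original row names (1, 4, 5, 6);
# reset them if a clean 1..n index is wanted.
rownames(res) <- NULL
res
```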

data

y <- c(124, 125, 129)
x <- structure(list(ID = 123:129, Name = c("Mike", "John", "Lily", 
"Jasper", "Toby", "Will", "Oscar"), Initials = c(NA, NA, NA, 
NA, NA, NA, NA), AGE = c(18L, 20L, 21L, 24L, 27L, 19L, 32L)),
 class = "data.frame", row.names = c(NA, 
-7L))
akrun