For instance, let's say I have a dataframe named df with a column "ID" of integers, and I want to grab the subset of my dataframe in which the value in "ID" is in the vector [123,198,204,245,87,91,921].
What would the syntax for this be in R?
I believe you want the %in% operator:
df <- data.frame(ID=1:1000, STUFF=runif(1000))
df2 <- df[df$ID %in% c(123,198,204,245,87,91,921), ]
Please let me know if this solves your problem.
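As a quick sanity check on the snippet above, here is a minimal sketch that confirms which rows come back. The name `wanted` is only illustrative and not part of the original answer, and `set.seed` is there only to make the random column reproducible.

```r
set.seed(1)  # only so the random STUFF column is reproducible
df <- data.frame(ID = 1:1000, STUFF = runif(1000))
wanted <- c(123, 198, 204, 245, 87, 91, 921)  # illustrative name

# Keep the rows whose ID appears in `wanted`, and keep all columns
df2 <- df[df$ID %in% wanted, ]

# Rows come back in the order they appear in df, not in the order of `wanted`
df2$ID
nrow(df2)
```

Note that the subset follows the row order of df, so if the order of the lookup vector matters you would need to reorder afterwards.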
First, we'll need the which function.
?which
Which indices are TRUE?
Description
Give the TRUE indices of a logical object, allowing for array indices.
i <- 1:10
which(i < 5)
[1] 1 2 3 4
We'll also need the %in% operator:
?"%in%"
%in% is a more intuitive interface as a binary operator, which returns a logical vector indicating if there is a match or not for its left operand.
2 %in% 1:5
[1] TRUE
2 %in% 5:10
[1] FALSE
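The two calls above use a single value on the left, but %in% works element by element on its left operand, so a vector on the left gives one TRUE/FALSE per element. A small sketch with arbitrary values (the name `x` is just for illustration):

```r
x <- c(2, 7, 5)

# One result per element of x: is each value found in 1:5?
in_set <- x %in% 1:5
in_set

# Negate the result to find the elements that are NOT in the set
not_in_set <- !(x %in% 1:5)
not_in_set
```

This per-element behaviour is what makes %in% usable directly as a row filter on a data frame column.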
PUTTING IT ALL TOGETHER
# some starting ids
id <- c(123, 204, 11, 12, 13, 15, 87, 123)
# the df constructed with the ids
df <- data.frame(id)
# the valid ids
valid.ids <- c(123,198,204,245,87,91,921)
# positions is a logical vector that says, for each element of df$id, whether it is a match
positions <- df$id %in% valid.ids
positions
[1] TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
# BONUS
# we can easily count how many matches we have:
sum(positions)
[1] 4
# using the which function we get only the indices at which positions is TRUE
matched_elements_positions <- which(positions)
matched_elements_positions
[1] 1 2 7 8
# last step, we select only the matching rows from our dataframe
df[matched_elements_positions,]
[1] 123 204  87 123
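Two variations on the last step, offered as a sketch rather than as part of the original answer. First, the logical vector can index the data frame directly, so which() is optional here. Second, because this df has only one column, df[rows, ] drops to a plain vector; passing drop = FALSE keeps it a data frame. The names `by_logical`, `by_index` and `matched` are illustrative.

```r
id <- c(123, 204, 11, 12, 13, 15, 87, 123)
df <- data.frame(id)
valid.ids <- c(123, 198, 204, 245, 87, 91, 921)

positions <- df$id %in% valid.ids

# Logical indexing gives the same result as indexing with which(positions)
by_logical <- df[positions, ]
by_index   <- df[which(positions), ]
identical(by_logical, by_index)

# drop = FALSE keeps the result as a one-column data frame
matched <- df[positions, , drop = FALSE]
class(matched)
nrow(matched)
```

The two forms can differ when the logical vector contains NA, since which() skips NA values while logical indexing produces NA rows. %in% never returns NA, so for this use the two are interchangeable.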