I have one data frame with 332 names and another with 56000. All of the 332 names are included in the larger data frame. How do I remove rows of data from the large data frame if the names are included in the smaller data frame?
- Welcome to SO! Can you post a minimal reproducible example? See: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – markus Apr 16 '20 at 22:03
- We're going to need to know what data structure they're stored in (vector, data frame, data table, tibble etc.). You can find this out with the `class()` function. – Daniel V Apr 16 '20 at 22:07
2 Answers
Using the built-in mtcars dataset in place of your large data frame, use the %in%
operator to find the rows whose values appear in a reference data frame (your smaller one), and !
to negate the match ("not in"). Change the data frame and column names to suit your needs.
# SETUP
refDF <- data.frame("ID" = c(4,6))
# SOLUTION
mtcars[!mtcars$cyl %in% refDF$ID,]
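Applied to a case closer to the question, the same idea might look like the sketch below. The data frames `big` and `small` and the column `name` are hypothetical stand-ins, since the question does not say what the real objects are called.

```r
# Hypothetical stand-ins for the large and small data frames in the question;
# the column name `name` is an assumption.
big <- data.frame(
  name  = c("Ann", "Bob", "Cara", "Dev", "Eli"),
  score = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)
small <- data.frame(name = c("Bob", "Dev"), stringsAsFactors = FALSE)

# Keep only the rows of `big` whose name does NOT appear in `small`
kept <- big[!(big$name %in% small$name), , drop = FALSE]

# Sanity check: how many rows were dropped
nrow(big) - nrow(kept)
```

With the real data, the number of dropped rows should be at least 332 if every name in the smaller data frame occurs in the larger one (more if names repeat).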

rg255
We can also do this with `filter()` from dplyr:
library(dplyr)
mtcars %>%
  filter(!cyl %in% refDF$ID)
data
refDF <- data.frame("ID" = c(4,6))
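Another option along the same lines is dplyr's `anti_join()`, which returns the rows of the first data frame that have no match in the second. This is only a sketch using the same `mtcars` and `refDF` example; the `by` mapping from `cyl` to `ID` would need to be changed to the real column names.

```r
library(dplyr)

# Same reference data frame as in the example above
refDF <- data.frame("ID" = c(4, 6))

# Keep the rows of mtcars whose cyl value has no match in refDF$ID
result <- anti_join(mtcars, refDF, by = c("cyl" = "ID"))

# Only cars with a cyl value other than 4 or 6 should remain
unique(result$cyl)
```

The join form can be handy when the match depends on more than one column, since `by` accepts several column pairs.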

akrun