I have one data frame with 332 names and another with 56000. All of the 332 names are included in the larger data frame. How do I remove rows of data from the large data frame if the names are included in the smaller data frame?

  • Welcome to SO! Can you post a minimal reproducible example? See: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – markus Apr 16 '20 at 22:03
  • We're going to need what data structure they're stored in (vector, data frame, data table, tibble etc.). You can find this out with the `class()` function. – Daniel V Apr 16 '20 at 22:07

2 Answers

Using the built-in mtcars dataset in place of your large data frame: the %in% operator tests whether each value appears in a reference data frame (your smaller one), and ! negates that test to give "not in". Subsetting with the negated test keeps only the rows with no match. Change the data frame and column names to suit your data.

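# SETUP: reference data frame holding the values to exclude
refDF <- data.frame("ID" = c(4, 6))
# SOLUTION: keep only the rows of mtcars whose cyl is NOT in refDF$ID
mtcars[!(mtcars$cyl %in% refDF$ID), ]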
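
Since the question is about names rather than cylinder counts, here is a minimal sketch of the same idea on name data. The data frames `large` and `small` and the column `name` are made-up stand-ins, because the question does not say what the real objects or columns are called.

```r
# Hypothetical stand-ins for the 56000-row and 332-row data frames
large <- data.frame(
  name  = c("Ann", "Bob", "Cara", "Dev", "Eli"),
  score = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)
small <- data.frame(name = c("Bob", "Dev"), stringsAsFactors = FALSE)

# Keep only the rows of `large` whose name does NOT appear in `small`
remaining <- large[!(large$name %in% small$name), ]

# Number of rows that were dropped
nrow(large) - nrow(remaining)
```

If a name occurs more than once in the large data frame, every matching row is removed, so the number of dropped rows can be larger than the number of names in the small one.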
rg255

We can also do this with dplyr's filter():

library(dplyr)
mtcars %>%
   filter(!cyl %in% refDF$ID)

data

refDF <- data.frame("ID" = c(4,6))
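
Another dplyr option that may read more naturally when the match is on a shared column is anti_join(), which returns the rows of the first data frame that have no match in the second. This is only a sketch: `large`, `small` and the column `name` are hypothetical names, not taken from the question.

```r
library(dplyr)

# Hypothetical stand-ins for the large and small data frames
large <- data.frame(
  name  = c("Ann", "Bob", "Cara", "Dev", "Eli"),
  score = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)
small <- data.frame(name = c("Bob", "Dev"), stringsAsFactors = FALSE)

# Rows of `large` with no matching `name` in `small`
remaining <- anti_join(large, small, by = "name")
remaining
```

Compared with filter(!name %in% ...), anti_join() extends easily to matching on several columns at once by passing more names to `by`.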
akrun