I have two different datasets. One of them (df1) is a subset of the other (df2). Both have a grouping column called SAMPN. I want to remove every row of a group from the larger dataset (df2) if at least one row from that group appears in the smaller dataset (df1).

df2

     SAMPN     PERNO     
       1         1
       1         2
       1         3
       2         1
       2         3
       3         3
       3         4
       3         5
       4         1
       4         3    

df1

     SAMPN     PERNO     
       1         1
       2         1
       2         3

output

     SAMPN     PERNO     
       3         3
       3         4
       3         5
       4         1
       4         3 

data:

df1:

structure(list(SAMPN = c("   11", "   18", "   27", "   33", 
"   33", "   39"), PERNO = structure(c(1L, 1L, 1L, 1L, 2L, 4L
), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

df2:

structure(list(SAMPN = c(10, 10, 10, 11, 11, 11, 11, 12, 12, 
12, 12), PERNO = c(2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2)), row.names = 90:100, class = "data.frame")
  • when you say 'Group' do you mean column? Or row? – morgan121 Sep 24 '19 at 22:35
  • of course column. it is called SAMPN –  Sep 24 '19 at 22:36
  • Possible duplicate of [Find complement of a data frame (anti - join)](https://stackoverflow.com/questions/28702960/find-complement-of-a-data-frame-anti-join) – camille Sep 25 '19 at 02:57
  • In df2, you've got the column as numeric. In df1, you have it as a string with a bunch of blank spaces. Is that intentional? Or did you just paste it into the question incorrectly? – camille Sep 25 '19 at 02:58

1 Answer

You can keep only the rows of df2 whose SAMPN is not in df1$SAMPN using this code:

library(tidyverse)

# SAMPN is a space-padded string in df1 but numeric in df2, so convert it before comparing
output <- df2 %>% 
          filter(!(SAMPN %in% as.numeric(df1$SAMPN)))
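
The same result can also be sketched as an anti-join, as the duplicate link in the comments suggests. This is a minimal example, assuming dplyr is installed; it rebuilds the small illustrative tables from the question with SAMPN numeric in both, and the name `result` is only illustrative.

```r
library(dplyr)

# Illustrative tables from the question (SAMPN is numeric in both here)
df2 <- data.frame(SAMPN = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
                  PERNO = c(1, 2, 3, 1, 3, 3, 4, 5, 1, 3))
df1 <- data.frame(SAMPN = c(1, 2, 2),
                  PERNO = c(1, 1, 3))

# Keep only the df2 rows whose SAMPN never appears in df1
result <- anti_join(df2, df1, by = "SAMPN")
```

Joining on SAMPN alone drops whole groups rather than individual SAMPN/PERNO pairs. With the dput data from the question, SAMPN in df1 would first need converting, for example with mutate(SAMPN = as.numeric(SAMPN)), because the join requires compatible column types.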
fmarm