
I have two single-column data frames of gene names, and some values appear in both. I want to find the genes that are present only in DF1, that is, in DF1 but not in DF2.

**DF1**     **DF2**
ABCD1       ABCD1
ACSL4       ACTC1
ACVRL1      ADNP
AFF2        AFF2

How do I combine these two so that a new data frame contains only the DF1 entries that do not appear in DF2, like this:

**DF3**
ACSL4
ACVRL1

So far I have used rbind to combine DF1 and DF2 and then tried to remove the duplicated values using:

DF3 <- rbind(DF1, DF2)
DF3[!(duplicated(DF3) | duplicated(DF3, fromLast = TRUE)), , drop = FALSE]

This seems to work okay, but are there better ways to do this?

2 Answers


I believe anti_join() from {dplyr} is what you are looking for:

library(dplyr)

df1 <- tribble(
  ~var, 
  "ABCD1",
  "ACSL4",
  "ACVRL1",
  "AFF2"
)

df2 <- tribble(
  ~var, 
  "ABCD1",
  "ACTC1",
  "ADNP",
  "AFF2"
)

(df3 <- anti_join(df1, df2))
#> Joining, by = "var"
#> # A tibble: 2 x 1
#>   var   
#>   <chr> 
#> 1 ACSL4 
#> 2 ACVRL1

Created on 2021-06-13 by the reprex package (v2.0.0)
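If the gene column is named differently in the two tables, the join column can be given explicitly. The following is a minimal sketch under that assumption; the column names `gene` and `symbol` are hypothetical and not from the question.

```r
library(dplyr)

# Hypothetical inputs: same genes as above, but the column names differ.
df1 <- data.frame(gene = c("ABCD1", "ACSL4", "ACVRL1", "AFF2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(symbol = c("ABCD1", "ACTC1", "ADNP", "AFF2"),
                  stringsAsFactors = FALSE)

# Keep the rows of df1 whose `gene` value has no match in df2$symbol.
df3 <- anti_join(df1, df2, by = c("gene" = "symbol"))
df3$gene
```

Passing `by` also avoids the "Joining, by = ..." message, since dplyr no longer has to guess the key.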

Marcelo Avila

What you actually want is to exclude the rows of DF1 whose values also appear in DF2.

DF1[!DF1$V1 %in% DF2$V1, , drop = FALSE]
#       V1
# 2  ACSL4
# 3 ACVRL1
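As a rough sketch of the same idea with setdiff(), and of how it differs from the rbind plus duplicated approach in the question: the example assumes both data frames have a single column named V1, as in the output above. Note that setdiff() also drops repeated values within DF1, which %in% does not.

```r
# Example data, assuming a single column named V1 in both data frames.
DF1 <- data.frame(V1 = c("ABCD1", "ACSL4", "ACVRL1", "AFF2"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(V1 = c("ABCD1", "ACTC1", "ADNP", "AFF2"),
                  stringsAsFactors = FALSE)

# Genes in DF1 that do not occur in DF2.
only_in_DF1 <- setdiff(DF1$V1, DF2$V1)
DF3 <- data.frame(V1 = only_in_DF1, stringsAsFactors = FALSE)

# For comparison: stacking both tables and dropping every duplicated value
# keeps genes that are unique to either table, not just those from DF1.
both <- rbind(DF1, DF2)
either <- both[!(duplicated(both) | duplicated(both, fromLast = TRUE)), ,
               drop = FALSE]
```

Here `DF3` holds only the DF1-specific genes, while `either` additionally contains the genes found only in DF2.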
jay.sf