Questions similar to this one have been asked before, but solutions to those questions didn't quite work for my purposes. Suppose I have two different dataframes:

A <- c("Gene1", "Gene3", "Gene4", "Gene9")
B <- c(-10, -4, 4, 19)
df1 <- data.frame(A, B)
head(df1)

      A   B
1 Gene1 -10
2 Gene3  -4
3 Gene4   4
4 Gene9  19

A <- c("Gene3", "Gene4", "Gene7", "Gene9")
B <- c(5, 2, 9, 11)
df2 <- data.frame(A, B)
head(df2)

      A  B
1 Gene3  5
2 Gene4  2
3 Gene7  9
4 Gene9 11

Now I would like to create a new dataframe containing the values of column A that df1 and df2 share (Gene3, Gene4 and Gene9), along with the B values from both df1 and df2, like so:

A <- c("Gene3", "Gene4", "Gene9")
B_df1 <- c(-4, 4, 19)
B_df2 <- c(5, 2, 11)
dfcombo <- data.frame(A, B_df1, B_df2)
head(dfcombo)

      A B_df1 B_df2
1 Gene3    -4     5
2 Gene4     4     2
3 Gene9    19    11

Then I would also like another dataframe containing only the rows of df1 whose column A value does not appear in df2, and one containing only the rows of df2 whose column A value does not appear in df1.

#the final printout for this example should look like this:
head(df1_unique)

      A   B
1 Gene1 -10

head(df2_unique)

      A B
1 Gene7 9

So in the end I should end up with 3 dataframes. Thanks!

pleasehelp

2 Answers

Base R

merge(df1, df2, by = "A", suffixes = c("_df1", "_df2"))
#       A B_df1 B_df2
# 1 Gene3    -4     5
# 2 Gene4     4     2
# 3 Gene9    19    11
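
The merge() call covers the shared rows. For the rows found in only one of the two data frames, a minimal base R sketch, assuming column A contains no NA values and reusing the df1_unique / df2_unique names from the question, could filter with %in%:

```r
# Example data from the question
df1 <- data.frame(A = c("Gene1", "Gene3", "Gene4", "Gene9"), B = c(-10, -4, 4, 19))
df2 <- data.frame(A = c("Gene3", "Gene4", "Gene7", "Gene9"), B = c(5, 2, 9, 11))

# Keep the rows of df1 whose A value never appears in df2, and vice versa
df1_unique <- df1[!df1$A %in% df2$A, ]
df2_unique <- df2[!df2$A %in% df1$A, ]

# Subsetting keeps the original row names; reset them so they start at 1
rownames(df1_unique) <- NULL
rownames(df2_unique) <- NULL

df1_unique
#       A   B
# 1 Gene1 -10
df2_unique
#       A B
# 1 Gene7 9
```

This is only one way to do it; it compares on column A alone, so rows with matching A but different B values still count as shared.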

tidyverse

library(dplyr)
inner_join(df1, df2, by = "A", suffix = c("_df1", "_df2"))
#       A B_df1 B_df2
# 1 Gene3    -4     5
# 2 Gene4     4     2
# 3 Gene9    19    11
r2evans

tidyverse

library(tidyverse)
A <- c("Gene1", "Gene3", "Gene4", "Gene9")
B <- c(-10, -4, 4, 19)
df1 <- data.frame(A, B)
head(df1)
#>       A   B
#> 1 Gene1 -10
#> 2 Gene3  -4
#> 3 Gene4   4
#> 4 Gene9  19

A <- c("Gene3", "Gene4", "Gene7", "Gene9")
B <- c(5, 2, 9, 11)
df2 <- data.frame(A, B)
head(df2)
#>       A  B
#> 1 Gene3  5
#> 2 Gene4  2
#> 3 Gene7  9
#> 4 Gene9 11

uniq_df1 <- anti_join(df1, df2, by = "A")
uniq_df1
#>       A   B
#> 1 Gene1 -10

uniq_df2 <- anti_join(df2, df1, by = "A")
uniq_df2
#>       A B
#> 1 Gene7 9

Created on 2021-02-02 by the reprex package (v1.0.0)
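
As a sanity check, the three results can be compared against the full set of genes. The following is a sketch only, assuming dplyr is installed; the names shared, all_genes and pieces are illustrative and not part of the original answers:

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(A = c("Gene1", "Gene3", "Gene4", "Gene9"), B = c(-10, -4, 4, 19))
df2 <- data.frame(A = c("Gene3", "Gene4", "Gene7", "Gene9"), B = c(5, 2, 9, 11))

# Shared rows plus the rows found in only one data frame
shared   <- inner_join(df1, df2, by = "A", suffix = c("_df1", "_df2"))
uniq_df1 <- anti_join(df1, df2, by = "A")
uniq_df2 <- anti_join(df2, df1, by = "A")

# Every gene should land in exactly one of the three results
all_genes <- unique(c(as.character(df1$A), as.character(df2$A)))
pieces <- c(as.character(shared$A), as.character(uniq_df1$A), as.character(uniq_df2$A))

# Stops with an error if a gene is missing or appears in two results
stopifnot(
  identical(sort(all_genes), sort(pieces)),
  anyDuplicated(pieces) == 0
)
```

The check relies on A being unique within each data frame, as it is in the example; with repeated gene names the duplicate test would fail even if the joins were correct.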

Yuriy Saraykin