I want to find the common rows between 2 dataframe. To find the common rows, I can use inner_join(), semi_join()
, and merge()
. I have gone through different posts including this. But, these operations are not fulfilling my purposes. Because my data in the dataframe is a little different!
Sometimes, the data in the dataframe can be vise versa. Like the 3rd and 5th rows
of dataframe-1 and dataframe-2
. Dataframe-1 contains A3 A1 0.75
but Dataframe-2 contains A1 A3 0.75
. I would like to take these 2 rows as the same.
My first dataframe looks like
query target weight
1 A1 A2 0.60
2 A2 A5 0.50
3 A3 A1 0.75
4 A4 A5 0.88
5 A5 A3 0.99
6 (+)-1(10),4-Cadinadiene Falcarinone-10 0.09
7 Leucodelphinidin-100 (+)-1(10),4-Cadinadiene 0.876
8 Lignin (2E,7R,11R)-2-Phyten-1-ol 0.778
9 (2E,7R,11R)-2-Phyten-1-ol Leucodelphinidin 0.55
10 Falcarinone Lignin 1
11 A1 (+)-1(10),4-Cadinadiene 1
12 A2 Lignin-10 1
13 A3 (2E,7R,11R)-2-Phyten-1-ol 1
14 Falcarinone A6 1
15 A4 Leucodelphinidin 1
16 A4 Leucodelphinidin 1
17 Falcarinone A100 1
18 A4 Falcarinone 1
the second dataframe looks like
query target
1 A1 A2
2 A2 A5
3 A1 A3 // Missing in the output
4 A4 A5
5 A3 A5 // Missing in the output
6 A3 (2E,7R,11R)-2-Phyten-1-ol
7 (+)-1(10),4-Cadinadiene Falcarinone
8 Leucodelphinidin (+)-1(10),4-Cadinadiene-100
9 Lignin-2 (2E,7R,11R)-2-Phyten-1-ol
10 A11 (+)-1(10),4-Cadinadiene
11 A2 Lignin
12 A3 (2E,7R,11R)-2-Phyten-1-0l
13 Falcarinone A60
14 A4 Leucodelphinidin // Missing in the output
The code I am using
output <- semi_join(Dataframe-1, Dataframe-2)
OR
output <- inner_join(df_only_dd, sample_data_dd_interaction)
The output I am getting
query target weight
1 A1 A2 0.60
2 A2 A5 0.50
But, my expected output is like this
query target weight
1 A1 A2 0.60
2 A2 A5 0.50
3 A3 A1 0.75
4 A4 A5 0.88
5 A5 A3 0.99
6 A4 Leucodelphinidin 1
Reproducible code is given below
df_1 <- read.table(text="query target weight
A1 A2 0.6
A2 A5 0.5
A3 A1 0.75
A4 A5 0.88
A5 A3 0.99
(+)-1(10),4-Cadinadiene Falcarinone 0.09
Leucodelphinidin (+)-1(10),4-Cadinadiene 0.876
Lignin (2E,7R,11R)-2-Phyten-1-ol 0.778
(2E,7R,11R)-2-Phyten-1-ol Leucodelphinidin 0.55
Falcarinone Lignin 1
A1 (+)-1(10),4-Cadinadiene 1
A2 Lignin 1
A3 (2E,7R,11R)-2-Phyten-1-ol 1
Falcarinone A6 1
A4 Leucodelphinidin 1
A4 Leucodelphinidin 1
Falcarinone A100 1
A5 Falcarinone 1", header=TRUE)
df_2 <- read.table(text="query target
A1 A2
A2 A5
A1 A3
A4 A5
A3 A5
(+)-1(10),4-Cadinadiene Falcarinone
Leucodelphinidin (+)-1(10),4-Cadinadiene-100
Lignin-2 (2E,7R,11R)-2-Phyten-1-ol
A11 (+)-1(10),4-Cadinadiene
A2 Lignin
A3 (2E,7R,11R)-2-Phyten-1-0l
Falcarinone A6
A4 Leucodelphinidin ", header=TRUE)
Any kind of suggestion is appreciated.