Hello everyone I would need help in order to merge two dataframe.
So I have a dataframe such as
df1
SP_names Groups Number
1 Canis_lupus,Homo_sapiens,Mus_musculus 3_Group2,5_Group8 Number1
2 Canis_lupus,Cattus_cattus 6_Group37,8_Group89 Number2
3 Apis_mellifera 1_Group673 Number3
and
df2
SP_names Groups
1 Canis_lupus 3_Group2
2 Canis_lupus 5_Group8
3 Homo_sapiens 5_Group8
4 Canis_lupus 6_Group37
5 Mus_musculus 3_Group2
6 Cattus_cattus 6_Group37
7 Cattus_cattus 8_Group89
8 Apis_mellifera 1_Group673
and the idea is to add into df2 the column Number
from df1 by groupping Groups
and SP_names
.
The difficulty is that within df1$Groups
and df1$SP_names
I can have mutliple values separated by a ,
.
So the expected output would be :
SP_names Groups Number
1 Canis_lupus 3_Group2 Number1
2 Canis_lupus 5_Group8 Number1
3 Homo_sapiens 5_Group8 Number1
4 Canis_lupus 6_Group37 Number2
5 Mus_musculus 3_Group2 Number1
6 Cattus_cattus 6_Group37 Number2
7 Cattus_cattus _Group89 Number2
8 Apis_mellifera 1_Group673 Number3
where for instance Canis lupuse was present in df1$Number
Number1
and Number2
, si I juste checked the match between df1$Groups/df2$Groups
and df1$SP_names/df2$SP_names
to assign the Number value in the right place in df2.
Does someone have an idea please
Here are the data
df1
structure(list(SP_names = structure(3:1, .Label = c("Apis_mellifera",
"Canis_lupus,Cattus_cattus", "Canis_lupus,Homo_sapiens,Mus_musculus"
), class = "factor"), Groups = structure(c(2L, 3L, 1L), .Label = c("1_Group673",
"3_Group2,5_Group8", "6_Group37,8_Group89"), class = "factor"),
Number = structure(1:3, .Label = c("Number1", "Number2",
"Number3"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
df2
structure(list(SP_names = structure(c(2L, 2L, 4L, 2L, 5L, 3L,
3L, 1L), .Label = c("Apis_mellifera", "Canis_lupus", "Cattus_cattus",
"Homo_sapiens", "Mus_musculus"), class = "factor"), Groups = structure(c(2L,
3L, 3L, 4L, 2L, 4L, 5L, 1L), .Label = c("1_Group673", "3_Group2",
"5_Group8", "6_Group37", "8_Group89"), class = "factor")), class = "data.frame", row.names = c(NA,
-8L))