0

Hello everyone I would need help in order to merge two dataframe.

So I have a dataframe such as

df1

                               SP_names              Groups  Number
1 Canis_lupus,Homo_sapiens,Mus_musculus   3_Group2,5_Group8 Number1
2             Canis_lupus,Cattus_cattus 6_Group37,8_Group89 Number2
3                        Apis_mellifera          1_Group673 Number3

and

df2

     SP_names     Groups
1    Canis_lupus   3_Group2
2    Canis_lupus   5_Group8
3   Homo_sapiens   5_Group8
4    Canis_lupus  6_Group37
5   Mus_musculus   3_Group2
6  Cattus_cattus  6_Group37
7  Cattus_cattus   8_Group89
8 Apis_mellifera 1_Group673

and the idea is to add into df2 the column Number from df1 by groupping Groups and SP_names.

The difficulty is that within df1$Groups and df1$SP_names I can have mutliple values separated by a ,.

So the expected output would be :

        SP_names     Groups Number
1    Canis_lupus   3_Group2 Number1
2    Canis_lupus   5_Group8 Number1
3   Homo_sapiens   5_Group8 Number1
4    Canis_lupus  6_Group37 Number2
5   Mus_musculus   3_Group2 Number1
6  Cattus_cattus  6_Group37 Number2
7  Cattus_cattus   _Group89 Number2
8 Apis_mellifera 1_Group673 Number3

where for instance Canis lupuse was present in df1$Number Number1 and Number2, si I juste checked the match between df1$Groups/df2$Groups and df1$SP_names/df2$SP_namesto assign the Number value in the right place in df2.

Does someone have an idea please

Here are the data

df1

structure(list(SP_names = structure(3:1, .Label = c("Apis_mellifera", 
"Canis_lupus,Cattus_cattus", "Canis_lupus,Homo_sapiens,Mus_musculus"
), class = "factor"), Groups = structure(c(2L, 3L, 1L), .Label = c("1_Group673", 
"3_Group2,5_Group8", "6_Group37,8_Group89"), class = "factor"), 
    Number = structure(1:3, .Label = c("Number1", "Number2", 
    "Number3"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))

df2

structure(list(SP_names = structure(c(2L, 2L, 4L, 2L, 5L, 3L, 
3L, 1L), .Label = c("Apis_mellifera", "Canis_lupus", "Cattus_cattus", 
"Homo_sapiens", "Mus_musculus"), class = "factor"), Groups = structure(c(2L, 
3L, 3L, 4L, 2L, 4L, 5L, 1L), .Label = c("1_Group673", "3_Group2", 
"5_Group8", "6_Group37", "8_Group89"), class = "factor")), class = "data.frame", row.names = c(NA, 
-8L))
chippycentra
  • 3,396
  • 1
  • 6
  • 24
  • Seems like you may start here to tidy your data: [Split comma-separated strings in a column into separate rows](https://stackoverflow.com/questions/13773770/split-comma-separated-strings-in-a-column-into-separate-rows), then a join? – Henrik Mar 22 '21 at 14:46

1 Answers1

1

I think something like this should work.. but what about row 7 from you output? _Group89 does nog exist in df1. typo?

library( tidyverse )
df1_long <- df1 %>% 
  tidyr::separate_rows( SP_names, sep = "," ) %>% 
  tidyr::separate_rows( Groups, sep = "," )

df2 %>% dplyr::left_join( df1_long )
Wimpel
  • 26,031
  • 1
  • 20
  • 37