Combining two dataframes based on presence of values of various different columns

Question

I have a question about creating new columns in my dataset by checking whether a value is present in one of the columns of my dataframe, and assigning the columns of a different dataframe based on that presence. As this description is quite vague, see the example dataset below:

newDf <- data.frame(c("Juice 1", "Juice 2", "Juice 3", "Juice 4","Juice 5"),
                    c("Banana", "Banana", "Orange", "Pear", "Apple"),
                    c("Blueberry", "Mango", "Rasberry", "Spinach", "Pear"),
                    c("Kale", NA, "Cherry", NA, "Peach"))
colnames(newDf) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")


dfChecklist <- data.frame(c("Banana", "Cherry"),
                          c("100", "80"),
                          c("5", "3"),
                          c("4", "5"))
colnames(dfChecklist) <- c("FruitCheck", "NutritionalValue", "Deliciousness", "Difficulty")

This gives the following dataframes:

    Juice Fruit 1   Fruit 2 Fruit 3
1 Juice 1  Banana Blueberry    Kale
2 Juice 2  Banana     Mango    <NA>
3 Juice 3  Orange  Rasberry  Cherry
4 Juice 4    Pear   Spinach    <NA>
5 Juice 5   Apple      Pear   Peach


  FruitCheck NutritionalValue Deliciousness Difficulty
1     Banana              100             5          4
2     Cherry               80             3          5

I want to combine the two and make the result to be like this:

   Juice Fruit 1   Fruit 2 Fruit 3 FruitCheck NutritionalValue Deliciousness Difficulty
1 Juice 1  Banana Blueberry    Kale     Banana              100             5          4
2 Juice 2  Banana     Mango    <NA>     Banana              100             5          4
3 Juice 3  Orange  Rasberry  Cherry     Cherry               80             3          5
4 Juice 4    Pear   Spinach    <NA>       <NA>             <NA>          <NA>       <NA>
5 Juice 5   Apple      Pear   Peach       <NA>             <NA>          <NA>       <NA>

The dataset above is an example, my own dataset is much larger and complexer.

Thanks so much in advance for your help!

[How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right/1300618#1300618) — user2974951, May 12 '22 at 10:47
Are you actually merging by fruits or are you just concatenating the two dataframes (i.e. merging by the row names)? — user2974951, May 12 '22 at 10:48
Thanks for your response! No I'm trying to merge by fruits. So for example, juice 1 contains the fruits banana, blueberry and kale. If one of those fruits is on the checklist, I want to append the data from that row from the checklist. So Banana is on the checklist, so I append the Fruitcheck banana, NutritionalValue of the Banana, Deliciousness, etc. In Juice 3, the Cherry is named as the third fruit, so it is not necessarily that the fruits I'm checking are all in one column. Does that make it more clear? — nbent, May 12 '22 at 10:51
OK... what if there is more than one match, say if juice 1 had banana as fruit 1 and cherry as fruit 2, what then? — user2974951, May 12 '22 at 10:59
This does not occur in my actual dataset, but if by exception it does occur I would like to just stick with the first match. — nbent, May 12 '22 at 11:07

score 1 · Accepted Answer · answered May 12 '22 at 11:17

First find the first match for each row

tmp=unlist(
  apply(
    newDf[,grepl("Fruit",colnames(newDf))],
    1,
    function(x){
      y=as.vector(x)
      y=y[which.min(match(y,dfChecklist$FruitCheck))]
      ifelse(length(y)==0,NA,y)
    }
  )
)

add this to your original df and then a simple merge

newDf$FruitCheck=tmp

merge(
  newDf,
  dfChecklist,
  by="FruitCheck",
  all.x=T
)

resulting in

  FruitCheck   Juice Fruit 1   Fruit 2 Fruit 3 NutritionalValue Deliciousness
1     Banana Juice 1  Banana Blueberry    Kale              100             5
2     Banana Juice 2  Banana     Mango    <NA>              100             5
3     Cherry Juice 3  Orange  Rasberry  Cherry               80             3
4       <NA> Juice 4    Pear   Spinach    <NA>             <NA>          <NA>
5       <NA> Juice 5   Apple      Pear   Peach             <NA>          <NA>
  Difficulty
1          4
2          4
3          5
4       <NA>
5       <NA>

Combining two dataframes based on presence of values of various different columns

1 Answers1