0

I am trying to create a new column, that looks into whether the column 1 in dataframe A matches to column 1 in dataframe B, and if is there a match to have the value in column output in dataframe B. If column 1 in dataframe A does not a find a match in column 1 in dataframe B. To look at whether there is match between column 2 in dataframe A and column 2 dataframe B, and if there is a match to have the output 2 from dataframe B. If there is no matches then for it to be zero.

This is my attempt, I think it semi works but it gives me a lot of NAs, a lot of these NAs are false because there is match in columns between the dataframes but it still comes up as NA. Any help would be appreciated.

 dataframeA<-dataframeA %>% mutate(newcolumn=ifelse(column1 %in% dataframeB$column1, dataframeB$output, ifelse(column2 %in% dataframeB$column2, dataframeB$output2,NA)))

I tried to use case_when but this doesn't work because I want the output value to be from a column.

EDIT here is some sample data from the help of @bloxx

dataframeA<- data.frame(Column1 = c("a", "b", "c", "f"),
                    Column2 = c("a", "a", "a", "f"))
dataframeB<- data.frame(Column1 = c("b", "b", "c", "e", "f", "g", "h"),
                    Column2 = c("a", "b", "a", "e"),
                    Output = c("B", "B", "B", "B","B","B","B"),
                    Output2 = c("D", "D", "D", "D"))
dataframeA <- dataframeA %>% 
mutate(new_variable = ifelse(Column1 %in% dataframeB$Column1, 
dataframeB$Output,
ifelse(Column2 %in% 
dataframeB$Column2,dataframeB$Output2, 0 )))

In my dataset, dataframeB has different lengths, but even if make it the same length, there is a lot of NAs.

Thank you

newbie
  • 21
  • 5
  • Please share a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a small example of your data used. – Martin Gal Jul 21 '21 at 08:26
  • 1
    This question will (most likely) improve and get better answers if you provide sample data and a desired output. – Wimpel Jul 21 '21 at 08:26
  • 2
    Your question sounds more like a join than an ifelse-problem to me. Take a look at dplyr's `left_join` and `coalesce` functions. – Martin Gal Jul 21 '21 at 08:30

2 Answers2

1

I used a for loop to do so. But there's probably a more efficient and elegant solution using purrr. Anyway:

Data:

df1 <- data.frame(value1 = c(1,2,3,4,5),
                  value2 = c(2,5,8,4,6))

df2 <- data.frame(value1 = c(1,2,3,8,5),
                  value2 = c(2,9,8,4,7),
                  output1 = c(4,4,4,4,4),
                  output2 = c(8,9,6,4,2))

Code:

for(i in 1:2){
  if(i ==1) {out <- df1}
  out <- left_join(out, df2 %>%
              select(str_subset(names(df2),paste0(i,"$"))),
              by = str_subset(str_subset(names(out),paste0(i,"$")), names(df2)))
}

out <- out %>% 
  mutate(across(everything(), ~ifelse(is.na(.),0,.)))

Output:

  value1 value2 output1 output2
1      1      2       4       8
2      2      5       4       0
3      3      8       4       6
4      4      4       0       4
5      5      6       4       0

You can change the indices in the for loop parameters to the number of value you want to merge the dataframes.

MonJeanJean
  • 2,876
  • 1
  • 4
  • 20
1

You can also do it with ifelse.


dataframeA<- data.frame(Column1 = c("a", "b", "c", "f"),
                        Column2 = c("a", "a", "a", "f"))
dataframeB<- data.frame(Column1 = c("b", "b", "c", "e"),
                        Column2 = c("a", "b", "a", "e"),
                        Output = c("B", "B", "B", "B"),
                        Output2 = c("D", "D", "D", "D"))
dataframeA <- dataframeA %>% 
  mutate(new_variable = ifelse(Column1 %in% dataframeB$Column1, dataframeB$Output,
                               ifelse(Column2 %in% dataframeB$Column2,dataframeB$Output2, 0 )))

UPDATE

dataframeA<- data.frame(Column1 = c("a", "b", "c", "f", "g"),
                        Column2 = c("a", "a", "a", "f", "g"))
dataframeB<- data.frame(Column1 = c("b", "b", "c", "e", "g"),
                        Column2 = c("a", "b", "a", "e", "g"),
                        Output = c("B", "B", "B", "B", "B"),
                        Output2 = c("D", "D", "D", "D", "D"))

dataframeA <- dataframeA %>% 
  mutate(new_variable = ifelse(Column1 %in% dataframeB$Column1 & Column2 %in% dataframeB$Column2, dataframeB$Output,
                               ifelse(Column2 %in% dataframeB$Column2,dataframeB$Output2, 
                                      ifelse(Column1 %in% dataframeB$Column1, dataframeB$Output, NA))))
Bloxx
  • 1,495
  • 1
  • 9
  • 21
  • Thanks, this is the closest to what I was asking. However, it does not work, it seems that if there is matches for both column 1 and column 2 and it returns NA. I am gonna use your example in my question :) – newbie Jul 21 '21 at 09:20
  • Check the update. the fourth row has no match so NA, and the fifth has a match in both and is output1. – Bloxx Jul 21 '21 at 09:49