0

Here I share with you a simplified version of my issue. Say I have 6 observations (pid) for two variables:

    pid <- c(1,2,3,4,5,6)
    V1 <- c(11,11,33,11,22,33)
    V2 <- c("A", "C", "M", "M", "A", "A")
    data <- data.frame(pid, V1, V2)
# pid V1 V2
# 1   1 11  A
# 2   2 11  C
# 3   3 33  M
# 4   4 11  M
# 5   5 22  A
# 6   6 33  A

I would like to create a new column based on the values associated to the different combinations I have of V1 and V2, that stored in a second database:

V1 <- c(11,11,11,22,22,22,33,33,33)
V2 <- c("A", "C", "M","A", "C", "M","A", "C", "M")
valueA <- c(16,26,36,46,56,66,76,86,96)
valueB <- c(15,25,35,45,55,65,75,85,95)
values <- data.frame(V1, V2, valueA, valueB)
# V1 V2 valueA valueB
# 1 11  A     16     15
# 2 11  C     26     25
# 3 11  M     36     35
# 4 22  A     46     45
# 5 22  C     56     55
# 6 22  M     66     65
# 7 33  A     76     75
# 8 33  C     86     85
# 9 33  M     96     95

I tried this, following @akrun suggestion:

data <- mutate (data, 
                valueA = as.integer (ifelse(data$V1 %in% values$V1
                                            & data$V2 %in% values$V2, values$valueA, NA))
                )

But the result is the following:

# pid V1 V2 valueA
# 1   1 11  A     16
# 2   2 11  C     26
# 3   3 33  M     36
# 4   4 11  M     46
# 5   5 22  A     56
# 6   6 33  A     66

As you can see, the combination 33 M is 36 while it should be 96...

I would like to archive this:

#   pid V1 V2 valueA
# 1   1 11  A     16
# 2   2 11  C     26
# 3   3 33  M     96
# 4   4 11  M     36
# 5   5 22  A     46
# 6   6 33  A     76

any suggestions on how to fix this? Any help would me much appreciated!

Nicole
  • 23
  • 3
  • Perhaps you want `%in%` instead of `==` i.e. `mutate (data, valueA = as.integer (ifelse(data$V1 %in% values$V1 & data$V2 %in% values$V2, values$valueA, NA)) )` or it is a simple join operation i.e. `left_join(data, values)` – akrun Oct 16 '20 at 23:49
  • @akrun thanks for your reply! Unfortunately, the code still doesn't work as I wish... please see the edit in the question for a full explanation – Nicole Oct 17 '20 at 08:00

1 Answers1

0

I solved the issue above creating a single column merging V1 and V2 as follows:

data$unique  <- paste(data$V1,data$V2)
values$unique <- paste(values$V1, values$V2)

and then merged by the new column:

merge(x = data, y = values, by = "unique")
# unique pid V1.x V2.x V1.y V2.y valueA valueB
# 1   11 A   1   11    A   11    A     16     15
# 2   11 C   2   11    C   11    C     26     25
# 3   11 M   4   11    M   11    M     36     35
# 4   22 A   5   22    A   22    A     46     45
# 5   33 A   6   33    A   33    A     76     75
# 6   33 M   3   33    M   33    M     96     95
Nicole
  • 23
  • 3