0

I have two data frames that I would like to match similar columns in both data frames and extract the value of another column in one of the data tables.

I have tried the below code but it doesn`t work:

Code:

output <- if (Data_1$Var1 %in% Data_2$Coor) {
      Data_2$binID
      } else if (Data_1$Var2 %in% Data_2$Coor) {
        Data_2$ID
      } else {NA}

Data_1:

                  Var1                    Var2       value
 chr22:17400000-17410000 chr22:16080000-16090000  139.939677
 chr22:17400000-17410000 chr22:26080000-26090000  256.945265
 chr22:33470000-33480000 chr22:16080000-16090000  134.432441

Data_2:

                    coor  ID
 chr22:17400000-17410000  1
 chr22:33470000-33480000  2
 chr22:16080000-16090000  3
 chr22:26080000-26090000  4

Output:

                     ID1                    ID2       value
                       1                      3   139.939677
                       1                      4   256.945265
                       2                      3   134.432441
star
  • 743
  • 1
  • 7
  • 19

2 Answers2

1

Here is something:

id_lookup <- setNames(Data_2$ID, Data_2$coor)
as.data.frame(c(lapply(Data_1[1:2], function(x) unname(id_lookup[x])), value = list(Data_1$value)))
  Var1 Var2    value
1    1    3 139.9397
2    1    4 256.9453
3    2    3 134.4324

Data

Data_1 <- data.frame(
  Var1 = c("chr22:17400000-17410000", "chr22:17400000-17410000", "chr22:33470000-33480000"), 
  Var2 = c("chr22:16080000-16090000", "chr22:26080000-26090000", "chr22:16080000-16090000"), 
  value = c(139.939677, 256.945265, 134.432441)
) 

Data_2 <- data.frame(
  coor = c(
    "chr22:17400000-17410000", "chr22:33470000-33480000", 
    "chr22:16080000-16090000", "chr22:26080000-26090000"
  ), 
  ID = 1:4
)
s_baldur
  • 29,441
  • 4
  • 36
  • 69
0

You can merge and match if you have only two columns.

transform(merge(data_1, data_2, by.x = 'Var1', by.y = 'coor'), 
          ID2 = match(Var2, data_2$coor))

#                     Var1                    Var2    value ID ID2
#1 chr22:17400000-17410000 chr22:16080000-16090000 139.9397  1   3
#2 chr22:17400000-17410000 chr22:26080000-26090000 256.9453  1   4
#3 chr22:33470000-33480000 chr22:16080000-16090000 134.4324  2   3

You can then keep only the columns that you are interested in.


A general solution if you have Var1, Var2, Varn is to get data in long format, join and get it back in wide format.

library(dplyr)
library(tidyr)

data_1 %>%
  pivot_longer(cols = starts_with('Var'), values_to = 'coor') %>%
  inner_join(data_2, by = 'coor') %>%
  pivot_wider(names_from = name, values_from = c(coor, ID))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213