6

let's create example data:

df <- data.frame(date=c("2017-01-01","2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"), X1=c("A", "B", "C", "D", "F"),
                 X2=c("B", "A", "D", "F", "C"))
df2 <- data.frame(date=c("2017-01-01","2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"), 
                  A=c("3", "4", "2", "1", "5"),
                  B=c("6", "2", "5", "1", "1"),
                  C=c("1", "4", "5", "2", "3"),
                  D=c("67", "67", "63", "61", "62"),
                  F=c("31", "33", "35", "31", "38"))

So I have two data frames and I want to match values from df2 to df by date and X1 and X2 and create new variables for those. What makes this tricky for me is that matched values in df2 are in colnames. End result should look like this:

> result
        date X1 X2 Var1 Var2
1 2017-01-01  A  B    3    6
2 2017-01-02  B  A    2    4
3 2017-01-03  C  D    5   63
4 2017-01-04  D  F   61   31
5 2017-01-05  F  C   38    3

result <- data.frame(date=c("2017-01-01","2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"), 
                     X1=c("A", "B", "C", "D", "F"),
                     X2=c("B", "A", "D", "F", "C"),
                     Var1=c("3", "2", "5", "61", "38"),
                     Var2=c("6", "4", "63", "31", "3"))

I wanted to use mapvalues, but couldn't figure it out. Second thought was to go long format (melt) with df2 and try then, but failed there as well.

Ok, here is my best try, just feels that there could be more efficient way, if you have to create multiple (>50) new variables to data frame.

df2.long <- melt(df2, id.vars = c("date"))

df$Var1 <- na.omit(merge(df, df2.long, by.x = c("date", "X1"), by.y = c("date", "variable"), all.x = FALSE, all.y = TRUE))[,4]
df$Var2 <- na.omit(merge(df, df2.long, by.x = c("date", "X2"), by.y = c("date", "variable"), all.x = FALSE, all.y = TRUE))[,5]
Hakki
  • 1,440
  • 12
  • 26

5 Answers5

4

Using dplyr and tidyr:

df2_m <- group_by(df2, date) %>% 
    gather('X1', 'var', -date)

left_join(df, df2_m) %>% 
    left_join(df2_m, by = c('date', 'X2' = 'X1')) %>%
    rename(Var1 = var.x, Var2 = var.y) -> result
GGamba
  • 13,140
  • 3
  • 38
  • 47
3

A possibility with mapply:

df$Var1 <- mapply(function(day, col) df2[df2$date==day, as.character(col)], 
                  day=df$date, col=df$X1)
df$Var2 <- mapply(function(day, col) df2[df2$date==day, as.character(col)], 
                  day=df$date, col=df$X2)

df
#        date X1 X2 Var1 Var2
#1 2017-01-01  A  B    3    6
#2 2017-01-02  B  A    2    4
#3 2017-01-03  C  D    5   63
#4 2017-01-04  D  F   61   31
#5 2017-01-05  F  C   38    3

NB:
If you have more columns to modify (not just 2 like in your example), you can use lapply to loop over the columns X.:

df[, paste0("Var", 1:2)] <- lapply(df[,paste0("X", 1:2)], 
                                   function(value) {
                                      mapply(function(day, col) df2[df2$date==day, as.character(col)], 
                                             day=df$date, col=value)})
Cath
  • 23,906
  • 5
  • 52
  • 86
  • 1
    yeah, thanks. Got headache already, as my original data was `"data.table" "data.frame"`, so function was returning only X1 to new variable, but transforming it to only data frame corrected that. I don't have ability to understand why. – Hakki Feb 16 '17 at 15:30
  • @Viitama if you're actually working with `data.table`s, David's answer could be more appropriate. – Cath Feb 16 '17 at 15:32
  • I'm noob, I don't when it changes from one to another. If there is no match, what should I do to get NA instead of `numeric(0)`. I can always correct them later, or build a function from that, but is there something I can do inside mapply? – Hakki Feb 16 '17 at 15:35
  • you can check if the value is among the colnames and if not return NA – Cath Feb 16 '17 at 15:42
  • 1
    Yeah, `as.numeric` in front of your code will solve this. Now it returns list inside data.frame column?? but thanks, I got it working! – Hakki Feb 16 '17 at 16:04
3

An double melt > join > dcast option using data.table

library(data.table) # v>=1.10.0
dcast(
  melt(setDT(df), 1L)[ # melt the first table by date
    melt(setDT(df2), 1L),  # melt the second table by date
    on = .(date, value = variable), # join by date and the letters
    nomatch = 0L], # remove everything that wasn't matched
  date ~ variable, # convert back to long format
  value.var = c("value", "i.value")) # take both values columns

#          date value_X1 value_X2 i.value_X1 i.value_X2
# 1: 2017-01-01        A        B          3          6
# 2: 2017-01-02        B        A          2          4
# 3: 2017-01-03        C        D          5         63
# 4: 2017-01-04        D        F         61         31
# 5: 2017-01-05        F        C         38          3
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
2

We can use match to get the column index of 'df2' from the 'X1' and 'X2' columns, cbind with the sequence of rows, use the row/column index to extract the values in 'df2', and assign the output to create the 'Var' columns

df[paste0("Var", 1:2)] <-  lapply(df[2:3], function(x)
          df2[-1][cbind(1:nrow(df2), match(x, names(df2)[-1]))])
df
#        date X1 X2 Var1 Var2
#1 2017-01-01  A  B    3    6
#2 2017-01-02  B  A    2    4
#3 2017-01-03  C  D    5   63
#4 2017-01-04  D  F   61   31
#5 2017-01-05  F  C   38    3
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    To account for not sequentially matching dates, I guess, it could be `i = match(df$date, df2$date); ...cbind(i, match(..))..` (instead of `1:nrow`) – alexis_laz Feb 16 '17 at 17:49
1

Using melt and match:

df2l<-melt(df2, measure=c("A","B","C","D","F"))
Indices <- match(paste(df$date, df$X1), paste(df2l$date,df2l$variable))
df$Var1 <- df2l$value[Indices]
Indices2 <- match(paste(df$date, df$X2), paste(df2l$date,df2l$variable))
df$Var2 <- df2l$value[Indices2]
user3640617
  • 1,546
  • 13
  • 21