3

I have a dataframe named df1 which has four columns (i.e. id, s, date and value). The value column is empty and I want to fill it using a second dataframe that is named df2. df2 is filled with id column and many other columns that are named using dates which they belong. All I need is to find corresponding values of df1$value in df2, where both dates and id numbers are matching.

Example data:

set.seed(123)

#df1
df1 <- data.frame(id = 1:100, 
                      s = runif(100,100,1000), 
                      date = sample(seq(as.Date('1999/01/01'), as.Date('2001/01/01'), by="day"), 100), 
                      value = NA)

#df2
df2 <- data.frame(matrix(runif(80000,1,100), ncol=800, nrow=100))[-1]
    names(df2) <- seq(as.Date("1999-01-01"),as.Date("2002-12-31"),1)[c(1:799)]  
    df2 <- cbind(id =  1:100, df2)
Majid
  • 1,836
  • 9
  • 19
  • The post revision reopened is https://stackoverflow.com/posts/56662593/revisions – akrun Jun 19 '19 at 08:24
  • Possible duplicate of https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format and https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right (commonly asked questions) – akrun Jun 19 '19 at 08:26
  • 3
    @akrun I don't think this is a duplicate because OP never asked for "convert to long format and join". However, all the answers until now fortunately/unfortunately offered the same solution. There could be answers to this post which does not require to convert to long format. IMHO, duplicate should be marked based on questions asked and not on answers posted hence, I reopened it. – Ronak Shah Jun 19 '19 at 08:33

3 Answers3

6

One way is to convert df2 into long format using gather and then do left_join

library(dplyr)
library(tidyr)

df1 %>%
  left_join(df2 %>% 
             gather(date, value, -id) %>% 
              mutate(date = as.Date(date)), by = c("id", "date"))

#     id   s       date value
#1     1 359 2000-03-15 48.32
#2     2 809 1999-09-01 62.16
#3     3 468 1999-12-23 16.41
#4     4 895 2000-11-26 32.70
#5     5 946 1999-12-18  5.84
#6     6 141 2000-10-09 74.65
#7     7 575 2000-10-25  9.22
#8     8 903 2000-03-17  6.46
#9     9 596 1999-10-25 73.48
#10   10 511 1999-04-17 62.43
#...

data

set.seed(123)
df1 <- data.frame(id = 1:100, 
              s = runif(100,100,1000), 
 date = sample(seq(as.Date('1999/01/01'), as.Date('2001/01/01'), by="day"), 100))


df2 <- data.frame(matrix(runif(80000,1,100), ncol=800, nrow=100))[-1]
names(df2) <- seq(as.Date("1999-01-01"),as.Date("2002-12-31"),1)[c(1:799)]  
df2 <- cbind(id =  1:100, df2)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
4

You can also use melt and then left join using both the keys:

library(dplyr)
library(reshape2)
set.seed(123)

#df1
df1 <- data.frame(id = 1:100, 
                  s = runif(100,100,1000), 
                  date = sample(seq(as.Date('1999/01/01'), as.Date('2001/01/01'), by="day"), 100), 
                  value = NA)
#df2
df2 <- data.frame(matrix(runif(80000,1,100), ncol=800, nrow=100))[-1]
names(df2) <- seq(as.Date("1999-01-01"),as.Date("2002-12-31"),1)[c(1:799)]  
df2 <- cbind(id =  1:100, df2)

df2<-melt(df2, id.vars = "id", value.name = "Value", variable.name = "date")

df2$date<-as.Date(df2$date, format = "%Y-%m-%d")
df1<-left_join(df1, df2, by = c("id", "date"))

head(df1)
  id        s       date value    Value
1  1 358.8198 2000-03-15    NA 48.31799
2  2 809.4746 1999-09-01    NA 62.15760
3  3 468.0792 1999-12-23    NA 16.41291
4  4 894.7157 2000-11-26    NA 32.70024
5  5 946.4206 1999-12-18    NA  5.83607
6  6 141.0008 2000-10-09    NA 74.64832
Harshal Gajare
  • 605
  • 4
  • 16
3

We can use efficient way with data.table join. It should be fast for big datasets

library(data.table)
setDT(df1)[melt(setDT(df2), id.var = 'id')[, 
       date := as.IDate(variable, '%Y-%m-%d')], on = .(id, date)]
akrun
  • 874,273
  • 37
  • 540
  • 662