-4

DATA FRAME 1: HOUSE PRICE

year    month   MSA1    MSA2    MSA3
2000    1       12  6   7
2000    2       1   3   4
2001    3       9   5   7

DATA FRAME 2: MORTGAGE INFO

ID  MSA YEAR    MONTH   
1   MSA1    2000    2   
2   MSA3    2001    3   
3   MSA2    2001    3   
4   MSA1    2000    1   
5   MSA3    2000    3   

OUTCOME DESIRED:

ID  MSA YEAR    MONTH   HOUSE_PRICE
1   MSA1    2000    2   1
2   MSA3    2001    3   7
3   MSA2    2001    3   5

Anyone knows how to achieve this in an efficient way? data frame 2 is huge and data frame 1 is ok size. Thanks!

Jaap
  • 81,064
  • 34
  • 182
  • 193
Jenny Shu
  • 47
  • 2
  • 4
  • 12
  • 3
    StackOverflow is not a coding writing service. We help troubleshoot earnest efforts. Please make an attempt and let us know if you run into any issues. – polka Jan 23 '16 at 19:21
  • It is unclear what you're trying to do. First data frame can be coerced into the desired outcome, so what's in df2 that you need the merge for? – mtoto Jan 23 '16 at 19:27
  • @polka I would need to do something like "merge(df1, df2, by = c(year, month), where df2$MSA = the corresponding MSA in df1)" The problem is two fold: first I do not know how to write this syntax, second, the MSA in df1 are the headers, I don't know how to use "value = header" type of comparison.... – Jenny Shu Jan 23 '16 at 20:59
  • @mtoto the entire purpose is to add house price value in df1 into df2. df2 is the main data. I need to add an additional column of house price into df2. Thanks. – Jenny Shu Jan 23 '16 at 21:04

2 Answers2

1

Assuming both are data.tables dt1 and dt2, this can be done without having to cast them to long form as follows:

require(data.table)
dt2[dt1, .(ID, MSA, House_price = get(MSA)), by=.EACHI, 
           nomatch=0L, on=c(YEAR="year", MONTH="month")]
#    YEAR MONTH ID  MSA House_price
# 1: 2000     1  4 MSA1          12
# 2: 2000     2  1 MSA1           1
# 3: 2001     3  2 MSA3           7
# 4: 2001     3  3 MSA2           5

dt1 = fread('year    month   MSA1    MSA2    MSA3
2000    1       12  6   7
            2000    2       1   3   4
            2001    3       9   5   7
            ')

dt2 = fread('ID  MSA YEAR    MONTH   
1   MSA1    2000    2   
            2   MSA3    2001    3   
            3   MSA2    2001    3   
            4   MSA1    2000    1   
            5   MSA3    2000    3   
            ')
Arun
  • 116,683
  • 26
  • 284
  • 387
0

This looks like a case of turning a data frame from wide to long form and then merging two data frames. Here is a dplyr solution with gather and right_join. The name change is just here to make the join easier.

library(dplyr)
library(tidyr)
names(df1) <- toupper(names(df1))
gather(df1,MSA,HOUSE_PRICE,-YEAR,-MONTH) %>% 
  right_join(df2,by = c("YEAR","MONTH","MSA"))

output

  YEAR MONTH  MSA HOUSE_PRICE ID
1 2000     2 MSA1           1  1
2 2001     3 MSA3           7  2
3 2001     3 MSA2           5  3
4 2000     1 MSA1          12  4
5 2000     3 MSA3          NA  5
Community
  • 1
  • 1
scoa
  • 19,359
  • 5
  • 65
  • 80