0

Joining two data.frames

data.table(df_del)
       KEY           place_Name
    1:  200039  BUFFALO/ROCHESTER   
    2:  200171  MILWAUKEE           
    3:  200197  PEORIA/SPRINGFLD.   
    4:  200233  OKLAHOMA CITY       
    5:  200272  LOS ANGELES      


data.table(df)
        firm_id brand_id   KEY UNITS DOLLARS       DATE
     1:     511      263  647840     1    7.29 2001-01-01
     2:     511      265  647840     2   14.58 2001-01-01
     3:     511      265  532733     1    6.39 2001-01-01
     4:      23      417  263939     1    4.79 2001-01-01
     5:      23      417  648768     5   24.45 2001-01-01

I am trying to join them by KEY but running into problems. The df file has approx 500,000 results and the df_del file has about 12,000.

The df_del file has the unique product key and it can be bought in the same city, so there may be 10 KEY values for one city(i.e. a product delivered 10 times in that city)

The dffile also has the KEY column - however it is not always found. (when I copy a random KEYfrom the df_del dataframe and paste it in the search of the df frame I get no results sometimes. (This is because I am only using a snap shot of the df data and all of the df_del data). The alternative way i.e. KEYnumber from the dfdataframe and pasted into the df_delgives me a result (can sometimes appear more than once in both data.frames)

My question;

When I try to run;

library(plyr)
df_test <- join(df, df_del,
     type = "left")

I obtain 500,000 results with all the df results but in the merged place name I get just NA values, I have tried, right, left, inneretc. I have also tried merge(df, df_del, by = "KEY") and get a result of zero.

Any help would be appreciated.

df should look like:

data.table(df)
        firm_id brand_id   KEY UNITS DOLLARS       DATE    place_name
     1:     511      263  647840     1    7.29 2001-01-01  NYC
     2:     511      265  647840     2   14.58 2001-01-01  NYC
     3:     511      265  532733     1    6.39 2001-01-01  Los Angeles
     4:      23      417  263939     1    4.79 2001-01-01  Chicago
     5:      23      417  648768     5   24.45 2001-01-01  Houston
Uwe
  • 41,420
  • 11
  • 90
  • 134
user113156
  • 6,761
  • 5
  • 35
  • 81

2 Answers2

0

you have a got a problem with your KEY column not matching. It could be because there is no common number like in my example

library(dplyr)
df_del <- data.table(KEY=c(1,2,3,4,5,6,7,8,9,10),place_name=c("NY","LONDON","PARIS","MELBOURNE","TOKYO","NY","LONDON","PARIS","MELBOURNE","TOKYO"))
df <- data.table(KEY=c(11,15,16,21,52)),UNITS=c(1,5,20,2,4))
merge(df,df_del,by="KEY")

Empty data.table (0 rows) of 3 cols: KEY,UNITS,place_name

left_join(df,df_del,by="KEY") 
KEY UNITS place_name  
1  11     1       <NA>
2  15     5       <NA>
3  16    20       <NA>
4  21     2       <NA>
5  52     4       <NA>
Untitpoi
  • 141
  • 1
  • 9
  • Thanks for the reply. Apologies I forgot to put matching ID numbers in both columns. (in the actual dataset there is matching KEYID rows). – user113156 Dec 20 '17 at 17:52
0

Your data:

   library(data.table)

df <- structure(list(is = c(1, 2, 3, 4, 5), firm_id = c(511, 511, 511, 
23, 23), brand_id = c(263, 265, 265, 417, 417), KEY = c(647840, 
647840, 532733, 263939, 648768), UNITS = c(1, 2, 1, 1, 5), DOLLARS = c(7.29, 
14.58, 6.39, 4.79, 24.45), DATE = c("2001-01-01", "2001-01-01", 
"2001-01-01", "2001-01-01", "2001-01-01")), .Names = c("is", 
"firm_id", "brand_id", "KEY", "UNITS", "DOLLARS", "DATE"), 
class = c("data.table", "data.frame"), row.names = c(NA, -5L))


df_del <- structure(list(KEY = c(200039, 200171, 200197, 200233, 200272, 647840, 532733, 263939, 648768
), place_Name = c("BUFFALO/ROCHESTER", "MILWAUKEE", "PEORIA/SPRINGFLD.", 
"OKLAHOMA CITY", "LOS ANGELES", "NYC", "Los Angeles", "Chicago", "Houston")), class = c("data.table", "data.frame"), .Names = c("KEY", 
"place_Name"), row.names = c(NA, -5L))

Beauty of data.table is its concise syntax for joins.

setkey(df, KEY)
setkey(df_del, KEY)

df_del[df]

Produces the table you expected to see

      KEY  place_Name is firm_id brand_id UNITS DOLLARS       DATE
1: 263939     Chicago  4      23      417     1    4.79 2001-01-01
2: 532733 Los Angeles  3     511      265     1    6.39 2001-01-01
3: 647840         NYC  1     511      263     1    7.29 2001-01-01
4: 647840         NYC  2     511      265     2   14.58 2001-01-01
5: 648768     Houston  5      23      417     5   24.45 2001-01-01
dmi3kno
  • 2,943
  • 17
  • 31