Joining two data.frames
data.table(df_del)
KEY place_Name
1: 200039 BUFFALO/ROCHESTER
2: 200171 MILWAUKEE
3: 200197 PEORIA/SPRINGFLD.
4: 200233 OKLAHOMA CITY
5: 200272 LOS ANGELES
data.table(df)
firm_id brand_id KEY UNITS DOLLARS DATE
1: 511 263 647840 1 7.29 2001-01-01
2: 511 265 647840 2 14.58 2001-01-01
3: 511 265 532733 1 6.39 2001-01-01
4: 23 417 263939 1 4.79 2001-01-01
5: 23 417 648768 5 24.45 2001-01-01
I am trying to join them by KEY but running into problems. The df
file has approx 500,000 results and the df_del
file has about 12,000.
The df_del
file has the unique product key and it can be bought in the same city, so there may be 10 KEY
values for one city(i.e. a product delivered 10 times in that city)
The df
file also has the KEY
column - however it is not always found. (when I copy a random KEY
from the df_del
dataframe and paste it in the search of the df
frame I get no results sometimes. (This is because I am only using a snap shot of the df
data and all of the df_del
data). The alternative way i.e. KEY
number from the df
dataframe and pasted into the df_del
gives me a result (can sometimes appear more than once in both data.frames)
My question;
When I try to run;
library(plyr)
df_test <- join(df, df_del,
type = "left")
I obtain 500,000 results with all the df
results but in the merged place name
I get just NA values, I have tried, right
, left
, inner
etc. I have also tried merge(df, df_del, by = "KEY")
and get a result of zero.
Any help would be appreciated.
df
should look like:
data.table(df)
firm_id brand_id KEY UNITS DOLLARS DATE place_name
1: 511 263 647840 1 7.29 2001-01-01 NYC
2: 511 265 647840 2 14.58 2001-01-01 NYC
3: 511 265 532733 1 6.39 2001-01-01 Los Angeles
4: 23 417 263939 1 4.79 2001-01-01 Chicago
5: 23 417 648768 5 24.45 2001-01-01 Houston