Reworking an existing process that involves two data frames:
DF1 - ~65k rows, 15 columns
DF2 - ~300k rows, 270 columns
We are merging by zip like so:
newdf <- merge(df1, df2, by.x = "ZipA", by.y = "ZipB")
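For scale, a quick pre-flight estimate can show how big the result will be before materializing it (a sketch, assuming the df1/df2/ZipA/ZipB names above): with duplicate keys on both sides, each shared zip contributes n1 * n2 output rows.

t1 <- table(df1$ZipA)
t2 <- table(df2$ZipB)
zips <- intersect(names(t1), names(t2))
# as.numeric avoids integer overflow on large per-zip products
sum(as.numeric(t1[zips]) * as.numeric(t2[zips]))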
This is slow and, depending on what else is running on the EC2 instance at the time, may get killed. Important note: zips are NOT unique in either DF (this is by design), so the merge is many-to-many. What other options would people recommend?
sqldf? data.table? sparklyr (we have a Spark back end set up, but nobody uses it)?
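For reference, a minimal sketch of what the data.table route might look like, assuming the same df1/df2/ZipA/ZipB as above. allow.cartesian = TRUE is needed because duplicate keys on both sides legitimately return more rows than either input:

library(data.table)

dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)  # dropping unneeded columns from the 270 first would cut memory a lot

# Join dt2's ZipB against dt1's ZipA; allow.cartesian permits the
# many-to-many expansion that base merge() performs silently.
res <- dt2[dt1, on = .(ZipB = ZipA), allow.cartesian = TRUE]

Calling merge() on two data.tables also dispatches to data.table's faster merge method and accepts the same allow.cartesian argument.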
Really at a loss as to how to make this more efficient, but I'm afraid we might just be stuck given the structure of the data.