I have the following code:
options(java.parameters = "-Xmx4000m")
require(xlsx)
library(plyr)
setwd("~/PycharmProjects/CatScrape")
rm(list=ls(all=TRUE))
jgc <- function() .jcall("java/lang/System", method = "gc")
Master <- read.xlsx2("MASTER.xlsx", sheetIndex = 2, startRow = 1, colIndex=4,endRow = 10000, as.data.frame = TRUE, header=TRUE)
Dutch_Stage <- read.xlsx2("languages/Dutch.xlsx", sheetIndex = 1, startRow = 1, colIndex=c(5,8),endRow = 10000, header=TRUE)
Dutch <- unique(Dutch_Stage)
rm(Dutch_Stage)
Dutch <- rename(Dutch, c("Key.s."="Key", "Status"="Dutch"))
jgc()
output <- merge(Master, Dutch, by="Key", all.Master = TRUE)
## OUTPUT RECORD NUMBER MATCHES MASTER
Finnish_Stage <- read.xlsx2("languages/Finnish.xlsx", sheetIndex = 1, startRow = 1, colIndex=c(5,8),endRow = 10000, header=TRUE)
Finnish <- unique(Finnish_Stage)
rm(Finnish_Stage)
Finnish <- rename(Finnish, c("Key.s."="Key", "Status"="Finnish"))
jgc()
output <- merge(output, Finnish, by="Key", all.output = TRUE)
## OUTPUT RECORD NUMBER INCREASES by 6
I have 12 more files to add, and once they are all merged I end up with 25 times the number of records.
I set all.output = TRUE (all.Master = TRUE for the first merge) on every one of these merges. My goal is to show only the records from Master, along with the values associated with those records. I don't want any additional records.
This makes me think this is not a true left join. How do I make it a plain LEFT JOIN?
Thanks
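For reference, here is a minimal sketch of the behaviour in question, using small made-up data frames rather than the real spreadsheets (the names master, finnish and the values in them are hypothetical). It assumes base R's merge(), whose left-join switch is all.x; a name such as all.Master is not one of its arguments and, as far as I can tell, is silently absorbed by `...`, leaving the default inner join. It also shows that even a real left join returns more rows than the left table when a key is repeated in the right table, which may be where the extra records come from.

```r
# Toy data (hypothetical); stands in for Master and one language file.
master <- data.frame(Key = c("a", "b", "c"), stringsAsFactors = FALSE)
finnish <- data.frame(
  Key     = c("a", "a", "d"),          # "a" appears twice, "d" is not in master
  Finnish = c("Done", "Review", "Done"),
  stringsAsFactors = FALSE
)

# all.Master is not a merge() argument, so this behaves like the default
# inner join: only keys present in both tables are kept.
inner <- merge(master, finnish, by = "Key", all.Master = TRUE)

# all.x = TRUE keeps every row of the first (left) table: a left join.
# The repeated key "a" still produces one output row per match.
left <- merge(master, finnish, by = "Key", all.x = TRUE)

# Keeping one row per Key in the right table (here simply the first one;
# which row to keep is an assumption) makes the result match master row for row.
finnish_one <- finnish[!duplicated(finnish$Key), ]
left_one <- merge(master, finnish_one, by = "Key", all.x = TRUE)

nrow(inner)
nrow(left)
nrow(left_one) == nrow(master)
```

If the row count still grows after switching to all.x = TRUE, checking sum(duplicated(Finnish$Key)) on each language table would show whether repeated keys are responsible.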