From the nature of your problem, you are bound to be doing a many-to-many merge, where each student occurs many times in each dataframe. You might want to check how many times. If each student occurs twice in each dataframe, one student will produce 4 rows. If a student occurs 10 times, the merge will add 100 rows. So first check how many rows you'll get. This is the function I use for that:
count.rows <- function(x, y, v, all = FALSE){
    # count the occurrences of each key value in both dataframes
    tx <- table(x[[v]])
    ty <- table(y[[v]])
    # keep only the values that occur in both
    val <- names(tx)[match(names(tx), names(ty), 0L) > 0L]
    cts <- rbind(tx[match(val, names(tx))], ty[match(val, names(ty))])
    colnames(cts) <- val
    # every value contributes (count in x) * (count in y) rows to the merge
    sum(apply(cts, 2, prod, na.rm = all), na.rm = TRUE)
}
count.rows(DF1,DF2,"STUDENT.NAME")
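To see what it counts, take a hypothetical toy pair where "a" occurs 2 times in the one dataframe and 3 times in the other: that value alone makes 2 * 3 = 6 rows, and count.rows matches nrow(merge(...)):
A <- data.frame(STUDENT.NAME = c("a", "a", "b"), S1 = 1:3)
B <- data.frame(STUDENT.NAME = c("a", "a", "a", "b", "c"), S2 = 1:5)
count.rows(A, B, "STUDENT.NAME")     # 7 = 2*3 + 1*1
nrow(merge(A, B, "STUDENT.NAME"))    # 7 as well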
If you read up on the R documentation of merge, you'll see that the complexity depends on the size of the result. This is not due to the merge algorithm itself, but to the binding of all the results together. If you really want a less memory-hungry solution, you especially need to get rid of that binding. The following algorithm does that for you. I wrote it out explicitly so you can follow the logic, and it can still be optimized. Mind you, it does not give exactly the same result: it copies all columns of both dataframes, including the key column of y. So you might want to adapt that a little, as sketched after the function.
mymerge <- function(x, y, v){
    ix <- match(v, names(x))
    iy <- match(v, names(y))
    # sort both key columns, so equal values form contiguous blocks
    xx <- x[, ix]
    yy <- y[, iy]
    ox <- order(xx)
    oy <- order(yy)
    xx <- xx[ox]
    yy <- yy[oy]
    nx <- length(xx)
    ny <- length(yy)
    # the key values that occur in both dataframes
    val <- unique(xx)
    val <- val[match(val, yy, 0L) > 0L]
    # size of the result: sum over values of (count in x) * (count in y)
    cts <- cbind(table(xx)[val], table(yy)[val])
    dimr <- sum(apply(cts, 1, prod), na.rm = TRUE)
    # preallocate the row indices once, so no binding of results is needed
    idx <- vector("numeric", dimr)
    idy <- vector("numeric", dimr)
    # start of each block and start of the next one, for the matching values
    # (drop = FALSE keeps the matrix structure when only one value matches)
    ndx <- embed(c(which(!duplicated(xx)), nx + 1), 2)[unique(xx) %in% val, , drop = FALSE]
    ndy <- embed(c(which(!duplicated(yy)), ny + 1), 2)[unique(yy) %in% val, , drop = FALSE]
    count <- 1
    for(i in 1:nrow(ndx)){
        # block lengths (nx and ny are reused here)
        nx <- abs(diff(ndx[i, ]))
        ny <- abs(diff(ndy[i, ]))
        ll <- nx * ny
        # all combinations of the rows in the i-th block of x and of y
        idx[count:(count + ll - 1)] <- rep(ndx[i, 2]:(ndx[i, 1] - 1), ny)
        idy[count:(count + ll - 1)] <- rep(ndy[i, 2]:(ndy[i, 1] - 1), each = nx)
        count <- count + ll
    }
    # construct the result by indexing instead of binding
    x <- x[ox[idx], ]
    names(y) <- paste("y.", names(y), sep = "")
    x[names(y)] <- y[oy[idy], ]
    rownames(x) <- 1:nrow(x)
    x
}
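For instance, if you don't want the duplicated key column that comes back as y.STUDENT.NAME (y.<v> in general), a one-line adaptation after the call suffices; m here is just a hypothetical result:
m <- mymerge(x, y, "STUDENT.NAME")  # x, y: your two dataframes
m[["y.STUDENT.NAME"]] <- NULL       # drop the duplicated key column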
Some testing code so you can see that it works:
DF1 <- data.frame(
    ID = 1:10,
    STUDENT.NAME = letters[1:10],
    SCORE = 1:10
)
id <- c(3, 11, 4, 6, 6, 12, 1, 4, 7, 10, 5, 3)
DF2 <- data.frame(
    ID = id,
    STUDENT.NAME = letters[id],
    SCORE = 1:12
)
mymerge(DF1,DF2,"STUDENT.NAME")
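As a quick sanity check against base merge, the row counts and the matched key values should coincide (the column names differ, because mymerge prefixes the columns of y with "y."):
X1 <- mymerge(DF1, DF2, "STUDENT.NAME")
X2 <- merge(DF1, DF2, "STUDENT.NAME")
nrow(X1) == nrow(X2)    # TRUE, and equal to count.rows(DF1, DF2, "STUDENT.NAME")
identical(sort(as.character(X1$STUDENT.NAME)),
          sort(as.character(X2$STUDENT.NAME)))    # TRUE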
Doing the same with two dataframes of 0.5 million rows and 4 columns, with up to 10 matches per student name, it returns a dataframe with 5.8 million rows and 8 columns, and gives the following picture of the memory usage:

[plot of memory usage over time; the merge call is marked in yellow, the mymerge call in green]
The yellow box is the merge call, the green box is the mymerge call. Memory ranges from 2.3 GB to 3.74 GB, so the merge call uses 1.45 GB and mymerge a bit over 0.8 GB. Still no "out of memory" errors... The testing code for this is below:
# 120,000 random four-letter names
Names <- sapply(
    replicate(120000, sample(letters, 4, TRUE), simplify = FALSE),
    paste, collapse = "")
# 50,000 distinct names over 500,000 rows: up to ~10 matches per name
DF1 <- data.frame(
    ID10 = 1:500000,
    STUDENT.NAME = sample(Names[1:50000], 500000, TRUE),
    FATHER.NAME = sample(letters, 500000, TRUE),
    SCORE1 = rnorm(500000),
    stringsAsFactors = FALSE
)
id <- sample(500000, replace = TRUE)
DF2 <- data.frame(
    ID20 = DF1$ID10,
    STUDENT.NAME = DF1$STUDENT.NAME[id],
    SCORE = rnorm(500000),
    SCORE2 = rnorm(500000),
    stringsAsFactors = FALSE
)
# replace 20,000 names in DF2 by names that don't occur in DF1
id2 <- sample(500000, 20000)
DF2$STUDENT.NAME[id2] <- sample(Names[100001:120000], 20000, TRUE)
gc()  # clean up before timing
system.time(X <- merge(DF1, DF2, "STUDENT.NAME"))
Sys.sleep(1)  # the pauses separate the two calls in the memory monitor
gc()
Sys.sleep(1)
rm(X)
gc()
Sys.sleep(3)
system.time(X <- mymerge(DF1, DF2, "STUDENT.NAME"))
Sys.sleep(1)
gc()
rm(X)
gc()
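If you don't have a memory monitor at hand, a rough in-R alternative is gc(reset = TRUE): it resets the "max used" statistics, so the gc() after each call reports the peak memory used during that call. A minimal sketch:
gc(reset = TRUE)
X <- merge(DF1, DF2, "STUDENT.NAME")
gc()  # the "max used" column now shows the peak during the merge
rm(X)
gc(reset = TRUE)
X <- mymerge(DF1, DF2, "STUDENT.NAME")
gc()  # and here the peak during mymerge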