I have a problem in R. For every row where a condition on column A is met, I want to take that row's value in column C, find the last row (at or above the current row) where that value appears in column B, and add a number to column D in that matching row.

I found a solution, but it is very slow on a data frame with a few million rows: even the parallel version of my code takes about 30 minutes to complete. How can I speed it up, or is there a faster function that does the same thing? Here is the parallel code I currently have:
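```r
library(parallel)  # clusterExport(), parSapply(); `cl` is a cluster created earlier, e.g. with makeCluster()

x = which(df$a == 4)  # rows where the condition on column a holds
y = df$c[x]           # the column-c value of each of those rows
clusterExport(cl, c("df", "x", "y"))
# For each qualifying row, find the last row at or above it whose b matches y[i].
# grep() treats y[i] as a regular expression, so partial matches also count;
# max() over no matches returns -Inf (with a warning), which is filtered out below.
z = parSapply(cl, seq_along(y), function(i) max(grep(y[i], df$b[1:x[i]])))
df$d[z[!is.infinite(z)]] = df$d[z[!is.infinite(z)]] + 3
```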
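One possible faster alternative, offered as a sketch under stated assumptions rather than a tested drop-in replacement: the loop above rescans `df$b[1:x[i]]` for every qualifying row, which costs roughly rows × queries. Instead, all rows of `b` and all queries can be put through a single sort (by value, then by row), after which the most recent matching `b` row for each query is carried forward with `cummax()`. This assumes exact equality between `c` and `b` (not `grep()`'s partial/regex matching), that `b` and `c` have the same atomic type, and that neither contains `NA`. The helper name `last_match_rows` and the toy data are hypothetical.

```r
# Hypothetical helper: for each row in query_rows, return the last row j <= that row
# with b[j] == c_col[row] (exact equality), or NA if there is none.
last_match_rows <- function(b, c_col, query_rows) {
  n_b <- length(b)
  n_q <- length(query_rows)
  # One event per row of b, followed by one event per query.
  key  <- c(b, c_col[query_rows])
  row  <- c(seq_len(n_b), query_rows)
  is_b <- rep(c(TRUE, FALSE), c(n_b, n_q))
  qid  <- c(rep(NA_integer_, n_b), seq_len(n_q))

  # Sort by value, then row; at equal rows, b-events come before queries (j <= i).
  o <- order(key, row, !is_b, method = "radix")
  key <- key[o]; row <- row[o]; is_b <- is_b[o]; qid <- qid[o]

  # Sorted position of the most recent b-event at or before each position.
  last_b <- cummax(ifelse(is_b, seq_along(is_b), 0L))

  qpos <- which(!is_b)
  lp   <- last_b[qpos]
  hit  <- lp > 0L
  # That b-event only counts if it has the same value as the query.
  hit[hit] <- key[lp[hit]] == key[qpos[hit]]

  out <- rep(NA_integer_, n_q)
  out[qid[qpos[hit]]] <- row[lp[hit]]
  out
}

# Toy data (hypothetical), using the same condition (a == 4) and increment (+3).
df <- data.frame(a = c(1, 4, 2, 4, 4, 1),
                 b = c(7, 5, 7, 9, 5, 3),
                 c = c(0, 7, 0, 5, 8, 0),
                 d = 0)
x <- which(df$a == 4)
z <- last_match_rows(df$b, df$c, x)
hit <- unique(z[!is.na(z)])
df$d[hit] <- df$d[hit] + 3
```

As in the original code, a row of `d` that is the target of several queries gets 3 added only once, because indexed assignment with repeated indices writes the same value more than once; `unique()` just makes that explicit.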