I am trying to parallelize a nested loop. For the variables the two datasets have in common (`changevars`), it copies every observation from one dataset to the other, within every country (`v5`), matching on the observation's id (`v3`). I have to match on country + id because ids are duplicated across countries.
My loop code is:

```r
for (var in changevars) {
  print(var)
  for (i in unique(int2006$v5)) {
    print(i)
    for (id in unique(int2006$v3)) {
      x2006r[x2006r$v5 == i & x2006r$v3 == id, var] <- int2006[int2006$v5 == i & int2006$v3 == id, var]
    }
  }
}
```
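For comparison, the same update can be sketched without looping over rows at all: build a composite country + id key for both datasets and line them up once with base R's `match()`. This is a minimal sketch, assuming each `v5`/`v3` pair appears at most once in the source; the helper `copy_by_key` and the toy data frames are hypothetical. Rows of the target whose key is absent from the source are left unchanged, which may differ from the original loop in that edge case.

```r
# Hypothetical helper: copy the columns in `vars` from `src` into `dst`,
# matching rows on the composite key country (v5) + id (v3).
# Assumes each v5/v3 pair is unique within `src` (match() takes the first hit).
copy_by_key <- function(dst, src, vars) {
  dst_key <- paste(dst$v5, dst$v3, sep = "_")
  src_key <- paste(src$v5, src$v3, sep = "_")
  idx <- match(dst_key, src_key)  # row of src for each row of dst (NA if absent)
  hit <- !is.na(idx)              # only overwrite rows that exist in src
  for (var in vars) {             # loops over columns only, not rows
    dst[hit, var] <- src[idx[hit], var]
  }
  dst
}

# Toy data (hypothetical): id 1 exists in two different countries.
src <- data.frame(v3 = c(1, 2, 1), v5 = c(36, 36, 40),
                  v8 = c(5, 6, 7), v9 = c(NA, 2, 3))
dst <- data.frame(v3 = c(1, 1, 2), v5 = c(40, 36, 36),
                  v8 = c(0, 0, 0), v9 = c(0, 0, 0))

res <- copy_by_key(dst, src, c("v8", "v9"))
print(res)
```

With the question's data this would be called as `x2006r <- copy_by_key(x2006r, int2006, changevars)`.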
I want to parallelize it because, although it works, it is really slow. However, I do not understand the logic behind changing a `for` loop into a `foreach` loop with `%dopar%`. I've tried to understand the other answers, but all my attempts have failed.
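The conceptual change from `for` to a parallel loop is that a worker process cannot modify `x2006r` in the calling session: each iteration has to *return* its result, and the results are combined afterwards in the main session. A minimal sketch of that shape, using base R's `parallel::mclapply()` with one task per variable, is below. The helper `update_column` and the toy data are hypothetical, and `mc.cores = 1` is used only so the sketch also runs on Windows, where `mclapply()` cannot fork. With the **foreach** package the same function body would sit inside `foreach(var = changevars) %dopar% { ... }`, which by default also returns a list.

```r
library(parallel)

# Hypothetical helper: returns an updated copy of ONE column instead of
# assigning into the data frame, so it can safely run on a separate worker.
update_column <- function(var, dst, src) {
  idx <- match(paste(dst$v5, dst$v3), paste(src$v5, src$v3))
  hit <- !is.na(idx)
  col <- dst[[var]]
  col[hit] <- src[[var]][idx[hit]]
  col
}

# Toy data (hypothetical).
src <- data.frame(v3 = c(1, 2, 1), v5 = c(36, 36, 40),
                  v8 = c(5, 6, 7), v9 = c(NA, 2, 3))
dst <- data.frame(v3 = c(1, 1, 2), v5 = c(40, 36, 36),
                  v8 = c(0, 0, 0), v9 = c(0, 0, 0))
vars <- c("v8", "v9")

# Each task returns a vector; nothing is assigned inside the workers.
# On Linux/macOS, mc.cores can be raised to fork several processes.
cols <- mclapply(vars, update_column, dst = dst, src = src, mc.cores = 1)

# Combine step: put the returned columns back in the main session.
dst[vars] <- cols
print(dst)
```

Note that splitting by variable yields at most `length(changevars)` tasks (three here), so the achievable speed-up from this split is limited; most of the original loop's cost comes from rescanning whole columns for every country/id pair.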
Reproducible example datasets:

- Source dataset (`int2006`):
```r
> dput(int2006)
structure(list(v3 = c(10001, 10002, 10003, 10004, 10005, 10006, 
10007, 10008, 10009, 10010, 10011, 10012, 10013, 10014, 10015, 
10016, 10017, 10018, 10019, 10020), v5 = c(36, 36, 36, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36), 
    v7 = c(3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 
    3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 
    3606), v8 = c(1, 1, 2, 1, NA, NA, 1, 2, 2, 2, NA, 2, 2, 1, 
    1, 1, 2, 2, 1, 2), v9 = c(NA, 2, 1, 2, 1, 1, 1, 2, 4, 1, 
    NA, 1, NA, 1, 1, 1, 1, 1, 1, 2)), row.names = c(NA, 20L), class = "data.frame")
```
- Target dataset (`x2006r`, the one into which the cells of the source dataset should be copied):
```r
> dput(x2006r)
structure(list(v3 = c(10001, 10002, 10003, 10004, 10005, 10006, 
10007, 10008, 10009, 10010, 10011, 10012, 10013, 10014, 10015, 
10016, 10017, 10018, 10019, 10020), v5 = c(36, 36, 36, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36), 
    v7 = c("3606", "3606", "3606", "3606", "3606", "3606", "3606", 
    "3606", "3606", "3606", "3606", "3606", "3606", "3606", "3606", 
    "3606", "3606", "3606", "3606", "3606"), v8 = c(1, 1, 2, 
    1, NA, NA, 1, 2, 2, 2, NA, 2, 2, 1, 1, 1, 2, 2, 1, 2), v9 = c(NA, 
    2, 1, 2, 1, 1, 1, 2, 4, 1, NA, 1, NA, 1, 1, 1, 1, 1, 1, 2
    )), row.names = c(NA, 20L), class = "data.frame")
```
- Variables to iterate over:

```r
changevars <- c("v7", "v8", "v9")
```
Can someone help me? I'm really stuck. I am also not sure whether parallelizing this loop will actually make it faster.
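One way to check whether the bottleneck is the row-by-row subsetting rather than the lack of parallelism is to time the original nested loop against a single `match()`-based update on synthetic data of a similar shape (ids repeated across countries). This is a minimal sketch; the data sizes and the function names `loop_update` and `match_update` are hypothetical, and no particular timings are claimed here. The final check confirms both approaches produce the same data frame.

```r
# Hypothetical synthetic data: 2 countries, ids 1..n repeated in each country.
set.seed(1)
n <- 300
src <- data.frame(v3 = rep(seq_len(n), 2), v5 = rep(c(36, 40), each = n),
                  v8 = sample(1:2, 2 * n, replace = TRUE),
                  v9 = sample(1:4, 2 * n, replace = TRUE))
dst <- src
dst$v8 <- NA_integer_
dst$v9 <- NA_integer_
vars <- c("v8", "v9")

# The nested loop from the question, wrapped in a function.
loop_update <- function(dst, src, vars) {
  for (var in vars) {
    for (i in unique(src$v5)) {
      for (id in unique(src$v3)) {
        dst[dst$v5 == i & dst$v3 == id, var] <-
          src[src$v5 == i & src$v3 == id, var]
      }
    }
  }
  dst
}

# Vectorized alternative: one match() on a composite key, one assignment per column.
match_update <- function(dst, src, vars) {
  idx <- match(paste(dst$v5, dst$v3), paste(src$v5, src$v3))
  hit <- !is.na(idx)
  for (var in vars) dst[hit, var] <- src[idx[hit], var]
  dst
}

print(system.time(a <- loop_update(dst, src, vars)))
print(system.time(b <- match_update(dst, src, vars)))
stopifnot(isTRUE(all.equal(a, b)))  # both approaches should agree
```

The loop's work grows with (number of countries × number of ids × number of rows), whereas `match()` lines up all rows in one pass, so spreading the loop over cores would still leave the per-iteration full-column scans in place.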
Thank you very much!