I have a script that processes a large data set (the `for` loop runs over a large `j`). I have already removed some loops, but I think it can be optimized further. I tried `foreach()`, but it gives me errors. I do not know how to parallelize it, or whether that is necessary. The script takes the data frame `par` as input, and for each record (which contains parameters for the calculations) it must create a new record in `parpa`. It first has to do some calculations, and for this the data frame `DF2` is built inside the loop. Finally, the values stored in `parpa` are calculated in a `while` loop. First question: does anyone have an idea how to optimize this further? I also tried `mapply()`, but it returns errors.

par <- data.frame(CA=runif(n = 50, min = 70000, max = 100000),
      D=round(runif(n = 50, min = 70, max = 90),0),
      P=runif(n = 50, min = 900, max = 20000),
      A=round(runif(n = 50, min = 50, max = 70),0))

V <- 5  # defined before use: parpa needs 3*V columns
parpa <- data.frame(matrix(nrow = nrow(par), ncol = 3*V))

comp <- function(CA, D, P, A){

vect <- rep('numeric', 3*V)
b <- 1
k <- 1 
while (((b+1) <= (D+1))&(k < V)) { 
a <- b+1
b <- min((a+8-1), (D+1))
vect[c(1+4*k, 2+4*k, 3+4*k, 4+4*k)] <- c(mean(DF2$Z[a:b]), sum(DF2$X[a:b]),
                                mean(DF2$Q[a:b]), sum(DF2$AE[a:b]))
k <- k+1
}
return(vect)                       
}

#loop
for (j in 1:nrow(par)) {

CA <- par$CA[j] 
D <- par$D[j] 
R <- 0.01*D 
P <- par$P[j] 
A <- par$A[j]
COST <- 500    
V <- 5
#DF2
DF2 <- data.frame(M=0:D)
O <- function(x) {
c <- COST*D*DF2$M/R
return(c)
}
DF2$O <- O(D)
DF2$E <- (D*DF2$M+2)/D*(D+4)
DF2$Q <- (CA-DF2$M)*D
DF2$X <- (CA-DF2$O)*(DF2$E+P)
Func <- function(x) {return(round(x/30, 2))}
DF2$Z[(A+2):(D+1)] <- unlist(sapply(DF2$E[(A+2):(D+1)], Func))

parpa[j,] <- comp(CA, D, P, A)
}

# ----------------------------- with mapply() -----------------------------

#loop
outputpa <- function(CA, D, P, A) {

CA <- par$CA 
D <- par$D 
R <- 0.01*D 
P <- par$P 
A <- par$A
COST <- 500    
V <- 5
#DF2
DF2 <- data.frame(M=0:D)
O <- function(x) {
c <- COST*D*DF2$M/R
return(c)
}
DF2$O <- O(D)
DF2$E <- (D*DF2$M+2)/D*(D+4)
DF2$Q <- (CA-DF2$M)*D
DF2$X <- (CA-DF2$O)*(DF2$E+P)
Func <- function(x) {return(round(x/30, 2))}
DF2$Z[(A+2):(D+1)] <- unlist(sapply(DF2$E[(A+2):(D+1)], Func))
}

parpa <- mapply(outputpa, par$CA, par$D, par$P, par$A)
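For reference, `mapply()` hands the function one element of each vector per call. In the attempt above, lines such as `CA <- par$CA` overwrite those scalars with whole columns, and the function ends on an assignment instead of returning a record. The following is only a sketch of how the call could be arranged, not a confirmed fix. The names `build_df2()`, `summarise_blocks()` and `one_record()` are hypothetical. The output layout (up to `V` blocks of 8 rows, three statistics each, so `3*V` values per record) is an assumption chosen to match the `3*V` columns of `parpa`. The `AE` column is left out because the post never defines it.

```r
# Sketch under stated assumptions; helper names are hypothetical.
V <- 5        # assumed: at most V blocks per record
COST <- 500

# Build the per-record table from scalar parameters (same formulas as the post).
build_df2 <- function(CA, D, P, A) {
  M <- 0:D
  R <- 0.01 * D
  O <- COST * D * M / R
  E <- (D * M + 2) / D * (D + 4)
  Z <- rep(NA_real_, D + 1)
  if (D > A) {                          # guard: the index range is empty otherwise
    idx <- (A + 2):(D + 1)
    Z[idx] <- round(E[idx] / 30, 2)     # round() is vectorised, no sapply() needed
  }
  data.frame(M = M, O = O, E = E,
             Q = (CA - M) * D,
             X = (CA - O) * (E + P),
             Z = Z)
}

# Summarise consecutive blocks of 8 rows, starting at row 2.
# Assumed layout: 3 values per block, unused slots stay NA.
summarise_blocks <- function(df2, D) {
  out <- rep(NA_real_, 3 * V)
  b <- 1
  k <- 0
  while (b + 1 <= D + 1 && k < V) {
    a <- b + 1
    b <- min(a + 8 - 1, D + 1)
    rows <- a:b
    out[3 * k + 1:3] <- c(mean(df2$Z[rows]), sum(df2$X[rows]), mean(df2$Q[rows]))
    k <- k + 1
  }
  out
}

# One call handles one record; every argument is a single number here.
one_record <- function(CA, D, P, A) {
  summarise_blocks(build_df2(CA, D, P, A), D)
}

set.seed(1)   # arbitrary fixed seed so the run is repeatable
par <- data.frame(CA = runif(n = 50, min = 70000, max = 100000),
                  D  = round(runif(n = 50, min = 70, max = 90), 0),
                  P  = runif(n = 50, min = 900, max = 20000),
                  A  = round(runif(n = 50, min = 50, max = 70), 0))

# Each result has the same length, so mapply() returns one column per record.
res <- mapply(one_record, par$CA, par$D, par$P, par$A)
parpa <- as.data.frame(t(res))   # one row per record of par
```

Because every call depends only on its own four parameters, the same `one_record()` could later be passed to a parallel apply function if the run time still justifies it.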
  • Your first code block throws errors on account of `V` and `DF2` not being defined. It would help a lot if you could *explain* what you're trying to do. I like code optimisation questions, but at the moment you're just showing a code dump without any context/details. Also for code optimisation questions you really need to include your expected output; and choose a fixed seed when generating random data. – Maurits Evers Jul 09 '18 at 03:07
  • Hello, the script is a simplified part of another (more complex) script. Suppose that V is always defined, while DF2 is built within the loop (j always > 1). For each record of the data frame par, the script calculates a new record in parpa (the function that calculates it is defined through the while loop). DF2 contains data that are used to calculate the values contained in parpa (while loop ...). DF2 is constructed for each record (j) of par, starting from some parameters contained in the par[j,] record (CA, D, P, ...). Is that clearer? – stefanodv Jul 09 '18 at 06:24
  • The function that is called to calculate the values in parpa is comp(); it is defined at the beginning, after the data frames. In my script the data frame par is imported and generally contains many records. I do not know if it is important, but par contains biological information on some plant species, and parpa the values obtained. The expected output is the data frame parpa, with the same number of rows as par and at most V columns (V > 1). – stefanodv Jul 09 '18 at 06:35
  • I'm sorry but this is not helpful nor making things clearer. Please [edit](https://stackoverflow.com/posts/51230261/edit) your post, and include (1) a reproducible code example and (2) your expected output. "Reproducible" means that we should be able to copy & paste your code into an R terminal to reproduce your expected output (I already mentioned using a fixed seed). At the moment that is not the case. Please review how to provide a [minimal reproducible example/attempt](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). [...] – Maurits Evers Jul 09 '18 at 07:29
  • [continued] I understand that this is part of a larger script. The challenge is to reduce the larger problem into the smallest representative problem, clearly state what you are trying to do (how do you get from input to output), and then present this in a self-contained way. Include all details in the main post, as comments are transient. – Maurits Evers Jul 09 '18 at 07:32
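
The fixed-seed request in the comments could be met along the following lines. This is a minimal sketch, not the asker's actual data: the seed value, the three-row size and the helper name `make_par()` are all arbitrary choices made for illustration.

```r
# Hypothetical helper: generates input with the same columns as the post.
make_par <- function(n) {
  data.frame(CA = runif(n, min = 70000, max = 100000),
             D  = round(runif(n, min = 70, max = 90), 0),
             P  = runif(n, min = 900, max = 20000),
             A  = round(runif(n, min = 50, max = 70), 0))
}

set.seed(2018)        # any fixed seed works; 2018 is arbitrary
V <- 5                # define every constant the code relies on
par <- make_par(3)    # a few rows are enough to show the problem

dput(par)             # prints a copy-pasteable definition of the exact input
```

Pasting the `dput()` output into the post lets readers start from identical input, and the expected `parpa` rows for those same three records can then be listed next to it.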