0

The following code runs a loops but the problem is the speed; it takes several hours to finish and I am looking for an alternative so that I don´t have to wait so long.

Basically what the code does the follolling calculations:

1.-It calculates the mean of the values of the 60 days.
2.-It gets the standard deviation of the values of the 60 days.
3.-It gets the Max of the values of the 60 days.
4.-It gets the Min of the values of the 60 days.
5.-Then with the previous calculations the code "smooths" the peaks up and down.
6.-Then the code simply get the means from 60, 30, 15 and 7 Days.

So the purpose of these code is to remove the peaks of the data using the method already mentioned.

Here is the code:

options(stringsAsFactors=F)

DAT <- data.frame(ITEM = "x", CLIENT = as.numeric(1:100000), matrix(sample(1:1000, 60, replace=T), ncol=60, nrow=100000, dimnames=list(NULL,paste0('DAY_',1:60))))

DATT <- DAT

nRow <- nrow(DAT)
TMP  <- NULL
for(iROW in 1:nRow){#iROW <- 1
 print(c(iROW,nRow))

  Demand <- NULL
  for(iCOL in 3:ncol(DAT)){#iCOL <- 1
      Demand  <- c(Demand,DAT[iROW,iCOL])

  }

  ww <- which(!is.na(Demand))

  if(length(ww) > 0){
    Average <- round(mean(Demand[ww]),digits=4)
    DesvEst  <- round(sd(Demand,na.rm=T),digits=4)
    Max      <- round(Average + (1 * DesvEst),digits=4)
    Min      <- round(max(Average - (1 * DesvEst), 0),digits=4)
    Demand  <- round(ifelse(is.na(Demand), Demand, ifelse(Demand > Max, Max, ifelse(Demand < Min, Min, Demand))))
    Prom60   <- round(mean(Demand[ww]),digits=4)
    Prom30   <- round(mean(Demand[intersect(ww,(length(Demand) - 29):length(Demand))]),digits=4)
    Prom15   <- round(mean(Demand[intersect(ww,(length(Demand) - 14):length(Demand))]),digits=4)
    Prom07   <- round(mean(Demand[intersect(ww,(length(Demand) - 6):length(Demand))]),digits=4)

  }else{
    Average <- DesvEst <- Max <- Min <- Prom60 <- Prom30 <- Prom15 <- Prom07 <- NA

  }

  DAT[iROW,3:ncol(DAT)] <- Demand
  TMP <- rbind(TMP, cbind(DAT[iROW,], Average, DesvEst, Max, Min, Prom60, Prom30, Prom15, Prom07))
}
DAT <- TMP
Max Molina
  • 73
  • 6
  • 1
    Take a look at the `tidyverse` packages. `dplyr` is an efficient and simple package for `data.frame` operations. Also the `data.table` package (which you tagged probably by mistake) implements `data.table` class, with more efficient methods than base r's. – VFreguglia Jun 22 '18 at 14:40
  • could you provide the dput of your data set? First few rows will be good to have. – YOLO Jun 22 '18 at 14:41
  • 1
    I think you could get rid of the inner for loop with something like this `Demand <- as.numeric(DAT[iROW,3:ncol(DAT)])` – Random Cotija Jun 22 '18 at 15:14

1 Answers1

4

If one runs your code (with smaller number of rows) through a profiler, one sees that the main issue is the rbind in the end, followed by the c mentioned by @Riverarodrigoa:

We can focus on these two by creating numeric matrices of suitable size and working with those. Only in the end the final data.frame is created:

options(stringsAsFactors=F)
N <- 1000
set.seed(42)
DAT <- data.frame(ITEM = "x", 
                  CLIENT = as.numeric(1:N), 
                  matrix(sample(1:1000, 60, replace=T), ncol=60, nrow=N, dimnames=list(NULL,paste0('DAY_',1:60))))

nRow <- nrow(DAT)
TMP  <- matrix(0, ncol = 8, nrow = N,  
               dimnames = list(NULL, c("Average", "DesvEst", "Max", "Min", "Prom60", "Prom30", "Prom15", "Prom07")))
DemandMat <- as.matrix(DAT[,3:ncol(DAT)])

for(iROW in 1:nRow){
  Demand <- DemandMat[iROW, ]

  ww <- which(!is.na(Demand))

  if(length(ww) > 0){
    Average <- round(mean(Demand[ww]),digits=4)
    DesvEst  <- round(sd(Demand,na.rm=T),digits=4)
    Max      <- round(Average + (1 * DesvEst),digits=4)
    Min      <- round(max(Average - (1 * DesvEst), 0),digits=4)
    Demand  <- round(ifelse(is.na(Demand), Demand, ifelse(Demand > Max, Max, ifelse(Demand < Min, Min, Demand))))
    Prom60   <- round(mean(Demand[ww]),digits=4)
    Prom30   <- round(mean(Demand[intersect(ww,(length(Demand) - 29):length(Demand))]),digits=4)
    Prom15   <- round(mean(Demand[intersect(ww,(length(Demand) - 14):length(Demand))]),digits=4)
    Prom07   <- round(mean(Demand[intersect(ww,(length(Demand) - 6):length(Demand))]),digits=4)

  }else{
    Average <- DesvEst <- Max <- Min <- Prom60 <- Prom30 <- Prom15 <- Prom07 <- NA

  }
  DemandMat[iROW, ] <- Demand 
  TMP[iROW, ] <- c(Average, DesvEst, Max, Min, Prom60, Prom30, Prom15, Prom07)
}
DAT <- cbind(DAT[,1:2], DemandMat, TMP)

For 1000 rows this takes about 0.2 s instead of over 4 s. For 10.000 rows I get 2 s instead of 120 s.

Obviously, this is not really pretty code. One could do this much nicer using tidyverse or data.table. I just find it worth noting that for loops are not necessarily slow in R. But dynamically growing data structures is.

Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75