
I have the following piece of code:

library(dplyr)

Q = 10000
span = 1995:2016
time = rep(span, each = Q)  # one block of Q rows per year, matching id below
id = rep(1:Q,times=length(span))
s1 =  rep(rnorm(Q,0,1),times=length(span))
gdp = rep(rnorm(Q,0,1),times=length(span))
e = rep(rnorm(Q,0,1),times=length(span))
dfA = data.frame(id,time,s1,e,gdp)

mgr = double()
stp = 10
for(K in seq(10,Q,stp)){
  gr = double()
  for(t in span){
    wt1 = dfA %>% filter(time == t-1) %>%
      arrange(desc(s1)) %>% mutate(w= s1/gdp)
    zt1 = dfA %>% filter(time == t-1) %>% mutate(z1 = log(s1/e))
    zt = dfA %>% filter(time == t) %>% mutate(z = log(s1/e))
    # join on id: dfA as built above has no "name" column
    gt = left_join(zt1,zt,by="id") %>%
      mutate(g = z-z1) %>% select(id,g) %>% na.omit()

    a = left_join(wt1,gt,by="id") %>% na.omit()
    # replace id with the rank by descending s1, then keep the top K rows
    a = a  %>% mutate(id = row_number()) %>%
     filter(id <= Q) %>% mutate(gbar = mean(g)) %>%
     filter(id <= K) %>% mutate(sck = g-gbar,
     gamma = w*sck)

     gr = append(gr, sum(a$gamma))
     }
mgr = append(mgr,mean(gr))
}

where dfA is a data frame containing an id variable and a time variable, among others. Since the time variable ranges from 1995 to 2016 and K is a sequence with step 10, I resorted to append() to store gr and mgr, respectively. The problem is that it takes too long to compute.

So my question is: Is there any way to avoid using append() to fill the vectors gr and mgr and thus reduce the time spent to compute the code?

Omar
  • Sample data would be *awesome*. Please see: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Oct 29 '18 at 05:59

1 Answer


You could initialize the 'gr' and 'mgr' vectors with a fixed length instead of creating them as empty doubles and letting R extend them on every iteration. The advantage is that the memory for each vector is allocated once up front, so R does not have to copy the whole vector every time a value is added.

## initiate vectors with set length
mgr <- double(length = length(seq(10,Q,stp)))
gr <- double(length = length(1995:2016))

# fill the positions in each iteration (R indices start at 1)
outerIteration <- (K - 10) / stp + 1
innerIteration <- t - 1994
gr[innerIteration] <- sum(a$gamma)
# after the inner loop, gr holds one value per year (22 values for 1995-2016)
mgr[outerIteration] <- mean(gr)
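To see the difference in isolation, here is a minimal, self-contained sketch that compares growing a vector with append() against filling a preallocated one. The helper `step_value()` and the size `n` are made up for illustration and stand in for `sum(a$gamma)` and the number of iterations.

```r
# Toy stand-in for the real per-iteration result (hypothetical helper).
step_value <- function(i) i^2

n <- 1000

# Growing with append(): a new, longer vector is built on every iteration.
grown <- double()
for (i in seq_len(n)) {
  grown <- append(grown, step_value(i))
}

# Preallocating: the vector is created once and filled by index.
filled <- double(length = n)
for (i in seq_len(n)) {
  filled[i] <- step_value(i)
}

# Both approaches produce the same values.
identical(grown, filled)
```

With only a few thousand stored values, the cost of append() is likely small next to the repeated filter() and left_join() calls inside the loops, so preallocation alone may not change the total runtime much.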
FloSchmo
  • Thanks for your reply. This is what I was looking for. However, it does not seem to shorten the elapsed time significantly. – Omar Oct 29 '18 at 10:52
  • Nested for-loops have a long runtime. You could try to execute only the inner loop, then group your df and execute the last filter ("<= K") and the last mutate for the specific chunks (each K corresponds to 21 rows in a). You could use an *apply* function that does the last part of your calculation for every chunk instead of iterating through the chunks. – FloSchmo Oct 29 '18 at 11:17
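One way to read the suggestion above: only the final filter depends on K, so the filtering, joining and ranking can be done once per year, and the sum over the top K rows becomes a cumulative sum read off at each K. The following is a rough base R sketch of that idea, not a drop-in replacement. The toy data (positive values so that log(s1/e) is defined, and a small Q) and the helper name `gamma_for_year()` are assumptions made for illustration.

```r
set.seed(1)

# Small toy panel (assumed layout: one row per id and year).
Q    <- 50
span <- 1995:2016
n    <- Q * length(span)
dfA  <- data.frame(
  id   = rep(seq_len(Q), times = length(span)),
  time = rep(span, each = Q),
  s1   = runif(n, 0.5, 2),
  e    = runif(n, 0.5, 2),
  gdp  = runif(n, 0.5, 2)
)
Ks <- seq(10, Q, 10)

# Hypothetical helper: for one year t, return sum(gamma) for every K at once.
gamma_for_year <- function(t, df, Ks) {
  prev <- df[df$time == t - 1, ]
  curr <- df[df$time == t, ]
  if (nrow(prev) == 0) return(rep(0, length(Ks)))  # no previous year: empty sum

  prev <- prev[order(-prev$s1), ]                  # rank by s1, largest first
  m    <- match(prev$id, curr$id)                  # align current year to previous
  g    <- log(curr$s1[m] / curr$e[m]) - log(prev$s1 / prev$e)
  w    <- prev$s1 / prev$gdp
  keep <- !is.na(g) & !is.na(w)                    # rough analogue of na.omit()
  g <- g[keep]
  w <- w[keep]
  if (length(g) == 0) return(rep(0, length(Ks)))

  gamma <- w * (g - mean(g))                       # gbar uses all ranked rows
  cumsum(gamma)[pmin(Ks, length(gamma))]           # top-K sums for every K
}

# One pass over the years; rows are K values, columns are years.
gr_mat <- vapply(span, gamma_for_year, numeric(length(Ks)), df = dfA, Ks = Ks)
mgr    <- rowMeans(gr_mat)
```

Here `mgr` has one entry per value in `Ks`, and the data frame is subset once per year instead of once per (K, year) pair. The sketch assumes `Ks` has at least two values, since `vapply()` returns a plain vector rather than a matrix otherwise.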