-1

I am working with rather large data sets, in which I am using a for loop to call functions where the functions have 2 nested for loops. It is very slow and I need to see if there is a way of using vectors rather than looping over observations and variables. My first data set contains 10,000+ records of the sale and transfer of goats within New Zealand. My 'main' data set contains quarterly data on every type of goat in every farm. Namely, there are 300 types of goats and ~15,000 farms. This 'main' data frame is saved as a panel data frame. Given a farm takeover - I need the goat data for the purchased farm to be added to the purchasing farms data. Here is the structure of the program I was previously using:

goat_full_takeover<-function(farm1,farm2,takeover_date){
for(loop over dates(i)){
for(loop over goat types(j)){
   farm[farm(1)date(i)goat(j)] = farm[farm(1)date(i)goat(j)] + 
                                 farm[farm(2)date(i)goat(j)]
   farm[farm(2)date(i)goat(j)] = NA
}
}
}

goat_partial_takeover<-function(farm1,farm2,takeover_date){
for(loop over dates(i) up to takeover_date){
for(loop over goat types(j)){
   goat_scale = #goats[takeover_date + 1] / #goats[takeover_date] 

   farm[farm(1)date(i)goat(j)] = farm[farm(1)date(i)goat(j)] + goat_scale *
                                  (farm[farm(2)date(i)goat(j)])
   farm[farm(2)date(i)goat(j)] = (1-goat_scale)*farm[farm(2)date(i)goat(j)]
}
}
}

for(every goat farm purchased){
if(takeover == type1){
  goat_full_takeover
}
if(takeover == type2){
  goat_partial_takeover
}
}

So I am looping over 10,000+ goat sales, where I then loop over up to 100 quarterly dates for 300 types of goats. Is there a way I can format this differently so that I only need to loop over the purchases? Ideally, I can remove the nested for loop from the function so that the function will just stack the values for farm2 on farm1

       goat1   goat2   goat3
farm1 Q1     5       5       5 
farm1 Q2     6       6       6
farm1 Q3     7       7       7 
farm1 Q4     8       8       8 
farm2 Q1     9       9       9 
farm2 Q2    10      10       10
farm2 Q3    11      11       11  
farm2 Q4    12      12       12

Farm 1 buys Farm 2 >>

      goat1   goat2   goat3
farm1 Q1     14      14      14 
farm1 Q2     16      16      16
farm1 Q3     18      18      18 
farm1 Q4     20      20      20 
farm2 Q1     na      na      na
farm2 Q2     na      na      na
farm2 Q3     na      na      na  
farm2 Q4     na      na      na

My issue is purely with coming up with a faster way to do this. Currently, due to the number of goats this takes almost one week to run. I need to find a streamlined way to go about this so that I can use this for other countries besides New Zealand. Thank you in advance for any help, it is greatly appreciated. If anything was not made clear, please let me know!

Brian
  • 17
  • 5
  • 2
    You'll probably want to work on giving a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 It's hard to evaluate performance without something concrete for folks to test. Right now, it looks like your example is all pseudocode and it's not obvious how to scale it up to large n. – Frank Mar 14 '18 at 18:36
  • `aggregate(.~farm+quarter,transform(dat,farm=sub("farm2","farm1",farm)),sum)` where `farm` is the first column and `quarter` is the second column – Onyambu Mar 14 '18 at 18:43
  • Interesting question and I have some ideas but without compilable code and sample data, we can only *sigh* and move on. – Parfait Mar 14 '18 at 21:15

1 Answers1

0
dat=read.table(text=" farm quarter     goat1   goat2   goat3
farm1 Q1     5       5       5 
               farm1 Q2     6       6       6
               farm1 Q3     7       7       7 
               farm1 Q4     8       8       8 
               farm2 Q1     9       9       9 
               farm2 Q2    10      10       10
               farm2 Q3    11      11       11  
               farm2 Q4    12      12       12
               farm3 Q1    13     13       13
               farm3 Q2    13     13       13
               farm3 Q3    13     13       13
               farm3 Q4    13     13       13

               ",stringsAsFactors=F,h=T)


takeover=function(dat,buyer,seller){
   pattern=paste(seller,collapse="|")
   A=aggregate(.~farm+quarter,dat[grepl(pattern,dat$farm),],`is.na<-`)
   x=rbind(transform(dat,farm=sub(pattern,buyer,farm)),A)
   aggregate(.~quarter+farm,x,sum,na.action = function(x)x)
 }
takeover(dat,"farm2","farm1")#farm2 takes over farm1
   quarter  farm goat1 goat2 goat3
1       Q1 farm1    NA    NA    NA
2       Q2 farm1    NA    NA    NA
3       Q3 farm1    NA    NA    NA
4       Q4 farm1    NA    NA    NA
5       Q1 farm2    14    14    14
6       Q2 farm2    16    16    16
7       Q3 farm2    18    18    18
8       Q4 farm2    20    20    20
9       Q1 farm3    13    13    13
10      Q2 farm3    13    13    13
11      Q3 farm3    13    13    13
12      Q4 farm3    13    13    13

takeover(dat,"farm1","farm3")#farm1 takes over farm3
   quarter  farm goat1 goat2 goat3
1       Q1 farm1    18    18    18
2       Q2 farm1    19    19    19
3       Q3 farm1    20    20    20
4       Q4 farm1    21    21    21
5       Q1 farm2     9     9     9
6       Q2 farm2    10    10    10
7       Q3 farm2    11    11    11
8       Q4 farm2    12    12    12
9       Q1 farm3    NA    NA    NA
10      Q2 farm3    NA    NA    NA
11      Q3 farm3    NA    NA    NA
12      Q4 farm3    NA    NA    NA

takeover(dat,"farm3",c("farm2","farm1"))#farm3 takes over farm1 and farm2. The order doesnt matter
   quarter  farm goat1 goat2 goat3
1       Q1 farm1    NA    NA    NA
2       Q2 farm1    NA    NA    NA
3       Q3 farm1    NA    NA    NA
4       Q4 farm1    NA    NA    NA
5       Q1 farm2    NA    NA    NA
6       Q2 farm2    NA    NA    NA
7       Q3 farm2    NA    NA    NA
8       Q4 farm2    NA    NA    NA
9       Q1 farm3    27    27    27
10      Q2 farm3    29    29    29
11      Q3 farm3    31    31    31
12      Q4 farm3    33    33    33
Onyambu
  • 67,392
  • 3
  • 24
  • 53