
I have a dataset in R that is structured as follows:

Headers:
ClientID Geo Industry RevBiz1_09 RevBiz1_10 RevBiz1_11 RevBiz1_12 RevBiz2_09 RevBiz2_10 RevBiz2_11 RevBiz2_12...

What I want to do is write a function in R that starts at column 4, goes through each set of 4 columns, calculates the CAGR, and adds a new column with that value for the respective Biz. What I am having trouble with is figuring out how to write the loop.

Any help would be greatly appreciated.
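For illustration, a small data frame with this layout might look like the sketch below. The values are invented and only two Biz groups are shown; the real data presumably has more groups and rows.

```r
# Hypothetical example data: all values are made up purely to show the layout
dataset <- data.frame(
  ClientID   = c(1, 2, 3),
  Geo        = c("US", "EU", "APAC"),
  Industry   = c("Retail", "Tech", "Energy"),
  RevBiz1_09 = c(100, 200, 50),
  RevBiz1_10 = c(110, 210, 55),
  RevBiz1_11 = c(121, 230, 60),
  RevBiz1_12 = c(133.1, 250, 70),
  RevBiz2_09 = c(10, 20, 30),
  RevBiz2_10 = c(12, 22, 33),
  RevBiz2_11 = c(14, 25, 36),
  RevBiz2_12 = c(16, 27, 40)
)

# Columns 1-3 are identifiers; revenue columns start at column 4 in blocks of 4
str(dataset)
```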

Paul Hiemstra
Scohen
  • Are you calculating CAGR as [(RevBiz1_12/RevBiz1_09) ^ (1/4)] - 1? – TheComeOnMan Oct 01 '13 at 17:10
  • Please help us help you by providing us with a reproducible example (i.e. code and example data), see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for details. – Paul Hiemstra Oct 01 '13 at 17:15
  • Less concerned with how to calculate cagr, more concerned with how to write the loop to move through the columns. For example what is the loop setup that I need to use to then put the function in. – Scohen Oct 01 '13 at 17:15
  • You aren't likely to get much help unless you provide an example (ideally, reproducible) of what you've tried, so that we can help you with something _specific_ rather than just write code for you. – joran Oct 01 '13 at 17:19

1 Answer

I haven't run it but this should give you an idea of what to do. However, I'd still recommend that you post an example for other people who might benefit from your question later on.

Edit: this assumes that the only columns ending in "_12" are the ones CAGR needs to be calculated for.

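   library(data.table)
   # := only works on a data.table, so convert in place if dataset is a data.frame
   setDT(dataset)

   # Base names of the groups, taken from the columns ending in "_12"
   Instances <- sub("_12$", "", grep("_12$", colnames(dataset), value = TRUE))

   # Growth periods between _09 and _12; adjust if you count them differently
   n_periods <- 3

   for ( i in Instances )
   {
      # CAGR for group i: (last / first)^(1 / n_periods) - 1
      # The ratio and the exponent are both parenthesised, because ^ binds tighter than /
      dataset[,
         (paste0("CAGR", i)) := (get(paste0(i, "_12")) / get(paste0(i, "_09")))^(1 / n_periods) - 1
      ]
   }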
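If the group names cannot be relied on, the loop can also walk the columns purely by position, as the question describes. The following is a minimal base-R sketch, not a tested solution for the actual data: `cagr` and `add_cagr_by_position` are hypothetical helper names, the example values are invented, and it assumes every block has exactly 4 columns ordered from first to last year.

```r
# Hypothetical helper: CAGR between a first and last value over n_periods
cagr <- function(first, last, n_periods) (last / first)^(1 / n_periods) - 1

# Hypothetical helper: add one CAGR column per block of block_size columns
add_cagr_by_position <- function(df, first_col = 4, block_size = 4) {
  # Nothing to do if there is not even one complete block
  if (ncol(df) < first_col + block_size - 1) return(df)

  # Start positions of each block (4, 8, 12, ...), computed before columns are added
  starts <- seq(first_col, ncol(df) - block_size + 1, by = block_size)

  for (s in starts) {
    e <- s + block_size - 1
    # Name the new column after the block's first column, minus its year suffix
    base <- sub("[._]?[0-9]+$", "", names(df)[s])
    # block_size yearly values span block_size - 1 growth periods
    df[[paste0("CAGR_", base)]] <- cagr(df[[s]], df[[e]], block_size - 1)
  }
  df
}

# Invented example data with two blocks of four revenue columns
df <- data.frame(
  ClientID   = 1:2,
  Geo        = c("US", "EU"),
  Industry   = c("Retail", "Tech"),
  RevBiz1_09 = c(100, 200), RevBiz1_10 = c(110, 200),
  RevBiz1_11 = c(121, 200), RevBiz1_12 = c(133.1, 200),
  RevBiz2_09 = c(10, 20),   RevBiz2_10 = c(20, 20),
  RevBiz2_11 = c(40, 20),   RevBiz2_12 = c(80, 20)
)

result <- add_cagr_by_position(df)
print(result[, c("ClientID", "CAGR_RevBiz1", "CAGR_RevBiz2")])
```

The new columns are appended at the end, so they do not shift the positions of the blocks still to be processed.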
TheComeOnMan
  • Thanks for the help. Is there a good way to do this without actually referencing the column names? Basically, have the loop run the calculation on the first 4 columns, generate a new variable, then run the calculation on the next 4 columns and generate a new variable. My issue is that the RevBiz part actually changes... They do all, however, end in 9, 10, 11, 12. – Scohen Oct 01 '13 at 17:26
  • Edited. Now instances doesn't loop over 1 to n but instead loops over the underlying name under each group of 4. – TheComeOnMan Oct 01 '13 at 17:37
  • I am receiving the following error. Any ideas? Error: unexpected ']' in: " #dataset is a data.table and not a data.frame mdb[,paste0("CAGR",i):= ((get(paste0(i,".12")) / get(paste0(i,".09")) ^ 1/4) - 1]" > } Error: unexpected '}' in "}" – Scohen Oct 01 '13 at 18:08
  • I changed it a bit, it might work now. Like I said, I haven't run this. Please try and trace the brackets and run it on your data, since you are intent on not providing an example. – TheComeOnMan Oct 01 '13 at 18:15
  • And please accept the answer, if it does solve your problem. Like others have mentioned, the idea is not to write code for you but to help you out in a way that someone else might find useful too. – TheComeOnMan Oct 01 '13 at 18:47
  • This was very helpful. Although I am curious as to why you use data table instead of data frame. – Scohen Oct 02 '13 at 01:10
  • It's quicker and lighter - http://cran.r-project.org/web/packages/data.table/index.html – TheComeOnMan Oct 02 '13 at 05:19