I have a dataset of 80 variables, and I want to loop through a subset of 50 of them and construct returns. I have a list of the names of the variables for which I want to construct returns, and I am attempting to use the dplyr function mutate to construct the new variables in a loop. Specifically, my code is:

for (i in returnvars) {
   alldta <- mutate(alldta,paste("r",i,sep="") = (i - lag(i,1))/lag(i,1))}

where returnvars is my list, and alldta is my dataset. When I run this code outside the loop with just one of the `i` values, it works fine. The code for that looks like this:

alldta <- mutate(alldta,rVar = (Var- lag(Var,1))/lag(Var,1))

However, when I run it in the loop (e.g., attempting to do the previous line of code 50 times for 50 different variables), I get the following error:

Error: unexpected '=' in:
"for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") ="

I am unsure why this issue is coming up. I have looked into a number of ways to try and do this, and have attempted solutions that use lapply as well, without success.

Any help would be much appreciated! If there is an easy way to do this with one of the apply commands as well, that would be great. I did not provide a dataset because my question is not data specific, I'm simply trying to understand, as a relative R beginner, how to construct many transformed variables at once and add them to my data frame.

EDIT: As per Frank's comment, I updated the code to the following:

for (i in returnvars) {
   varname <- paste("r",i,sep="")
   alldta <- mutate(alldta,varname = (i - lag(i,1))/lag(i,1))}

This fixes the previous error, but I am still not referencing the variable correctly, so I get the error

Error in "Var" - lag("Var", 1) : 
non-numeric argument to binary operator 

Which I assume is because R sees my variable name Var as a string, rather than as a variable. How would I correctly reference the variable in my dataset alldta? I tried get(i) and alldta$get(i), both without success.

I'm also still open to (and actively curious about) more R-style ways to do this entire process, as opposed to using a loop.

Meru Bhanot
  • Welcome to SO. First of all, you should read [here](http://stackoverflow.com/help/how-to-ask) about how to ask a good question; a good question has a better chance of being solved and of getting you help. It is also worth reading [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), which explains how to create a reproducible example in R. Help users to help you by providing a piece of your data, the desired output, and the things you have tried so far. – SabDeM Sep 15 '15 at 20:08
  • Have a look at this question: http://stackoverflow.com/questions/26003574/r-dplyr-mutate-use-dynamic-variable-names You are getting the error because you cannot construct your own varname to the left of `=` (neither inside `mutate` nor anywhere else in R). – Frank Sep 15 '15 at 20:32
  • This is a useful comment @Frank, thanks! I'll update the above code - it still doesn't work, but now the problem is more one about variable references from loops rather than the error I was receiving before. – Meru Bhanot Sep 15 '15 at 20:44
  • You **can** construct your own varname if you use `mutate_()` and construct the entire call, something like `mutate_(alldta, sprintf("r%s = (%s - lag(%s,1))/lag(%s,1))", i, i, i, i))`. (I might have missed a paren--and maybe there is a prettier way than the `i, i, i, i`.) – Gregor Thomas Sep 15 '15 at 20:57
  • A kludgy work around could be to `select` your columns, `mutate` them via `mutate_each` and then bind them back to the original after naming: `alldta %>% select(one_of(returnvars)) %>% mutate_each(funs( (. - lag(.))/lag(.))) %>% setNames(paste0("r", names(.))) %>% bind_cols(alldta, .)` – aosmith Sep 15 '15 at 21:22
  • This code did not work for me - my RStudio session did not recognize `one_of`. – Meru Bhanot Sep 16 '15 at 13:05
  • That's a function from dplyr, see the `select` help page in current dplyr versions for more info. – aosmith Sep 16 '15 at 16:20
  • Looks like `one_of` was first added in dplyr 0.3. – aosmith Sep 16 '15 at 16:29

2 Answers

Using mutate inside a loop might not be a good idea either. I am not sure whether mutate makes a copy of the data frame, but it's generally not good practice to grow a data frame inside a loop. Instead, create a separate data frame with the output and then name the columns based on your logic.

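# Bind the per-variable results as columns (one column per entry of returnvars)
result = as.data.frame(do.call(cbind, lapply(returnvars, function(i) {...})))
names(result) = paste("r", returnvars, sep="")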
Rohit Das
  • This looks like the kind of solution I want, but this isn't exactly working for me. Is it possible you could update your answer for my specific case, where the function is: (i - lag(i,1))/lag(i,1)) and return vars is a list of variable names within alldta (though without the alldta$ in front, so I'd need to sort out how to tell this to R) – Meru Bhanot Sep 16 '15 at 13:03
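For the specific case asked about in the comment above, the lapply idea in this answer could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy `alldta`, the column names `VarA` and `VarB`, and the helper `simple_return` are all made up here, and the lag is written in base R so the sketch does not depend on `dplyr::lag` being loaded.

```r
# Hypothetical toy data standing in for alldta; the column names are made up.
alldta <- data.frame(id   = 1:4,
                     VarA = c(100, 110, 121, 133.1),
                     VarB = c(50, 55, 44, 66))
returnvars <- c("VarA", "VarB")

# Simple return: (x_t - x_{t-1}) / x_{t-1}.
# The first element has no predecessor, so its return is NA.
simple_return <- function(x) {
  prev <- c(NA, head(x, -1))  # x shifted down by one position
  (x - prev) / prev
}

# Compute all return columns in one pass, then name them and attach them.
result <- as.data.frame(lapply(alldta[returnvars], simple_return))
names(result) <- paste("r", returnvars, sep = "")
extended <- cbind(alldta, result)
```

Indexing with `alldta[returnvars]` is what turns the character names into actual columns, so no `get()` or `alldta$` prefix is needed.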
After playing around with this more, I discovered (thanks to Frank's suggestion), that the following works:

extended <- alldta # Make a copy of my dataset
for (i in returnvars) {
  varname <- paste("r",i,sep="")
  extended[[varname]] = (extended[[i]] - lag(extended[[i]],1))/lag(extended[[i]],1)}

This is still not very R-styled in that I am using a loop, but for a task that is only repeating about 50 times, this shouldn't be a large issue.
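For readers on a newer dplyr, a loop-free version of the same computation might look like the sketch below. It assumes dplyr 1.0.0 or later (where `across()` is available), which is newer than the version used in this question; the toy `alldta` and the column names `VarA` and `VarB` are made up for illustration.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across()

# Hypothetical toy data standing in for alldta; the column names are made up.
alldta <- data.frame(id   = 1:4,
                     VarA = c(100, 110, 121, 133.1),
                     VarB = c(50, 55, 44, 66))
returnvars <- c("VarA", "VarB")

# Apply the return formula to every column named in returnvars.
# .names = "r{.col}" prefixes each new column with "r" and keeps the originals.
extended <- alldta %>%
  mutate(across(all_of(returnvars),
                ~ (.x - lag(.x, 1)) / lag(.x, 1),
                .names = "r{.col}"))
```

Here `all_of()` selects columns from a character vector of names, which sidesteps the string-versus-variable problem from the original loop.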

Meru Bhanot