I'm trying to create a loop and for each iteration (the number of which can vary between source files) construct a mutate statement to add a column based on the value of another column.

Having my programming background in php, to my mind this should work:

for(i in number){
    colname <- paste("Column",i,sep="")
    filtercol <- paste("DateDiff_",i,sep="")
    dataset <- mutate(dataset, a = ifelse(b >= 0 & b <= 364,1,NA))
}

But, as I've noticed a couple of times now with R functions, sometimes a function outright ignores that you have defined a variable with that name, as mutate() does here.

So instead of getting several columns titled "a1", "a2", "a3", etc, I get one column entitled "a" that gets overwritten each iteration.

Firstly, can somebody point out to me where I'm going wrong here, but secondly could someone explain to me under what circumstances R ignores variable names, as it's happened a couple of times now and it just seems wildly inconsistent at this point. I'm sure it's not, and there's logic there, but it's certainly well obfuscated.

It's also worth mentioning that originally I tried it this way:

just.dates <- just.dates %>%
     for(i in number){
         a <- paste("a",i,sep="")
         filtercol <- paste("DateDiff_",i,sep="")
         mutate(a = ifelse(filtercol >= 0 & filtercol <= 364),1,NA)
     }

But that way R decided I was passing the for() loop 4 arguments when it only wanted three.

alistaire
Nick
  • Maybe this will help: https://stackoverflow.com/questions/26003574/r-dplyr-mutate-use-dynamic-variable-names/26003971#26003971. The idea is that strings are different from variables, and that some functions in R use non-standard evaluation, where variables are treated like symbol names and not evaluated as usual. Additionally, when calling functions with named parameters, variables are never evaluated on the left of an equal sign (the name of the parameter). – MrFlick May 25 '17 at 15:42
  • **a.** What's the point of `colname` and `filtercol` in the top version? **b.** `mutate` will overwrite a column if the name already exists. **c.** There is almost always a better way to write code in R than `for` loops. Here I may make a data.frame with `purrr::map_df` and use `bind_cols`, but there are lots of options. **d.** If you _really_ want to pass string variables as arguments in dplyr, you'll need to use standard evaluation. With 0.5 that meant `mutate_` and lazyeval; with the forthcoming 0.6 it means [rlang](https://github.com/tidyverse/dplyr/blob/master/vignettes/programming.Rmd). – alistaire May 25 '17 at 16:06
  • and **e.** [You should make your example reproducible.](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#5963610) – alistaire May 25 '17 at 16:06
  • It's straightforward to do this with base R. You don't really need a package for this. Using `DF` as your data frame: `DF[[a]] <- ifelse(DF[[filtercol]] >= 0 & DF[[filtercol]] <= 364, 1, NA)` – G. Grothendieck May 25 '17 at 16:23
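
The two points made in the comments above (argument names on the left of `=` are never evaluated, and `[[ ]]` accepts a string) can be sketched in base R roughly as follows. The data frame `DF` and its values are made up for illustration, and the column naming simply follows the question's `Column<i>` / `DateDiff_<i>` pattern.

```r
# An argument name on the left of `=` is taken literally, never looked up
a <- "Column1"
literal_name <- names(list(a = 1))            # the name is "a"
dynamic_name <- names(setNames(list(1), a))   # the name is "Column1"

# Hypothetical test data modelled on the question's DateDiff_<i> columns
DF <- data.frame(DateDiff_1 = c(-2, 0, 34, 700),
                 DateDiff_2 = c(20, -12, 360, 900))

for (i in 1:2) {
  newcol    <- paste0("Column", i)
  filtercol <- paste0("DateDiff_", i)
  # [[ ]] takes a string, so both column names can be built at run time
  DF[[newcol]] <- ifelse(DF[[filtercol]] >= 0 & DF[[filtercol]] <= 364, 1, NA)
}
```

After the loop, `DF` should contain one new column per iteration (`Column1`, `Column2`) rather than a single column that is overwritten each time.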

1 Answer

Something like this may work for you. Unlike mutate(), mutate_() uses standard evaluation, so the new column's name and the expression that fills it can both be built from strings inside the loop.

# Load dplyr for mutate_() and %>% (the lazyeval package must also be installed)
library(dplyr)

# Create dataframe for testing
dataset <- data.frame(date = as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001",
                                       "06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"), "%d/%m/%Y"),
                      event=c(0,0,1,0,1, 1,0,1,0,1),
                      id = c(rep(1,5),rep(2,5)),
                      DateDiff_1 = c(-2,0,34,700,rep(5,6)), 
                      DateDiff_2 = c(20,-12,360,900,rep(5,6))
                     )

# Set test number vector
number <- c(1:2)

# Begin loop through numbers
for(i in number){
  # Set the name of the new column to be created
  newcolumn <- paste("Column",i,sep="")

  # Set the name of the column to be filtered
  filtercolumn <- paste("DateDiff_",i,sep="")

  # Build the formula to be passed into the mutate command, substituting the filter column for fc
  mutate_function <- lazyeval::interp(~ ifelse(fc >= 0 & fc <= 364, 1, NA), fc = as.name(filtercolumn))

  # Apply the mutate command to the dataframe
  dataset  <- dataset  %>% 
              mutate_(.dots = setNames(list(mutate_function), newcolumn)) 
}
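
As the comments on the question note, newer dplyr releases moved from lazyeval to rlang. Below is a minimal sketch of the same loop in that style, assuming dplyr 0.7 or later, where `mutate()` understands `!!`, `:=` and the `.data` pronoun. The cut-down test data is made up for illustration.

```r
library(dplyr)  # assumption: dplyr >= 0.7 (rlang-based tidy evaluation)

# Hypothetical test data, reduced to the two columns the loop needs
dataset <- data.frame(DateDiff_1 = c(-2, 0, 34, 700),
                      DateDiff_2 = c(20, -12, 360, 900))

for (i in 1:2) {
  newcolumn    <- paste0("Column", i)
  filtercolumn <- paste0("DateDiff_", i)

  # `!!newcolumn :=` uses the string's value as the new column name;
  # `.data[[filtercolumn]]` looks a column up by its name given as a string
  dataset <- dataset %>%
    mutate(!!newcolumn := ifelse(.data[[filtercolumn]] >= 0 &
                                   .data[[filtercolumn]] <= 364, 1, NA))
}
```

The `:=` operator is needed here because R's parser does not allow `!!newcolumn` on the left of a plain `=`.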
Matt Jewett