
I'm applying this function to a list in R.

tmp<-lapply(mydata,transform, V3 = ifelse(V2 > 20, V3, NA))

Each element in the list is a dataframe with 3 numeric columns V1, V2 and V3. The code above works just fine.

But if I try to set the columns as variables:

colA<-paste("V",2,sep="")
colB<-paste("V",3,sep="")

and then:

tmp<-lapply(mydata,transform, colB = ifelse(colA > 20, colB, NA))

this doesn't work. It creates a new column named "colB" filled with the string "V3".

I also tried with get:

tmp<-lapply(mydata,transform, get(colB) = ifelse(get(colA) > 20, get(colB), NA))
Error: unexpected '=' in "tmp<-lapply(mydata,transform, get(colB) ="

Is there any way to pass a variable with the column name in R? The final aim is that colA and colB are passed as command line arguments when calling the script with Rscript because the same code could be applied to different lists with variable number of columns. Thanks

PedroA

1 Answer


This is similar to the discussion on subset (Why is `[` better than `subset`?), in the sense that transform is meant for interactive use. Because your usage here is more programmatic (you are passing variable names via objects), you should move away from transform and use [[ to access (get/set) the columns of your data:

lapply(mydata, function(x) {
   x[[colB]] <- ifelse(x[[colA]] > 20, x[[colB]], NA)
   return(x)
})

Or

lapply(mydata, function(x, c1, c2) {
   x[[c2]] <- ifelse(x[[c1]] > 20, x[[c2]], NA)
   return(x)
}, c1 = colA, c2 = colB)
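
To tie this back to the Rscript goal in the question, here is a minimal, self-contained sketch of the second form. The toy `mydata`, the helper name `mask_column`, and the fallback to columns 2 and 3 when no arguments are supplied are assumptions made for illustration; they are not part of the original code.

```r
# Toy data standing in for `mydata`: a list of data frames with columns V1..V3.
mydata <- list(
  a = data.frame(V1 = 1:3, V2 = c(10, 25, 30), V3 = c(1.5, 2.5, 3.5)),
  b = data.frame(V1 = 4:6, V2 = c(21, 5, 40), V3 = c(7, 8, 9))
)

# Column numbers taken from the command line, e.g. `Rscript script.R 2 3`.
# Falling back to 2 and 3 when nothing is passed is an assumed default.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 2) args <- c("2", "3")
colA <- paste0("V", args[1])
colB <- paste0("V", args[2])

# Hypothetical helper: keep x[[c2]] where x[[c1]] > 20, otherwise set NA.
mask_column <- function(x, c1, c2) {
  x[[c2]] <- ifelse(x[[c1]] > 20, x[[c2]], NA)
  x
}

# Extra arguments to lapply() are forwarded to the function.
tmp <- lapply(mydata, mask_column, c1 = colA, c2 = colB)
```

Because `[[` takes the column name as an ordinary character string, nothing here depends on non-standard evaluation, which is what makes it suitable for names that are only known at run time.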
flodel