I have a data frame whose column names change every time it is generated, so I'd like to pass the column name as a variable. Let's say this is a simplified version of my data frame:

mydf<- data.frame(colors=c('Blue','Red','Green'), weight1=c(1:6),weight2=c(10:15))

If the column name were not an issue, the following code does what I want:

x<-ddply(mydf,'colors', summarize, sum(weight1))


  colors sum(weight1)
1   Blue            5
2  Green            9
3    Red            7

But if I try to pass the column weight1 as a variable, it no longer sums by group and returns the overall sum instead. Here are a couple of things I've tried:

ddply(mydf,'colors', summarize, sum(mydf[2]))
  colors sum(mydf[2])
1   Blue           21
2  Green           21
3    Red           21


mycol <- colnames(mydf)[2]
ddply(Cars,'model', summarize, sum(get(mycol)))
Error: object 'weight1' not found

ddply(mydf,'colors', summarize, sum(eval(parse(text = mycol))))
Error: object 'weight1' not found

ddply(mydf,'colors', summarize, do.call('sum', mydf[2]))
  colors do.call("sum", mydf[2])
1   Blue                      21
2  Green                      21
3    Red                      21

Any suggestions?

gogoy
  • With `data.table` it would be something like `library(data.table); setDT(mydf)[, sum(eval(as.name(mycol))), colors]` – David Arenburg Jan 04 '15 at 09:14
  • @DavidArenburg One question: is that the standard way in `data.table`? `get` also seems to work – akrun Jan 04 '15 at 10:20
  • @akrun, `get` will work, but I think will be very inefficient for a big data set – David Arenburg Jan 04 '15 at 10:21
  • @akrun, I've asked a similar question recently and there are some benchmarks there too. See [here](http://stackoverflow.com/questions/27677283/evaluating-both-column-name-and-the-target-value-within-j-expression-within-d) – David Arenburg Jan 04 '15 at 10:59

1 Answer

You could try `dplyr`, with `mycol` defined as in the question:

library(dplyr)
library(lazyeval)
mydf %>% 
    group_by(colors) %>% 
    summarise_(sum_val=interp(~sum(var), var=as.name(mycol)))
#   colors sum_val
#1   Blue       5
#2  Green       9
#3    Red       7
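
In more recent dplyr releases the underscore verbs such as `summarise_()` are deprecated. A minimal sketch of the same idea using the `.data` pronoun, assuming a dplyr version that supports it and the `mydf` and `mycol` objects from the question (the name `res` is only for illustration):

```r
library(dplyr)

# Example data and column name, as in the question
mydf <- data.frame(colors = c('Blue', 'Red', 'Green'),
                   weight1 = 1:6, weight2 = 10:15)
mycol <- colnames(mydf)[2]   # "weight1"

# .data[[mycol]] looks up the column named by the string,
# restricted to the rows of the current group
res <- mydf %>%
    group_by(colors) %>%
    summarise(sum_val = sum(.data[[mycol]]))
res
```

This should give the same per-group sums as above (5 for Blue, 9 for Green, 7 for Red) without needing lazyeval.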

Or using ddply from plyr

library(plyr)
ddply(mydf, .(colors), summarize,
   sum_val=eval(substitute(sum(var), list(var=as.name(mycol)))) )
#   colors sum_val
#1   Blue       5
#2  Green       9
#3    Red       7

Regarding the error in one of your attempts,

ddply(Cars,'model', summarize, sum(get(mycol)))
#Error: object 'weight1' not found

the `Cars` object is not defined in the example, but the following works for the example data:

ddply(mydf,'colors', summarize, sum_val=sum(get(mycol)))
#  colors sum_val
#1   Blue       5
#2  Green       9
#3    Red       7
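
If you'd rather not depend on plyr or dplyr at all, here is a base R sketch of the same grouped sum. It is an alternative I'm adding for comparison, not one of the approaches above, and it assumes the `mydf` and `mycol` objects from the question (the name `sums` is only for illustration):

```r
# Example data and column name, as in the question
mydf <- data.frame(colors = c('Blue', 'Red', 'Green'),
                   weight1 = 1:6, weight2 = 10:15)
mycol <- colnames(mydf)[2]   # "weight1"

# mydf[[mycol]] extracts the column by its name as a plain vector;
# tapply() then applies sum() within each value of colors
sums <- tapply(mydf[[mycol]], mydf$colors, sum)
sums
```

The result is a named vector rather than a data frame, with one element per color (Blue 5, Green 9, Red 7).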
akrun