specify dplyr column names

Question

How can I pass column names to dplyr if I do not know the column name, but want to specify it through a variable?

e.g. this works:

require(dplyr)
df <- as.data.frame(matrix(seq(1:9),ncol=3,nrow=3))
df$group <- c("A","B","A")
gdf <- df %.% group_by(group) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

But this does not, because group_by() does not treat the string stored in someColumn as a column name:

require(dplyr)
someColumn = "group"
df <- as.data.frame(matrix(seq(1:9),ncol=3,nrow=3))
df$group <- c("A","B","A")
gdf <- df %.% group_by(someColumn) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

Yes, possibly. I ended up renaming the group column before the dplyr chain, with something like `colnames(df)[which(colnames(df)==someColumn)] <- "group"` – user3241888, Jan 27 '14 at 23:50
It is worth noting that the 'correct' answer probably differs from the solutions below under dplyr 0.7.0. — russellpierce, Aug 01 '17 at 17:03
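As a sketch of what that can look like with dplyr 0.7.0 or later (an assumption about the installed version), the string can be turned into a symbol with rlang::sym() and injected with !!, so group_by() sees the bare column name:

```r
library(dplyr)

someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")

# rlang::sym() turns the string "group" into the symbol `group`;
# !! injects that symbol, so this behaves like group_by(group)
gdf <- df %>%
  group_by(!!rlang::sym(someColumn)) %>%
  summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
gdf
```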

score 19 · Answer 1 · edited May 23 '17 at 11:55

I just gave a similar answer over at Group by multiple columns in dplyr, using string vector input, but for good measure: functions that allow you to operate on columns using strings have been added to dplyr. These have the same name as the regular dplyr functions, but end in an underscore. The functions are described in detail in this vignette.

Given df and someColumn from the OP, this now works a treat:

gdf <- df %>% group_by_(someColumn) %>% summarise(m1=mean(V1),m2=mean(V2),m3=mean(V3))

Note that it is group_by_, rather than group_by, and that the %>% operator is used because %.% is deprecated.
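group_by_() and the other underscore verbs were themselves deprecated in dplyr 0.7.0 in favour of tidy evaluation. A rough modern equivalent, assuming dplyr 1.0.0 or later where across() is available, selects the grouping columns from a character vector with all_of(); the same call also accepts several names:

```r
library(dplyr)

someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")

# all_of() selects the columns whose names are stored in a character vector;
# across() hands those columns to group_by() under their original names
gdf <- df %>%
  group_by(across(all_of(someColumn))) %>%
  summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
gdf
```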

Can you specify `m1` to be the name of a variable passed in a function? — vashts85, Aug 16 '18 at 13:38
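Under tidy evaluation (dplyr 0.7.0 or later, assumed here), a name held in a variable can go on the left-hand side of := with !!. The helper summarise_mean() below is hypothetical, written only to show the pattern inside a function:

```r
library(dplyr)

# Hypothetical helper: group by the column named in group_col and store
# the mean of V1 in a new column whose name is the string in out_name
summarise_mean <- function(data, group_col, out_name) {
  data %>%
    group_by(!!rlang::sym(group_col)) %>%
    summarise(!!out_name := mean(V1))
}

df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
res <- summarise_mean(df, "group", "avg_V1")
res
```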

score 3 · Answer 2 · edited Jul 31 '20 at 12:17

Here's an answer to this straightforward question, obtained by picking through Hadley's solution to his posted dupe.

gdf <- df %.% regroup( lapply( someColumn, as.symbol)) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

FWIW, my use case involved grouping by one variable column and one constant column. The solution to that is:

gdf <- df %.% regroup( lapply( c( 'constant_column', someColumn), as.symbol)) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

Finally, the posted eval solution doesn't work. It just creates a new column whose values are all whatever someColumn evaluates to.
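regroup() is not part of current dplyr releases. The same idea, a list of symbols built from strings, carries over to dplyr 0.7.0 and later as rlang::syms() spliced into group_by() with !!!. The data frame and its constant_column are hypothetical stand-ins, and .groups needs dplyr 1.0.0 or later:

```r
library(dplyr)

someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
df$constant_column <- "x"  # hypothetical constant column, as in the answer

# rlang::syms() turns each string into a symbol, like lapply(..., as.symbol);
# !!! splices them into group_by() as separate grouping columns
gdf <- df %>%
  group_by(!!!rlang::syms(c("constant_column", someColumn))) %>%
  summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3), .groups = "drop")
gdf
```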

score 0 · Answer 3 · answered Oct 21 '15 at 16:42

You can use summarise_ as follows:

plotVar <- "Stocks_US_TotalCrudeOil"
# mydf and bandYears are the poster's own objects; summarise_() receives each
# summary as a string, e.g. "min( Stocks_US_TotalCrudeOil )"
dfBand <- mydf[c(plotVar, "year", "week")] %>%
  filter(year %in% bandYears) %>%
  group_by(week) %>%
  summarise_(ymini = paste("min(", as.name(plotVar), ")"),
             ymaxi = paste("max(", as.name(plotVar), ")"))
dfBand
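summarise_() has been deprecated since dplyr 0.7.0. With current dplyr, the .data pronoun can look the column up by its string name directly, without pasting code together. mydf and bandYears belong to the poster, so the small data frame below is a made-up stand-in:

```r
library(dplyr)

# Hypothetical stand-in for the poster's mydf and bandYears
mydf <- data.frame(
  Stocks_US_TotalCrudeOil = c(10, 20, 30, 40, 50, 60),
  year = c(2012, 2012, 2013, 2013, 2014, 2014),
  week = c(1, 2, 1, 2, 1, 2)
)
bandYears <- c(2012, 2013)
plotVar <- "Stocks_US_TotalCrudeOil"

# .data[[plotVar]] refers to the column whose name is stored in plotVar
dfBand <- mydf[c(plotVar, "year", "week")] %>%
  filter(year %in% bandYears) %>%
  group_by(week) %>%
  summarise(
    ymini = min(.data[[plotVar]]),
    ymaxi = max(.data[[plotVar]])
  )
dfBand
```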

score -1 · Answer 4 · answered May 16 '14 at 21:49


pollutant <- "sulfate"
summarise(data, mean(eval(as.symbol(pollutant)), na.rm = TRUE))

I had the same problem and found a solution: wrap the column name in eval(as.symbol()).

Posted by CheJharia.

Doesn't seem to work with my current version of dplyr – Calimo Nov 04 '14 at 17:38
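A sketch for dplyr 0.7.0 or later (assumed) replaces eval(as.symbol()) with the .data pronoun. The data frame below is a hypothetical stand-in for the poster's `data`, and the output name mean_value is also illustrative:

```r
library(dplyr)

# Hypothetical stand-in for the poster's data
data <- data.frame(sulfate = c(1, NA, 3), nitrate = c(4, 5, 6))
pollutant <- "sulfate"

# .data[[pollutant]] looks the column up by the string stored in pollutant
res <- summarise(data, mean_value = mean(.data[[pollutant]], na.rm = TRUE))
res
```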

score -2 · Answer 5 · answered Feb 02 '14 at 21:04


I expect you just have to use eval:

require(dplyr)
someColumn = "group"
df <- as.data.frame(matrix(seq(1:9),ncol=3,nrow=3))
df$group <- c("A","B","A")
gdf <- df %.% group_by(eval(someColumn)) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

Posted by Floris Padt.

This doesn't work at all, just adds a new column called `eval(someColumn)` where every row is `"group"`. – Gregor Thomas May 16 '14 at 22:04
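A quick check of that comment, assuming a current dplyr release (versions from 2014 may have behaved differently): eval(someColumn) only yields the string "group", so every row lands in a single group, whereas injecting the symbol groups by the real column:

```r
library(dplyr)

someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")

# eval(someColumn) returns the constant string "group", so group_by()
# adds a new column holding that constant and finds only one group
bad <- group_by(df, eval(someColumn))

# Injecting the symbol instead groups by the existing column
good <- group_by(df, !!rlang::sym(someColumn))

n_groups(bad)
n_groups(good)
```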
