In R, create a sequence of columns using summarise in the dplyr package

Question

Dou you know how to create a sequence of columns without writing one by one? For example, I woul like to summarise for the states of some country. For example, if my df is

 head(df)
     Time   day  State  Vote
     d1-h1   1    state1  5
     d1-h2   1    state1  3
     d1-h3   1    state1  8
     d2-h1   2    state1  4
     d2-h2   2    state1  5
     d2-h3   2    state1  1
     d1-h1   1    state2  5
     d1-h2   1    state2  3
     d1-h3   1    state2  8
     d2-h1   2    state2  3
     d2-h2   2    state2  1
     d2-h3   2    state2  4

The one by one case is as follow

 df2<- df %>% group_by(Time) %>% summarise(state1 = sum(votes),
                                          state2 = sum(votes),
                                          ......
                                          state27 = sum(vote))

One solution for this problem is a loop, but it doesnt work

   states<- unique(df$State)
   for(i in 1:length(states))
   { 
       df2<- df %>% group_by(Time) %>% summarise( paste0("Vote_",states[i]) = sum(Vote[State == states[i]]))
   }

akrun · Accepted Answer · 2015-03-31T04:10:47.877

3

I guess this is a reshaping problem. You can try any of the 'long' to 'wide' format functions (dcast, spread or reshape from base R) to get the correct output.

library(reshape2)
dcast(df, Time~State, value.var='Vote')
#     Time state1 state2
#1 d1-h1      5      5
#2 d1-h2      3      3
#3 d1-h3      8      8
#4 d2-h1      4      3
#5 d2-h2      5      1
#6 d2-h3      1      4

Or

library(tidyr)
spread(df, State,Vote)

that gets the same output (based on the example provided)

library(dplyr)
df %>% 
   group_by(Time) %>% 
    summarise(state1=sum(Vote[State=='state1']), 
              state2=sum(Vote[State=='state2']))
#   Time state1 state2
#1 d1-h1      5      5
#2 d1-h2      3      3
#3 d1-h3      8      8
#4 d2-h1      4      3
#5 d2-h2      5      1
#6 d2-h3      1      4

Update

Suppose if there are multiple 'value' columns,

set.seed(22)
df$Vote2 <- sample(1:10, 12, replace=TRUE)

then, we can use devel version of data.table ie. 'v1.9.5', which can take multiple 'value.vars'. It can be installed from here

 library(data.table)
 dcast(setDT(df), Time~State, value.var=c('Vote', 'Vote2'))
 #    Time state1_Vote state2_Vote state1_Vote2 state2_Vote2
 #1: d1-h1           5           5            4            7
 #2: d1-h2           3           3            5            8
 #3: d1-h3           8           8           10            5
 #4: d2-h1           4           3            6            4
 #5: d2-h2           5           1            9           10
 #6: d2-h3           1           4            8            7

Or use reshape from base R

 reshape(df[-2], idvar='Time', timevar='State', direction='wide')
 #    Time Vote.state1 Vote2.state1 Vote.state2 Vote2.state2
 #1 d1-h1           5            4           5            7
 #2 d1-h2           3            5           3            8
 #3 d1-h3           8           10           8            5
 #4 d2-h1           4            6           3            4
 #5 d2-h2           5            9           1           10
 #6 d2-h3           1            8           4            7

data

df <- structure(list(Time = c("d1-h1", "d1-h2", "d1-h3", "d2-h1",
"d2-h2", 
"d2-h3", "d1-h1", "d1-h2", "d1-h3", "d2-h1", "d2-h2", "d2-h3"
 ), day = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), 
 State = c("state1", 
"state1", "state1", "state1", "state1", "state1", "state2", "state2", 
"state2", "state2", "state2", "state2"), Vote = c(5L, 3L, 8L, 
4L, 5L, 1L, 5L, 3L, 8L, 3L, 1L, 4L)), .Names = c("Time", "day", 
"State", "Vote"), class = "data.frame", row.names = c(NA, -12L))

edited Mar 31 '15 at 04:10

answered Mar 30 '15 at 17:41

akrun

874,273
37
540
662

Ohhh, this is magic! One more question... if the df had another column, for example, Number_electors, and I would like to create a second group, now by Number_electors. Is it possible agregate other parameter in the same line? – MAOC Mar 30 '15 at 18:51
@Vasco Can you update with a new example and expected result as it will make things more clear – akrun Mar 30 '15 at 18:53
Your solution, with the dcast function, creates two columns: For example, the column State1 is the sum of the column Votes restricted to the column State iqual to state1. Ok? I would like to generalizate this case... Suposse that you have other column: Votes2.. for Vote and Vote2, I would like to create state1_Vote, state2_Vote, state1_Vote2 and state2_Vote2... is it clear, now? – MAOC Mar 30 '15 at 19:05
@Vasco That will be easier with `dcast` from the devel version of `data.table` which can take multiple value.vars` – akrun Mar 30 '15 at 19:15
@Vasco Updated with the multiple column case. – akrun Mar 30 '15 at 19:27
There is some error: Error in .subset2(x, i, exact = exact) : subscript out of bounds In addition: Warning message: In if (!(value.var %in% names(data))) { : the condition has length > 1 and only the first element will be used – MAOC Mar 30 '15 at 19:45
@Vasco Is it from your original dataset or by using the `dput` data from my post? – akrun Mar 30 '15 at 19:48
Your data set... but I haven't downloaded the devel version... I will dowload and repeat the operation – MAOC Mar 30 '15 at 19:55
@Vasco It will not work on other versions as this is added recently – akrun Mar 30 '15 at 19:58
My version is v 1.9.4 – MAOC Mar 30 '15 at 20:01
It will not work on that. – akrun Mar 30 '15 at 20:04
Error: Command failed (1) in the install_github("Rdatatable/data.table", build_vignettes = FALSE) comand – MAOC Mar 30 '15 at 20:25
I am not sure why it fails for you. But, you can `melt` the `Vote` columns to a single column and then use `dcast` if this option is not working – akrun Mar 30 '15 at 20:26

In R, create a sequence of columns using summarise in the dplyr package

1 Answers1

Update

data