dplyr repetition within %>% operator

Question

I am trying to use rep with dplyr, but I do not fully understand why I cannot make it work.

My data look like this. What I want is to simply repeat dayweek by n for each id.

head(dt4)

   id  dayweek n
1  1   Friday 3
2  1   Monday 3
3  1 Saturday 3
4  1   Sunday 3
5  1 Thursday 3
6  1  Tuesday 3

I am trying to reproduce the following within a dplyr pipeline:

cbind(rep(dt4$id, dt4$n), rep(as.character(dt4$dayweek), dt4$n) )

which gives

    [,1] [,2]    
[1,] "1"  "Friday"
[2,] "1"  "Friday"
[3,] "1"  "Friday"
[4,] "1"  "Monday"
[5,] "1"  "Monday"
[6,] "1"  "Monday"

I do not understand why this code does not work

dt4 %>% 
  group_by(id) %>% 
  summarise(rep(dayweek, n))

Error: expecting a single value

Could someone help me with this?

the data

dt4 = structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), dayweek = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L), .Label = c("Friday", "Monday", "Saturday", "Sunday", 
"Thursday", "Tuesday", "Wedesnday"), class = "factor"), n = c(3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), class = "data.frame", .Names = c("id", 
"dayweek", "n"), row.names = c(NA, -21L))

`summarise` is designed to return a single value per group. You will probably have more luck with `do` – David Arenburg, Aug 23 '15 at 12:56
Related: https://stackoverflow.com/questions/21737815/ and https://github.com/hadley/dplyr/issues/154 – talat, Aug 23 '15 at 13:09
Sorry, I re-uploaded the data. When I use `dput` on a `dplyr` data type (I don't know its name), it does not work properly – giac, Aug 23 '15 at 13:16
Any of these answers should work: http://stackoverflow.com/q/2894775/1191259 – Frank, Aug 24 '15 at 18:24

score 6 · Answer 1 · answered Aug 23 '15 at 13:21

data.table can be a useful alternative for this type of do-by operation - I find this a little easier to read:

library("data.table")
dt4 <- as.data.table(dt4)
head(dt4[, rep(dayweek, n), by=id], 10)

giving:

    id       V1
 1:  1   Friday
 2:  1   Friday
 3:  1   Friday
 4:  1   Monday
 5:  1   Monday
 6:  1   Monday
 7:  1 Saturday
 8:  1 Saturday
 9:  1 Saturday
10:  1   Sunday

akrun · Accepted Answer · 2015-08-25T02:20:25.107

4

To get the same result as cbind, we can use do. As @DavidArenburg mentioned, summarise returns a single value (one row) per group, whereas mutate returns the same number of rows as its input. Here we need a different number of rows from either, which is something we can do within do. In the code, . stands for the dataset (with group_by, the rows of the current group). Outside the pipe, we would extract the 'id' column with dt4$id or dt4[['id']]; inside do, we replace dt4 with . and write .$id.

library(dplyr)
dt4 %>% 
    group_by(id) %>%
    do(data.frame(id=.$id, v1=rep(.$dayweek, .$n)))
#Source: local data frame [63 x 2]
#Groups: id

#  id       v1
#1   1   Friday
#2   1   Friday
#3   1   Friday
#4   1   Monday
#5   1   Monday
#6   1   Monday
#7   1 Saturday
#8   1 Saturday
#9   1 Saturday
#10  1   Sunday
#.. ..      ...
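
To make the per-group step more concrete, here is a rough base R sketch of the same split-apply-combine idea. It only illustrates the logic and is not how dplyr implements do; the small data frame toy and the helper expand_group are made up for this example, with g playing the role of . inside do.

```r
# Toy data, made up for illustration: two ids, two days each
toy <- data.frame(
  id = c(1L, 1L, 2L, 2L),
  dayweek = c("Friday", "Monday", "Friday", "Monday"),
  n = c(3, 3, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stands in for the expression inside do(),
# where `g` is the data frame holding the rows of one group
expand_group <- function(g) {
  data.frame(
    id = rep(g$id, g$n),       # repeat each id n times
    v1 = rep(g$dayweek, g$n),  # repeat each dayweek n times
    stringsAsFactors = FALSE
  )
}

# Split by id, expand each piece, then stack the pieces back together
pieces <- lapply(split(toy, toy$id), expand_group)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Each group returns a data frame with more rows than it started with, which is exactly what summarise does not allow in the question's setup.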

Another option, based on @Frank's comment, is to pass the row indices generated by rep to slice and then select the columns that we need to keep.

dt4 %>%
     slice(rep(1:n(),n)) %>%
     select(-n)
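
The slice approach relies on rep producing a vector of row numbers in which each row number appears as many times as that row's n value. A minimal base R sketch of the same indexing idea follows; the data frame toy is made up for illustration, and this is the plain-indexing analogue rather than the dplyr code itself.

```r
# Toy data, made up for illustration
toy <- data.frame(
  id = c(1L, 1L, 2L),
  dayweek = c("Friday", "Monday", "Friday"),
  n = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Row 1 appears three times, row 2 once, row 3 twice
idx <- rep(seq_len(nrow(toy)), toy$n)
idx

# Index the rows with it and keep everything except n
expanded <- toy[idx, c("id", "dayweek")]
rownames(expanded) <- NULL
expanded
```

No grouping is needed here because the repetition is driven row by row.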

edited Aug 25 '15 at 02:20

answered Aug 23 '15 at 12:58


Ah, interesting. Could you explain what `do` does, and how the `.$` symbol is used here? Thanks, akrun – giac Aug 23 '15 at 13:03
What does `group_by` do in this context? (Sorry, the dataset from the question still seems to be broken.) – maj Aug 23 '15 at 13:08
@maj In the example, there is only a single `id`, but I am guessing that in the original dataset, there might be multiple ids and we want to do the replication step within each id group – akrun Aug 23 '15 at 13:10
`do` is pretty terrible, performance-wise, and grouping is not necessary here. I'd go for `dt4 %>% slice(rep(1:n(),n)) %>% select(-n)` – Frank Aug 24 '15 at 18:49
@Frank BTW, that is a clever option. I never thought about it. I wonder why you didn't post that as a solution :-) – akrun Aug 25 '15 at 02:42
Thanks :) It's just a dplyr-flavored version of what I saw in the question I linked under the OP – Frank Aug 25 '15 at 03:05
