3

I'm newbie to R. I need to aggregate the data by group with sequence. I'm adding my data frame. first two column is given data, I have to mutate third column.

df <- data.frame(id = c(rep("a",3), rep("b",2), rep("c", 4)),
               value = c("x", "z", "p", "q", "q", "m", "n", "x", "y"), 
               reqd = c("x,z,p", "z,p", "p", "q,q","q", "m,n,x,y", "n,x,y", "x,y", "y"))

I had aggregated by group but its not correct

df_2 <- df[,1:2]  %>% 
group_by(id) %>% 
mutate(reqd2 = paste(value, collapse = ","))
Saaz
  • 55
  • 7
  • So, what does it mean to "aggregate the data by group with sequence"? The easiest way for us to understand is probably to provide a desired output! – erocoar May 02 '19 at 13:15
  • Related: [Cumulatively paste (concatenate) values grouped by another variable](https://stackoverflow.com/questions/24862046/cumulatively-paste-concatenate-values-grouped-by-another-variable) – Henrik May 02 '19 at 14:02

2 Answers2

2

We can group_by id and create a sequence between current row_number and total number of rows in each group and concatenate the corresponding value with toString.

library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  mutate(reqd1 = map2_chr(row_number(),n(),~toString(value[.x:.y])))

#  id    value reqd    reqd1     
#  <fct> <fct> <fct>   <chr>     
#1 a     x     x,z,p   x, z, p   
#2 a     z     z,p     z, p      
#3 a     p     p       p         
#4 b     q     q,q     q, q      
#5 b     q     q       q         
#6 c     m     m,n,x,y m, n, x, y
#7 c     n     n,x,y   n, x, y   
#8 c     x     x,y     x, y      
#9 c     y     y       y        

We can also do this using only base R with ave

with(df, ave(value, id, FUN = function(x) 
        mapply(function(i, j) toString(x[i:j]), seq_along(x), length(x))))

#[1] "x, z, p"  "z, p"  "p"  "q, q"   "q"   "m, n, x, y"  "n, x, y"  "x, y" "y" 
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • I had never used purrr package earlier, Can I add special character something like _@_ as separator while concatenating – Saaz May 02 '19 at 13:54
  • @Saaz yes, you can use `paste` like `df %>% group_by(id) %>% mutate(reqd1 = map2_chr(row_number(),n(),~paste(value[.x:.y], collapse = "@")))` – Ronak Shah May 02 '19 at 13:56
1

We can do this with data.table

library(data.table)
setDT(df)[, reqd1 := unlist(Map(function(i, j)
    toString(value[seq(i, j, 1)]), seq_len(.N), .N)),  by = id]
df
#   id value    reqd      reqd1
#1:  a     x   x,z,p    x, z, p
#2:  a     z     z,p       z, p
#3:  a     p       p          p
#4:  b     q     q,q       q, q
#5:  b     q       q          q
#6:  c     m m,n,x,y m, n, x, y
#7:  c     n   n,x,y    n, x, y
#8:  c     x     x,y       x, y
#9:  c     y       y          y
akrun
  • 874,273
  • 37
  • 540
  • 662