I'm trying to build a running sequence within each group: each row should contain all previous values in the group, followed by the current value.

For example:

Var1 Var2 
1     A
1     B
1     C
2     A
2     C
2     D

The output I'm looking for is:

Var1 Var2 Var3
1    A   A
1    B   A>B
1    C   A>B>C
2    A   A
2    C   A>C
2    D   A>C>D

Is there a package for this? The number of elements in the sequence can get quite large, so my current method using lag within dplyr isn't feasible without writing out the same piece of code n times (where n is the maximum number of elements in a sequence).

Sang won kim

3 Answers

You could use by() and take advantage of R's factors. Within each Var1 group, convert Var2 to a factor and back to get integer codes, generate the growing sequences 1, 1:2, 1:3, ... with Map, turn each sequence into a factor again with labels taken from Var2, and collapse it with ">". Concatenating and unlisting the per-group results gives "Var3". This relies on dat being ordered by Var1 and on Var2 being sorted and unique within each group. (Might be slow on big data frames, though.)

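dat$Var3 <- unlist(do.call(c, by(dat, dat$Var1, function(s) {
  # integer codes of Var2 -> growing index sequences 1, 1:2, 1:3, ...
  r <- Map(seq, as.numeric(factor(s$Var2)))
  # map indices back to this group's Var2 labels (works for any group size)
  r <- lapply(r, factor, levels = seq_along(s$Var2), labels = s$Var2)
  # collapse each growing sequence into a single string
  Map(paste, r, collapse = ">")
})))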
dat
#   Var1 Var2  Var3
# 1    1    A     A
# 2    1    B   A>B
# 3    1    C A>B>C
# 4    2    A     A
# 5    2    C   A>C
# 6    2    D A>C>D

Data

dat <- structure(list(Var1 = c(1L, 1L, 1L, 2L, 2L, 2L), Var2 = c("A", 
"B", "C", "A", "C", "D")), row.names = c(NA, -6L), class = "data.frame")
jay.sf

You could do:

transform(dat, Var3 = ave(Var2, Var1, FUN = function(x) {
  sapply(seq_along(x), function(i) paste(x[1:i], collapse = ">"))
}))

  Var1 Var2  Var3
1    1    A     A
2    1    B   A>B
3    1    C A>B>C
4    2    A     A
5    2    C   A>C
6    2    D A>C>D
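
As a hedged variation on the same per-group idea, the running paste can also be sketched with base R's Reduce(accumulate = TRUE), which carries the previous string forward instead of re-pasting x[1:i] for every row. The helper name cum_paste is hypothetical (not from any package), and the data frame is simply the question's example rebuilt so the snippet runs on its own.

```r
# Example data copied from the question
dat <- data.frame(
  Var1 = c(1L, 1L, 1L, 2L, 2L, 2L),
  Var2 = c("A", "B", "C", "A", "C", "D"),
  stringsAsFactors = FALSE
)

# cum_paste is a hypothetical helper: it returns the running paste of x,
# keeping every intermediate result (accumulate = TRUE)
cum_paste <- function(x, sep = ">") {
  Reduce(function(acc, nxt) paste(acc, nxt, sep = sep), x, accumulate = TRUE)
}

# ave() applies cum_paste separately within each Var1 group
dat$Var3 <- ave(dat$Var2, dat$Var1, FUN = cum_paste)
dat
```

Because ave() writes results back in the original row positions, this sketch should not need the rows to be sorted by Var1.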
Ritchie Sacramento

I recommend the runner package for sequential (running-window) functions. The function runner::runner applies any R function passed as f over a running window of x; the output type needs to be specified with type.

# data
df <- data.frame(
  var1 = rep(c(1,2), each = 3), 
  var2 = rep(c("A", "B", "C"), 2))

# result
library(dplyr)
library(runner)
df %>%
 group_by(var1) %>%
 mutate(var3 = runner(var2, 
                      function(x) paste(x, collapse = ">"),
                      type = "character")) 


 #    var1 var2  var3 
 #   <dbl> <fct> <chr>
 # 1     1 A     A    
 # 2     1 B     A>B  
 # 3     1 C     A>B>C
 # 4     2 A     A    
 # 5     2 B     A>B  
 # 6     2 C     A>B>C

Check the runner documentation for more options.

GoGonzo