
I have a data frame containing for each session (column "session") a sequence of actions (column "action"). Actions can be repeated within the same session (e.g. a->b->a for session 01), since what I am interested in is understanding the order in which they happen:

x <- data.frame(
  session = c("01", "01", "01", "02", "02", "02", "03", "03"),
  action  = c("a", "b", "a", "c", "a", "c", "a", "b"))

I need to convert it into transactions format so that I can use the 'arules' package, for example to apply the apriori algorithm. The desired output would be:

01 a,b,a

02 c,a,c

03 a,b

where, for each session, the exact corresponding sequence of actions is listed beside it.

Which approach do you suggest?

Thank you.

marqui
  • Possible duplicate of [Aggregating by unique identifier and concatenating related values into a string](https://stackoverflow.com/questions/16596515/aggregating-by-unique-identifier-and-concatenating-related-values-into-a-string) – pogibas Mar 29 '18 at 11:52

2 Answers


With base R, we can use aggregate with toString to collapse the actions of each session into one string:

aggregate(action~ session, x, FUN = toString)
#   session  action
#1      01 a, b, a
#2      02 c, a, c
#3      03    a, b
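
If the exact a,b,a form from the question (no space after the commas) is wanted, one option is to pass paste with collapse instead of toString. This is only a minimal sketch, assuming the same x as in the question; res is just an illustrative name.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the columns as character
x <- data.frame(
  session = c("01", "01", "01", "02", "02", "02", "03", "03"),
  action  = c("a", "b", "a", "c", "a", "c", "a", "b"),
  stringsAsFactors = FALSE
)

# Extra arguments given to aggregate() are forwarded to FUN,
# so collapse = "," reaches paste() and joins each session's actions in row order
res <- aggregate(action ~ session, data = x, FUN = paste, collapse = ",")
print(res)
```

The row order within each session is preserved, so the string reflects the order in which the actions appear in x.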

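If we need to convert to transactions, we can split the actions by session and coerce the resulting list (the "transactions" class comes from the arules package):

library(arules)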
as(split(x$action, x$session), "transactions")
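
One caveat, as far as I know: arules treats a transaction as a set of items, so neither the order nor repeated actions (a->b->a) are part of that representation, and depending on the arules version a list with repeated items may be rejected or have its duplicates dropped with a warning. A hedged sketch that keeps the ordered sequences separately and deduplicates explicitly before coercing; seqs, item_sets and trans are illustrative names, not from the answer above.

```r
x <- data.frame(
  session = c("01", "01", "01", "02", "02", "02", "03", "03"),
  action  = c("a", "b", "a", "c", "a", "c", "a", "b"),
  stringsAsFactors = FALSE
)

# Ordered actions per session, repeats kept (this is the "sequence" view)
seqs <- split(x$action, x$session)

# Item sets per session: repeats dropped, first-occurrence order kept
item_sets <- lapply(seqs, unique)

# Coerce only if arules is installed (assumption: it may not be)
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(item_sets, "transactions")
  inspect(trans)
}
```

Deduplicating up front makes the loss of repeats explicit rather than leaving it to the coercion step, while seqs still holds the full sequences if the order matters later.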
akrun
x <- data.frame(session=c("01","01","01","02","02", "02","03","03"), 
                action=c("a","b","a","c","a","c", "a","b"))

library(dplyr)

x %>%
  group_by(session) %>%
  summarise(action = paste0(action, collapse = ","))

# # A tibble: 3 x 2
# session action
#   <fct>   <chr> 
# 1 01      a,b,a 
# 2 02      c,a,c 
# 3 03      a,b 
AntoniosK