I'm trying to use TraMineR but am open to feedback/references/links to more info as to how to represent multi-channel or hierarchical event sequences and algorithms that deal with it.
I have a complex event structure that I'm trying to figure out how to represent as a sequence. There are different types of events. Each event type may have a different set of fields (and different numbers of fields). For instance, age might be a field in one event type whereas height might be a field in another event type. My first instinct (and I believe a common approach) was to “flatten” everything, e.g. every possible combination of values for an event constitutes a unique event type. However, this may miss patterns in the generic event types.
For example, let's say I'm a dog breeder and drink a lot of coffee and I want to see if there are patterns in my coffee/dog buying habits (yes, silly example). I might have events like:
- Bought dog
- Breed: hound
- Sex: female
- Bought coffee
- Store: Starbucks
- Roast: dark
- Bought dog
- Breed: hound
- Sex: female
- Bought coffee
- Store: Starbucks
- Roast: light
- Bought dog
- Breed: Doberman pincher
- Sex: male
To flatten the data I may say that every unique combination of store and roast is a unique coffee buying event. Also, every unique combination of breed and sex is a unique dog buying event. This approach would turn the example above into 5 different event types (rather than 2 event types with fields). This representation could detect patterns such as the following: if I drink 2 dark roast coffees from Starbucks then I am more likely to by a male Doberman pincher.
However, this representation may miss more general patterns that don't depend on field values in the events. For instance, it may be the case that I simply buy a dog after having two coffees in general.
I'd like to be able to detect patterns at both "levels" and am unsure of how to represent the events to do so. Of course one approach would be to use both representations and then just combine the results of the two.
So, questions are: 1. Any links/citations to papers that deal with this? 2. Is this a common issue? 3. Any recommendations on how to represent these events? 4. Any recommendations on how to work with them in TraMineR 5. Any recommendations / links / references to algorithms that deal with this sort of thing? 6. Any ideas at all?
Thanks!!!