
It seems that PST cannot predict the conditional probabilities of the next state after a context that consists of a single repeated state, e.g. EX-EX.

Consider this code:

# Load libraries
library(RCurl)
library(TraMineR)
library(PST)

# Get data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/c2539d06771317c5f4c8d3a2052a73fc485a09c6/challenge_level.csv")

# Load the downloaded text without a header row, keeping strings as characters
data <- read.table(text = x, sep = ",", header = FALSE, stringsAsFactors = FALSE)

# Create sequence object (skip the first row and the first column)
data.seq <- seqdef(data[2:nrow(data), 2:ncol(data)], missing = NA, right = NA, nr = "*")

# Make a tree
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = TRUE)

# Mine the context
context <- seqdef("EX-EX")
p_context <- predict(S1.p1, context, decomp = F, output = "prob")

The line context <- seqdef("EX-EX") yields:

[>] 1 distinct states appear in the data: 
     1 = EX
Error: 
 [!] alphabet contains only one state

which means that predict() cannot be executed.

How do I predict the conditional probabilities of the next state based on contexts which only have 1 state, which may be repeated multiple times?

histelheim

1 Answer


This is an issue with seqdef that has been fixed since TraMineR version 1.8-12.

Here is what I get with TraMineR 1.8-13:

> context <- seqdef("EX-EX")
 [>] 1 distinct states appear in the data: 
     1 = EX
 [>] state coding:
       [alphabet]  [label]  [long label] 
     1  EX          EX       EX
 [>] 1 sequences in the data set
 [>] min/max sequence length: 2/2
> p_context <- predict(S1, context, decomp = F, output = "prob")
 [>] 1 sequence(s) - min/max length: 2/2
 [>] max. context length: L=6
 [>] found 2 distinct context(s)
 [>] total time: 0.019 secs
> p_context
           prob
[1] 0.000476372

Note that I replaced your undefined S1.p1 with S1.
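To see whether an installed TraMineR is recent enough, the version can be compared against 1.8-12. This is only a sketch: `has_seqdef_fix` is a hypothetical helper name, and it assumes that 1.8-12 itself already contains the fix.

```r
# Version from which the seqdef fix is reported to be available
# (assumption: 1.8-12 itself already includes it)
fixed_in <- package_version("1.8-12")

# Hypothetical helper: TRUE if the given TraMineR version string includes the fix
has_seqdef_fix <- function(version) {
  package_version(version) >= fixed_in
}

# Check the installed copy, but only if TraMineR is actually available
if (requireNamespace("TraMineR", quietly = TRUE)) {
  print(has_seqdef_fix(as.character(packageVersion("TraMineR"))))
}
```

If this prints FALSE, updating TraMineR should make `seqdef("EX-EX")` work as shown above.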

Gilbert
  • This works for contexts that repeat the same marker, e.g. `EX-EX`. However, contexts that are only one marker long, e.g. `EX`, are still not computed, but here the problem seems to be in `predict()`, not in `seqdef()`. – histelheim Jan 27 '17 at 21:06
  • For `EX`, which is not really a sequence, its probability is simply its probability of occurrence in the data. You get it for instance as `seqstatf(data.seq)["EX",2]/100`. – Gilbert Jan 29 '17 at 09:07
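
To make the last comment concrete, here is a minimal base R sketch of what "probability of occurrence in the data" means for a single state. The matrix `toy` is made-up illustration data, not the CSV from the question, and the sketch assumes an unweighted data set with no missing states; under those assumptions it should correspond to the percentage that `seqstatf()` reports, divided by 100.

```r
# Toy state sequences, one row per sequence (hypothetical data)
toy <- rbind(
  c("EX", "EX", "IN"),
  c("IN", "EX", "OUT"),
  c("EX", "OUT", "OUT")
)

# Share of all positions occupied by the state "EX"
p_ex <- mean(toy == "EX")

# Relative frequencies of every state, for comparison
state_freq <- prop.table(table(toy))

print(p_ex)
print(state_freq)
```

Here `EX` fills 4 of the 9 positions, so `p_ex` is 4/9.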