
I would like to plot the transition probabilities against time.

The ["Activities"][1] matrix has ncol=144 and nrows=16533. The columns act1_1...act1_144 are time steps, and time is represented in 10-minute intervals (e.g. act1_1 = 4.10am; act1_2 = 4.20am, ...). Time starts at 4am (act1_1) and ends at 4am (act1_144). The columns are filled in with different activity codes, such as 2 = sleep, 48 = watching TV, 5 = eating, etc.

Using this function I managed to calculate the transition probabilities between activities (from the Activities matrix).

I would like to plot time (in 10-minute intervals) on the x axis and the transition probabilities on the y axis, for example the probability for 2 to follow 3.

How can I do this?

Thanks

This is the plot that I am aiming for

[1]: https://i.stack.imgur.com/1UEUw.jpg

  • Welcome to SO. Please read [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and follow the guidelines. This will make it much easier for others to help you. – markus Dec 23 '18 at 09:07
  • @markus I updated the question; could you remove the on hold tag? – RforBegginer Dec 26 '18 at 10:18

1 Answer

To plot the development of the transition probabilities over time, you (by definition) need multiple observations for the transition from one action a to another action b.
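As a minimal sketch of what "transition probability" means for a single time step: among the observations currently in action a, take the share that are in action b at the next step. The data frame `toy` and the helper `trans_prob` below are hypothetical names made up for illustration, and returning `NA` when nobody is in action a is an assumption of this sketch.

```r
# Toy data: 6 people observed at two consecutive time steps (made-up values).
toy <- data.frame(t1 = c(2, 2, 2, 2, 5, 5),
                  t2 = c(3, 3, 2, 5, 5, 2))

# Estimated P(next = to | current = from) for one time step:
# the share of rows currently in `from` that are in `to` at the next step.
trans_prob <- function(current, nxt, from, to) {
  in_from <- current == from
  if (!any(in_from)) return(NA_real_)  # nobody is in `from`: undefined
  mean(nxt[in_from] == to)
}

p_2_to_3 <- trans_prob(toy$t1, toy$t2, from = 2, to = 3)
print(p_2_to_3)  # 2 of the 4 rows in action 2 move to action 3, i.e. 0.5
```

Repeating this for every pair of consecutive columns gives one value per time step, which is the series that gets plotted.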

Once you have multiple transition probability matrices, it is actually very straightforward to plot their development over time. In my example below, most of the code is actually devoted to (re-)creating your data and the different transition probability matrices.

```r
library(ggplot2)

num_samples <- 1000
num_actions <- 20

# 1. generate activities dataframe (dropped ID column):
activ <- data.frame(v1 = sample(1:num_actions, num_samples, replace = TRUE),
                    v2 = sample(1:num_actions, num_samples, replace = TRUE),
                    v3 = sample(1:num_actions, num_samples, replace = TRUE),
                    v4 = sample(1:num_actions, num_samples, replace = TRUE),
                    v5 = sample(1:num_actions, num_samples, replace = TRUE),
                    v6 = sample(1:num_actions, num_samples, replace = TRUE),
                    v7 = sample(1:num_actions, num_samples, replace = TRUE))

num_transp <- ncol(activ) - 1

# 2. calculate transition probabilities for each time step:
l_transp <- vector("list", num_transp)
for (t in 1:num_transp){
  transp <- matrix(0, nrow = num_actions, ncol = num_actions)
  data <- activ[, t:(t+1)]
  for (action in 1:num_actions){
    rows_to_keep <- data[,1] == action
    counts <- table(data[rows_to_keep,])
    probs <- as.data.frame(counts/sum(rows_to_keep))
    follow_actions <- as.integer(as.character(probs[,2]))
    transp[action, follow_actions] <- probs$Freq
  }
  l_transp[[t]] <- transp
}

# 3. get development of transition probability from action 2 to action 3 over time:
to_plot <- vector("numeric", num_transp)
for (i in 1:num_transp){
  to_plot[i] <- l_transp[[i]][2, 3]
}

# plot development of transition probability from action 2 to action 3 over time:
ggplot(data.frame(x=1:num_transp, y=to_plot), aes(x=x, y=y)) +
  geom_line() +
  xlab("time") +
  ylab("transition probability")
```

Running this will give you something like this

It looks very much like what one would expect from a scenario where 20 different actions were chosen randomly and independently over time.
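If you already have your own data, step 1 can be skipped. The following is only a sketch under stated assumptions: the object is called `Activities`, it has an ID column `serial` followed by columns `act1_1` ... `act1_144`, and `act1_1` starts at 04:00 (the question is ambiguous on whether the first slot is 04:00 or 04:10, so adjust `start` if needed). Because the real data is not available here, a simulated stand-in with a hypothetical subset of activity codes is built first; replace that block with your own object. Instead of full transition matrices, the sketch counts only the one pair of interest per time step, and it puts clock times on the x axis.

```r
set.seed(1)

# Stand-in for the real data (assumption): `serial` plus act1_1 ... act1_144.
# Replace this block with your own `Activities` object.
n_people <- 200
n_steps  <- 144
codes    <- c(2, 3, 5, 48)  # hypothetical subset of activity codes
sim <- matrix(sample(codes, n_people * n_steps, replace = TRUE),
              nrow = n_people,
              dimnames = list(NULL, paste0("act1_", seq_len(n_steps))))
Activities <- data.frame(serial = seq_len(n_people), sim)

# Keep only the activity columns, in time order, as a plain matrix.
act <- as.matrix(Activities[, paste0("act1_", seq_len(n_steps))])

from <- 2
to   <- 3
cur <- act[, -n_steps, drop = FALSE]  # time steps 1..143
nxt <- act[, -1, drop = FALSE]        # time steps 2..144

# Per time step: rows in `from`, and rows moving from `from` to `to`.
n_from <- colSums(cur == from)
n_both <- colSums(cur == from & nxt == to)
prob   <- unname(ifelse(n_from > 0, n_both / n_from, NA_real_))

# Clock time at the start of each transition (assumes act1_1 starts at 04:00;
# the date itself is arbitrary and only needed to build a POSIXct sequence).
start <- as.POSIXct("2018-01-01 04:00:00", tz = "UTC")
time  <- start + 600 * (seq_len(n_steps - 1) - 1)

to_plot <- data.frame(time = time, prob = prob)

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(to_plot, aes(x = time, y = prob)) +
    geom_line() +
    scale_x_datetime(date_breaks = "2 hours", date_labels = "%H:%M") +
    xlab("time") +
    ylab("transition probability (2 to 3)")
  print(p)
}
```

Time steps in which nobody is in the `from` activity get `NA` here rather than 0, so they show up as gaps in the line; that is a design choice of this sketch, not something the original loop does.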

apitsch
  • thanks so much for your help; I would like to change the x axis to have time on it. I came up with the following code, do you think this could work? Or how can I change the x axis? rownames<- c("04:00", "04:10","04:20", "04:30", "04:40", "04:50", "05:00", "05:10", "05:20", "05:30", "05:40", "05:50",...) – RforBegginer Dec 23 '18 at 19:58
  • @42 thank you for your help - could you please advise me on how to represent my transition probability matrix? thank you – RforBegginer Dec 25 '18 at 06:10
  • thank you - how can I replace step 1 with my data? Instead of generating new data I would like to use my matrix. The name of my matrix is Activities and it has ncol=144 and nrows=16533; act1_1...act1_144 are time steps, and time is represented in 10-minute intervals (e.g. act1_1 = 4.10am; act1_2 = 4.20am, ...). Time starts from 4am (act1_1) and ends at act1_144 (4am). How can I read this matrix and skip step one? – RforBegginer Dec 26 '18 at 06:10
  • @markus I updated the question; could you remove the on hold tag? – RforBegginer Dec 26 '18 at 10:17
  • thank you for your help, but how can I use my data instead of creating a dataset? How can I read my matrix? – RforBegginer Dec 28 '18 at 12:57
  • head(Activities) returns a tibble: 6 x 145 serial act1_1 act1_2 act1_3 act1_4 act1_5 act1_6 act1_7 act1_8 act1_9 act1_10 1 1.22e7 110 110 110 110 110 110 110 110 110 110 2 1.43e7 110 110 110 110 110 110 110 110 110 110 3 2.00e7 110 110 110 110 110 110 110 110 110 110 # ... with 137 more variables: act1_11 , act1_12 , – RforBegginer Dec 28 '18 at 12:59