How to calculate transition probabilities in R

Question

I would like to count how often transitions between values occur within each person from one year to the next (panel data). This mimics Stata's `xttrans` command. The transition between rows 7 and 8 should not be counted, since it crosses from one person to the next rather than happening within one person.

df = data.frame(id=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2), 
                year=seq(from=2003, to=2009, by=1), 
                health=c(3,1,2,2,5,1,1,1,2,3,2,1,1,2))

Darren Tsai · Answer 1 · 2020-06-09T02:51:46.417

4

Here is a base R solution to calculate transition counts by id groups:

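with(df, Reduce(`+`, tapply(health, id, function(x) {
  # Use the same levels for every person so the per-person tables align
  x <- factor(x, levels = min(health, na.rm = TRUE):max(health, na.rm = TRUE))
  # Cross-tabulate each value (rows) with its successor (columns);
  # Reduce() sums the tables, which also works for more than two ids
  table(x[-length(x)], x[-1])
})))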

#    1 2 3 4 5
#  1 2 3 0 0 0
#  2 1 1 1 0 1
#  3 1 1 0 0 0
#  4 0 0 0 0 0
#  5 1 0 0 0 0
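
The table above holds counts, whereas the title asks for probabilities. A minimal sketch of the conversion, assuming each row should be divided by its row total (which, as far as I know, is how `xttrans` presents its percentages). The names `lev`, `counts` and `probs` are illustrative and not part of the original answer:

```r
df <- data.frame(id = rep(1:2, each = 7),
                 year = rep(2003:2009, times = 2),
                 health = c(3, 1, 2, 2, 5, 1, 1, 1, 2, 3, 2, 1, 1, 2))

lev <- 1:5  # assumed state space

# Transition counts per person, summed over persons
counts <- Reduce(`+`, lapply(split(df$health, df$id), function(x) {
  x <- factor(x, levels = lev)
  table(from = x[-length(x)], to = x[-1])
}))

# Divide each row by its row total. Rows without any observed
# transition (state 4 here) have total 0 and come out as NaN.
probs <- prop.table(counts, margin = 1)
round(probs, 2)
```

Whether an all-zero row should stay `NaN` or be set to 0 depends on how the result is used afterwards.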

edited Jun 09 '20 at 02:51

answered Jun 05 '20 at 09:57


I'm trying to adjust the code. The issue is that my real data are unbalanced: different numbers of occurrences per person and lots of NAs. – Marco Jun 05 '20 at 12:39
@MarcoDoe My code can deal with unbalanced data. As for NAs, you didn't provide an example, so I'm not sure what your real data look like. – Darren Tsai Jun 06 '20 at 07:09
`min()` and `max()` throw errors if I have no health data for some person. Sorry for not being clearer when creating the MWE. – Marco Jun 08 '20 at 13:39
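
One possible way around the issues raised in these comments is to fix the set of states up front instead of deriving it with `min()`/`max()`, and to let `table()` drop pairs that contain `NA`. The sketch below is not from the answer. The data frame `df2`, the helper `count_one` and the rule that only consecutive years count as a transition are all assumptions made for illustration:

```r
# Hypothetical unbalanced panel: person 2 has a missing value and a
# gap in years, person 3 has no health data at all
df2 <- data.frame(id     = c(1, 1, 1, 2, 2, 2, 2, 3),
                  year   = c(2003, 2004, 2005, 2003, 2004, 2005, 2007, 2003),
                  health = c(3, 1, 2, 1, NA, 2, 1, NA))

lev <- 1:5  # assumed state space, fixed up front

count_one <- function(d) {
  d <- d[order(d$year), ]
  n <- nrow(d)
  from <- factor(d$health[-n], levels = lev)
  to   <- factor(d$health[-1], levels = lev)
  # Assumption: only year t -> t + 1 counts as a transition
  consecutive <- diff(d$year) == 1
  # table() drops pairs containing NA by default (useNA = "no")
  table(from = from[consecutive], to = to[consecutive])
}

counts2 <- Reduce(`+`, lapply(split(df2, df2$id), count_one))
counts2
```

A person with a single row or with only `NA` values contributes an all-zero table, so nothing needs to be filtered out beforehand.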

score 1 · Accepted Answer · answered Jun 05 '20 at 10:04

library(tidyverse)

# Get the previous health status within each id (rows are assumed to be sorted by year)
df <- df %>% 
         group_by(id) %>% 
         mutate(lastHealth=lag(health)) %>%  
         ungroup()
# Count the number of observed transitions
transitions <- df %>% 
                  group_by(health, lastHealth) %>%  
                  summarise(N=n()) %>% 
                  ungroup()
# Fill in the transition grid to include possible transitions that weren't observed
transitions <- transitions %>% 
                 complete(health=1:5, lastHealth=1:5, fill=list(N=0))
# Present the transitions in the required format
transitions %>% 
  pivot_wider(names_from="health", values_from="N", names_prefix="health") %>%
  filter(!is.na(lastHealth))

Limey


I cannot use `pivot_wider`. I'm still figuring out why; it seems to be related to old R versions: https://stackoverflow.com/questions/56534005/how-to-install-pivot-long-and-pivot-wide-in-r – Marco Jun 05 '20 at 12:37
Yes. `pivot_wider` is relatively new. In older versions, the function was `spread`. In this simple case, the conversion should be straightforward, but let me know if you can't figure it out. I'll provide an alternative. [You're happy with everything up to that point though?] – Limey Jun 05 '20 at 13:23
I'm working on adjusting everything to the n-dimensional case, including potential NAs. – Marco Jun 05 '20 at 14:21
I believe my code handles `NA` in `health` already. If not, let me know. If by multidimensional you mean "more group variables than just `id`", then modify the two `group_by`s to include the additional variables in front of what's there now. If you have more state categories than `1:5`, modify the two `1:5`s in `complete()` accordingly. – Limey Jun 05 '20 at 14:24
It works :) I actually only need to create transitions. The pivot_wider makes the table more readable, but it can also be read rowwise. – Marco Jun 08 '20 at 13:48
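
For readers who are also stuck on a tidyr version without `pivot_wider`, here is a rough sketch of what the `spread` conversion mentioned in the comments might look like. It is not part of the original answer. The `arrange()` step, the early `filter()` and the name `wide` are additions made for illustration, and it assumes dplyr and tidyr are installed:

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = rep(1:2, each = 7),
                 year = rep(2003:2009, times = 2),
                 health = c(3, 1, 2, 2, 5, 1, 1, 1, 2, 3, 2, 1, 1, 2))

transitions <- df %>%
  arrange(id, year) %>%                 # lag() relies on row order
  group_by(id) %>%
  mutate(lastHealth = lag(health)) %>%
  ungroup() %>%
  # Drop each person's first row and any pair with a missing value
  filter(!is.na(health), !is.na(lastHealth)) %>%
  group_by(lastHealth, health) %>%
  summarise(N = n()) %>%
  ungroup() %>%
  complete(lastHealth = 1:5, health = 1:5, fill = list(N = 0))

# spread() is the older counterpart of pivot_wider(); the prefix is
# added by hand because spread() has no names_prefix argument
wide <- transitions %>%
  mutate(health = paste0("health", health)) %>%
  spread(key = health, value = N)
wide
```

Rows are the previous state (`lastHealth`) and columns are the current state, matching the layout produced by the `pivot_wider` version.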
