5

Consider the following dataframe:

 df = data.frame(cusip = paste("A", 1:10, sep = ""), xt = c(1,2,3,2,3,5,2,4,5,1), xt1 = c(1,4,2,1,1,4,2,2,2,5))

The data is divided in five states, which are quantiles in reality: 1,2,3,4,5. The first column of the dataframe represents the state at time t, and the second column is the state at time t+1.

I would like to compute a sort of a transition matrix for the five states. The meaning of the matrix would be as follows:

  • (Row, Col) = (1,1) : % of cusips that were in quantile 1 at time t, and stayed at 1 in time t+1
  • (Row, Col) = (1,2) : % of cusips that were in quantile 1 at t, and became quantile 2 at t+1
  • etc...

I am really not sure how to do this in an efficient way. I have the feeling the answer is trivial, but I just can't get my head around it.

Could anyone please help?

zx8754
  • 52,746
  • 12
  • 114
  • 209
Mayou
  • 8,498
  • 16
  • 59
  • 98

2 Answers2

5
res <- with(df, table(xt, xt1)) ## table() to form transition matrix
res/rowSums(res)                ## /rowSums() to normalize by row
#    xt1
# xt          1         2         4         5
#   1 0.5000000 0.0000000 0.0000000 0.5000000
#   2 0.3333333 0.3333333 0.3333333 0.0000000
#   3 0.5000000 0.5000000 0.0000000 0.0000000
#   4 0.0000000 1.0000000 0.0000000 0.0000000
#   5 0.0000000 0.5000000 0.5000000 0.0000000

## As an alternative to  2nd line above, use sweep(), which won't rely on 
## implicit recycling of vector returned by rowSums(res)
sweep(res, MARGIN = 1, STATS = rowSums(res), FUN = `/`)
Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • Perfect!! Could you please elaborate as to what the first line does? – Mayou Jan 27 '14 at 21:35
  • 1
    Given n arguments, `table()` forms a n-dimensional array whose elements contain the numbers of times each particular combination of `xt` and `xt1` appear in your data. `with(df, ...)` just forms a local evaluation environment within which `xt` and `xt1` (and any other named columns in `df`) will be directly visible: that statement is essentially shorthand for `table(df$xt, df$xt1)`. – Josh O'Brien Jan 27 '14 at 21:42
  • Thanks a lot for your thorough answer! Very helpful! – Mayou Jan 27 '14 at 21:47
  • 1
    Also `prop.table(res,1)`. I don't think that wheel needs reinventing, eh? – Frank Dec 02 '15 at 18:47
  • 1
    @Frank Sure, though I typically prefer knowing about the more general function (here `sweep()`) and how it works, so that I can quickly apply it in a broad range of circumstances. It's probably for similar reasons that `?prop.table` describes the function by stating, "This is really ‘sweep(x, margin, margin.table(x, margin), "/")’ for newbies [...]"! – Josh O'Brien Dec 02 '15 at 18:53
0

If you want to have all the states (1..5) in the column of the transition matrix, you can try this:

levs <- sort(union(df$xt, df$xt1)) 
tbl <- table(factor(df$xt, levs), factor(df$xt1, levs))
tbl / rowSums(tbl)  

         1         2         3         4         5
  1 0.5000000 0.0000000 0.0000000 0.0000000 0.5000000
  2 0.3333333 0.3333333 0.0000000 0.3333333 0.0000000
  3 0.5000000 0.5000000 0.0000000 0.0000000 0.0000000
  4 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000
  5 0.0000000 0.5000000 0.0000000 0.5000000 0.0000000
Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63