R: add missing rows not using for loop

Question

This is a follow-up to an earlier question: Transition matrix

We use its setup:

#Please use the setup in the following **EDIT** section.
#df = data.frame(cusip = paste("A", 1:10, sep = ""), xt = c(1,2,3,2,3,5,2,4,5,1), xt1 = c(1,4,2,1,1,4,2,2,2,5))
   cusip xt xt1
1     A1  1   1
2     A2  2   4
3     A3  3   2
4     A4  2   1
5     A5  3   1
6     A6  5   4
7     A7  2   2
8     A8  4   2
9     A9  5   2
10   A10  1   5

According to the answers in that post, we can get a transition matrix as follows:

res <- with(df, table(xt, xt1)) ## table() to form transition matrix
res/rowSums(res)                ## /rowSums() to normalize by row
#    xt1
# xt          1         2         4         5
#   1 0.5000000 0.0000000 0.0000000 0.5000000
#   2 0.3333333 0.3333333 0.3333333 0.0000000
#   3 0.5000000 0.5000000 0.0000000 0.0000000
#   4 0.0000000 1.0000000 0.0000000 0.0000000
#   5 0.0000000 0.5000000 0.5000000 0.0000000

Notice that there is no column 3, because no observation is in state 3 at time t+1. Mathematically, however, a transition matrix has to be square. In this situation we still need a column 3 in which [3,3]=1 and all other elements are 0. (The rule: for any missing column n or missing row n, set [n,n]=1 and every other element of that row/column to 0.) The result should look like this:

#    xt1
# xt          1         2         3         4         5
#   1 0.5000000 0.0000000 0.0000000 0.0000000 0.5000000
#   2 0.3333333 0.3333333 0.0000000 0.3333333 0.0000000
#   3 0.5000000 0.5000000 1.0000000 0.0000000 0.0000000
#   4 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000
#   5 0.0000000 0.5000000 0.0000000 0.5000000 0.0000000

Can I achieve that without writing a messy for loop? Thank you.

EDIT: Please use this dataset instead:

df = data.frame(cusip = paste("A", 1:10, sep = ""), xt = c(2,2,3,2,3,5,2,4,5,4), xt1 = c(1,4,2,1,1,4,2,3,2,5))
   cusip xt xt1
1     A1  2   1
2     A2  2   4
3     A3  3   2
4     A4  2   1
5     A5  3   1
6     A6  5   4
7     A7  2   2
8     A8  4   3
9     A9  5   2
10   A10  4   5

Now the transition matrix is as follows:

res <- with(df, table(xt, xt1)) 
res/rowSums(res)                
   xt1
xt     1    2    3    4    5
  2 0.50 0.25 0.00 0.25 0.00
  3 0.50 0.50 0.00 0.00 0.00
  4 0.00 0.00 0.50 0.00 0.50
  5 0.00 0.50 0.00 0.50 0.00

Notice that row 1 is missing. I want to add a row 1 in which [1,1]=1 and all other elements are 0 (so that the row sums to 1), giving something like:

   xt1
xt     1    2    3    4    5
  1 1.00 0.00 0.00 0.00 0.00
  2 0.50 0.25 0.00 0.25 0.00
  3 0.50 0.50 0.00 0.00 0.00
  4 0.00 0.00 0.50 0.00 0.50
  5 0.00 0.50 0.00 0.50 0.00

How can I achieve that (add the missing row)?

Your `xt` and `xt1` should be factors with appropriate "levels", then `table` will include even missing levels and construction of the matrix will be hunky-dory (or nearly so). This Q&A may be helpful: http://stackoverflow.com/questions/1617061/including-missing-values-in-table-results-in-r — Frank, Dec 02 '15 at 18:27
@frank yea the problem is that `xt1` does not have level 3, which is state 3, in `df`, but we still need to take it into consideration, which is why I need a column 3. — Natalia, Dec 02 '15 at 18:29
@Natalia frank means like this `with(df, table(xt, factor(xt1, levels = 1:5)))` although it would be better to define the factor/levels in the data frame — rawr, Dec 02 '15 at 18:31
@mkemp6 you're right about the "whole column should be zeros". It's just in practice I assigned an "1" to [n,n], actually I can assign any number to [n,3] because it's like I have 5 degree of freedom so I can randomly assign number to it. — Natalia, Dec 02 '15 at 18:50
@rawr do you mean that I need to define the factor/levels before I do the `res` thing? — Natalia, Dec 02 '15 at 18:51
@Natalia frank's answer will also work for your edit, just change `colSums` to `rowSums` — rawr, Dec 02 '15 at 19:50
@rawr actually I can't get the same result using the code.. it generated some blank spaces in the matrix... — Natalia, Dec 02 '15 at 21:21
@Natalia I meant in this line `tab + diag(colSums(tab)==0)` you should use the `prop.table` that frank used instead of `res/rowSums(res)`. otherwise, you are computing `0/0` when the column or row sums are 0 and that is where the blanks are coming from. look at `str(res/rowSums(res))` — rawr, Dec 02 '15 at 21:30
Fyi, best to just ask one question at a time. When you make an edit, you should just edit the post into its best form possible. If folks want to look up the edit history they can, since it is publicly viewable from the "edited x hours ago" link at the bottom. — Frank, Dec 02 '15 at 21:40
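
The factor-levels suggestion from the comments above can be sketched as follows for the original (pre-EDIT) data. This is only a minimal illustration: it assumes the full state space is 1 through 5, and `states` and `counts` are names introduced here, not taken from the post.

```r
# Original (pre-EDIT) data from the question
df <- data.frame(
  cusip = paste0("A", 1:10),
  xt    = c(1, 2, 3, 2, 3, 5, 2, 4, 5, 1),
  xt1   = c(1, 4, 2, 1, 1, 4, 2, 2, 2, 5)
)

# Assumption: the full state space is 1:5
states <- 1:5

# Declaring the levels up front makes table() keep unobserved states
counts <- with(df, table(xt  = factor(xt,  levels = states),
                         xt1 = factor(xt1, levels = states)))

dim(counts)            # one row and one column per state
colSums(counts)["3"]   # count of observed transitions into state 3
```

With the levels declared, the count table is square even though state 3 never occurs at time t+1. What to put in the empty column (or row) is a separate modelling decision.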

Frank · Accepted Answer · 2015-12-02T21:38:35.687


Here's a way to do it (only looking at the second question posed):

# setup
df = data.frame(
  cusip = paste("A", 1:10, sep = ""), 
  xt = c(2,2,3,2,3,5,2,4,5,4), 
  xt1 = c(1,4,2,1,1,4,2,3,2,5)
)

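# declare the full state space so that table() keeps unobserved states
df$xt   = factor(df$xt, levels=1:5)
df$xt1  = factor(df$xt1, levels=1:5)

# making the transition frequency table (margin 1 normalizes each row)
tab = with(df, prop.table(table(xt,xt1), 1))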

#    xt1
# xt     1    2    3    4    5
#   1                         
#   2 0.50 0.25 0.00 0.25 0.00
#   3 0.50 0.50 0.00 0.00 0.00
#   4 0.00 0.00 0.50 0.00 0.50
#   5 0.00 0.50 0.00 0.50 0.00

This is the correct table for describing the frequency of transitions observed in the data df. Row 1 prints blank because no observation starts in state 1, so its frequencies are 0/0 (NaN). If, however, you want to impute a transition rule where no data is available, there are some options. The OP wants to impute that any unobserved states are "absorbing states":

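# flag rows with no observed transitions (all NaN after prop.table)
r = rowSums(tab,na.rm=TRUE)==0

# overwrite those rows with the matching rows of the identity matrix
tab[r, ] <- diag(nrow(tab))[r,,drop=FALSE]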

#    xt1
# xt     1    2    3    4    5
#   1 1.00 0.00 0.00 0.00 0.00
#   2 0.50 0.25 0.00 0.25 0.00
#   3 0.50 0.50 0.00 0.00 0.00
#   4 0.00 0.00 0.50 0.00 0.50
#   5 0.00 0.50 0.00 0.50 0.00

I don't think this is a good idea, since it hides features of the true data.
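
For reuse, the steps above could be wrapped into a small function. This is a sketch only: `transition_matrix` and its arguments are hypothetical names, not part of the answer or of any package, and the state space is assumed to be known in advance.

```r
# Hypothetical helper (not from the answer): empirical transition matrix
# over a fixed state space, with unobserved "from" states made absorbing.
transition_matrix <- function(from, to, states) {
  tab <- prop.table(table(from = factor(from, levels = states),
                          to   = factor(to,   levels = states)), 1)
  # Rows with no observations are 0/0 = NaN after prop.table()
  empty <- rowSums(tab, na.rm = TRUE) == 0
  # Replace each empty row with the matching row of the identity matrix
  tab[empty, ] <- diag(length(states))[empty, , drop = FALSE]
  tab
}

# Data from the EDIT section of the question
xt  <- c(2, 2, 3, 2, 3, 5, 2, 4, 5, 4)
xt1 <- c(1, 4, 2, 1, 1, 4, 2, 3, 2, 5)

P <- transition_matrix(xt, xt1, states = 1:5)
rowSums(P)   # check that every row sums to 1
```

The same caveat applies: the imputed rows are an assumption about unobserved states, not something estimated from the data.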

edited Dec 02 '15 at 21:38

answered Dec 02 '15 at 18:39

Frank


Right. This is not a transition matrix. But now I am confused, because I think a transition matrix has to be square (5*5 in this case)... – Natalia Dec 02 '15 at 18:57
@Natalia If you leave it as 0 instead of switching to 1 (as in the `tab` object), then it is a transition matrix. An empirical transition matrix (which is what you have here, merely describing frequencies observed) need not be square. For example, if you see A->A and A->B and that's all, then there is no way to write probabilities for what happens starting from B, so it will not be square. – Frank Dec 02 '15 at 19:00
but wiki says the definition of transition matrix C is that we can premultiply C by A and get B (CA=B). A here is a 5*1 matrix and B is also 5*1. If C is not 5*5 (square), how can we get a 5*1 vector B? – Natalia Dec 02 '15 at 19:12
@Natalia I have not reviewed the wiki to see what those letters mean. If your transition matrix describes a data-generating process, then, yes, it should be square. What you have here is different -- you are taking some realized transitions and summarizing them by looking at frequencies for each transition. It is common to call this a transition matrix as well, even though it might not technically be correct. It is quite possible that some frequencies will not be computable, since #fromAtoB/#fromA cannot be computed if #fromA is 0. A and B here refer to states. – Frank Dec 02 '15 at 19:16
Yes, right. So what about missing rows? I edited my question to ask about missing rows. In that case, if I assign a "1" to [1,1], the 1st row sums to 1. – Natalia Dec 02 '15 at 19:29
@Natalia I don't think you understand what a transition matrix is used for. First, decide what you even mean by writing transition probabilities from a state with no observations... – Frank Dec 02 '15 at 19:41
When using `tab + diag(colSums(tab)==0)`, why can't I convert the blanks to numbers? Plus, it replaces all my diagonal terms with blanks... – Natalia Dec 02 '15 at 21:21
@Natalia Okay, I've updated my answer to only address the second part of your question (since the first doesn't quite make sense) and deal with the blanks you were seeing. – Frank Dec 02 '15 at 21:39
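
The blanks discussed in these comments can be reproduced with a short sketch. It assumes the EDIT data from the question; `nan_row` is a name introduced here for illustration.

```r
# 0/0 is NaN in R, which is what an unobserved "from" state produces
is.nan(0 / 0)

# Data from the EDIT section; state 1 never appears in xt
xt  <- factor(c(2, 2, 3, 2, 3, 5, 2, 4, 5, 4), levels = 1:5)
xt1 <- factor(c(1, 4, 2, 1, 1, 4, 2, 3, 2, 5), levels = 1:5)

tab <- prop.table(table(xt, xt1), 1)

# Row 1 holds NaN values; print.table() hides them because its
# na.print argument defaults to an empty string
nan_row <- is.nan(tab["1", ])

# Ask print() to show them explicitly instead of leaving blanks
print(tab, na.print = "NaN")
```

Since arithmetic on NaN yields NaN, adding a diagonal matrix to such a table cannot fill in those cells. The values have to be assigned, as in the updated answer.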
