
I have a contingency table data matrix with 6 columns and 37 rows. I need to apply a Chi squared transformation to give me Row profiles and Column profiles for a correspondence analysis.

Unfortunately I've been told I will need to use nested loops to transform the data and carry out the CA (rather than doing it the more sensible ways in R). I was given the structure to use for my nested loop:

transformed.data = data

for (row.index in 1:nrow(data)) {
  for (col.index in 1:ncol(data)) {
    transformed.data[row.index, col.index] =
       "TRANSFORMATION"[row.index, col.index]
  }
}

From what I understand, the nested loop will apply my "TRANSFORMATION" first to the rows and then to the columns.

The transformation I want done on the data to get the row profiles is:

( X( ij ) / sum( X( i ) ) ) / sqrt( sum( X( j ) ) )

While the transformation I want done on the data to get the column profiles is:

( X( ij ) / sum( X( j ) ) ) / sqrt( sum( X( i ) ) )

What would I enter as my "TRANSFORMATION" in the last line of the nested loop to get it to output my desired transformation for the profiles? Alternatively, if I've misunderstood the point of a nested loop here, please describe what it would allow me to do.

This is the code for a subset of my data:

matrix(c(15366,2079,411,366,23223,2667,699,819,31632,2724,717,1473,49938,3111,1062,11964)
,nrow=4,ncol=4,byrow=T)

So using this subset alone I would expect the row profile for the first row to be:

0.002432689 0.0003291397 6.506803e-05 5.794379e-05

And the column profile for the first column to be:

0.0009473414, 0.0132572344, 0.0572742202, 0.0132863528 
Joris Meys
Confused
  • Can you add some sample data to make your question [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? An input dataset and your expected outcome will be needed. Also, have you searched for built-in functions? The first hit on Google gave me [this](http://www.statmethods.net/advstats/ca.html). – Chase Sep 08 '12 at 02:34
  • Thanks Chase, I'll just add some sample data to the first post. In regards to your second question for this assignment I have to first do the correspondence analysis step by step by transforming the data (the bit I'm stuck on) and doing a PCA on that and then do it the more sensible ways by corresp(original data) and ca( original data) – Confused Sep 08 '12 at 03:21
  • Sounds like homework? A few pieces of advice. 1) you don't need any for loops, 2) your formula can be made much easier if you use `colSums()` and `rowSums()` 3) when all else fails, you can look at the source code of functions to see how other authors have solved this same problem. To do this, type the function name without parens into the console. This *can* be a one line function with the above pieces of info. – Chase Sep 08 '12 at 03:55
  • Indeed. Great, thanks again! I was going to resort to not using any for loops if I couldn't figure out the transformation bit, because it did seem to complicate things! Our lecturer suggested using for loops and gave us the code above. I actually originally did it the way you suggested in 2), but it seemed almost too easy, and I always find in that case that it's probably not right, which is why I was just double-checking with the for loops :) – Confused Sep 08 '12 at 04:48
  • Is the expected outcome two matrices with the same number of rows and columns as the original data: one you might call the "row transformation version" and one that is the "col transformation version"? – Peter Ellis Sep 11 '12 at 11:34
  • The %o% function (outer) may also be helpful. – David F Sep 25 '12 at 08:16

1 Answer


You can do these types of calculations without a single loop. Rewrite your equation (here for the row transformation), and you get:

Xtrans[i,j] = X[i,j] / ( sum( X[i, ] ) * sqrt( sum( X[ ,j] ) ) )

To get a matrix representing the term - sum( X[i, ] ) * sqrt( sum( X[ ,j] ) ) - you use the function outer() or %o% like this:

rowSums(X) %o% sqrt(colSums(X))

Or, for the column transformation :

sqrt(rowSums(X)) %o% colSums(X)

The only thing you need to do is divide your original matrix by this one, e.g. for the column transformation:

TEST <- matrix(
               c(15366,2079,411,366,23223,2667,699,819,
                 31632,2724,717,1473,49938,3111,1062,11964),
                 nrow=4,ncol=4,byrow=T)

> TEST / (sqrt(rowSums(TEST)) %o% colSums(TEST))
             [,1]        [,2]        [,3]         [,4]
[1,] 0.0009473414 0.001455559 0.001053892 0.0001854284
[2,] 0.0011674098 0.001522501 0.001461474 0.0003383284
[3,] 0.0013770523 0.001346668 0.001298230 0.0005269580
[4,] 0.0016167998 0.001143812 0.001430074 0.0031831055

In approximately the same way you can calculate the row transformation.
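As a minimal sketch of that row transformation, assuming the same `TEST` matrix and the row-profile formula from the question (the name `row.trans` is only chosen here for illustration):

```r
# Same subset of the data as in the question
TEST <- matrix(
  c(15366, 2079, 411, 366, 23223, 2667, 699, 819,
    31632, 2724, 717, 1473, 49938, 3111, 1062, 11964),
  nrow = 4, ncol = 4, byrow = TRUE)

# Row transformation: X[i,j] / ( sum(X[i,]) * sqrt(sum(X[,j])) )
# rowSums(TEST) %o% sqrt(colSums(TEST)) builds the matrix of denominators
row.trans <- TEST / (rowSums(TEST) %o% sqrt(colSums(TEST)))

# Spot check of one cell against the formula written out by hand
all.equal(row.trans[1, 2],
          (TEST[1, 2] / sum(TEST[1, ])) / sqrt(sum(TEST[, 2])))
```

The only difference from the column version is which margin gets the square root.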

Doing the calculations by hand, I can confirm that my solution is correct, provided I understood your index notation correctly (meaning that i stands for rows and j for columns). The numbers you list as expected, however, do not all follow from your own formula. To show you:

> ( TEST[1,2] / sum(TEST[,2]) ) / sqrt(sum(TEST[1,]))
[1] 0.001455559
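If the assignment insists on the nested loop, the skeleton from the question could be filled in along these lines. This is only a sketch for the column transformation, with `col.trans.loop` as an illustrative name. Note that the loop does not apply anything "first to the rows and then to the columns"; it simply visits every cell [i, j] once and computes that cell's value.

```r
# Same subset of the data as in the question
TEST <- matrix(
  c(15366, 2079, 411, 366, 23223, 2667, 699, 819,
    31632, 2724, 717, 1473, 49938, 3111, 1062, 11964),
  nrow = 4, ncol = 4, byrow = TRUE)

# Start from a copy so the result has the right dimensions
col.trans.loop <- TEST

for (i in seq_len(nrow(TEST))) {      # i runs over rows
  for (j in seq_len(ncol(TEST))) {    # j runs over columns
    # Column profile: ( X[i,j] / sum(X[,j]) ) / sqrt( sum(X[i,]) )
    col.trans.loop[i, j] <-
      (TEST[i, j] / sum(TEST[, j])) / sqrt(sum(TEST[i, ]))
  }
}

# The loop result should agree with the vectorised version
all.equal(col.trans.loop, TEST / (sqrt(rowSums(TEST)) %o% colSums(TEST)))
```

For the row transformation, swap the roles of the row and column sums inside the loop body.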

The chi-square normalization you describe is available in the function decostand of the vegan package. Mind you that, by default, the method adjusts the result by multiplying by the square root of the matrix total. This makes sense in a correspondence analysis.

If you don't want this correction, you can also get e.g. the column transformation as follows (note that the output shown below is the transpose of the matrix computed above):

> require(vegan)
> decostand(TEST,method="chi.square",MARGIN=2)/sqrt(sum(TEST))
             [,1]         [,2]        [,3]        [,4]
[1,] 0.0009473414 0.0011674098 0.001377052 0.001616800
[2,] 0.0014555588 0.0015225011 0.001346668 0.001143812
[3,] 0.0010538924 0.0014614736 0.001298230 0.001430074
[4,] 0.0001854284 0.0003383284 0.000526958 0.003183106
attr(,"decostand")
[1] "chi.square"
Joris Meys
  • I know this is homework, but I guess we're well past the due date for the assignment, hence adding a solution that's actually R-like – Joris Meys Oct 02 '12 at 15:05