My data frame (m × n) has a few hundred columns. I need to compare each column with every other column (as a contingency table), perform a chi-squared test on each pair, and save the results for each column in a separate variable.

It works for one column at a time, like this:

s <- function(x) {
  a <- table(x,data[,1])
  b <- chisq.test(a)
}
c1 <- apply(data,2,s)

The results for column 1 are stored in c1, but how do I loop this over all columns and save the results for each column for further analysis?

Joshua Ulrich
Ram

2 Answers

Fundamentally, you have a few problems here:

  1. You're relying heavily on global variables rather than local ones. This makes the double usage of "data" confusing.

  2. Similarly, you rely on a hard-coded value (column 1) instead of passing it as an argument to the function.

  3. You're not extracting the one value you need from the chisq.test(). This means your result gets returned as a list.

  4. You didn't provide some example data. So here's some:

    m <- 10
    n <- 4
    mytable <- matrix(runif(m*n),nrow=m,ncol=n)

Once you fix the above problems, simply run a loop over various columns (since you've now avoided hard-coding the column) and store the result.
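As a rough sketch of what the fixed version might look like: the helper below takes the data frame and both column indices as arguments and returns only the p-value. The names `mydata`, `chisq_p`, and `results` are made up for illustration, the example data is categorical (an assumption, since a contingency table needs discrete values), and `lapply` stands in for an explicit `for` loop.

```r
# Hypothetical example data: three categorical columns (names and sizes are assumptions)
set.seed(1)
mydata <- data.frame(
  a = sample(letters[1:3], 50, TRUE),
  b = sample(letters[1:3], 50, TRUE),
  c = sample(letters[1:3], 50, TRUE)
)

# p-value of the chi-squared test between columns i and j of df;
# the data frame and both column indices are passed in explicitly
chisq_p <- function(df, i, j) {
  # suppressWarnings() hides the small-expected-count warning on this toy data
  suppressWarnings(chisq.test(table(df[[i]], df[[j]]))$p.value)
}

# For each column i, test it against every other column j
results <- lapply(seq_along(mydata), function(i) {
  others <- setdiff(seq_along(mydata), i)
  p <- vapply(others, function(j) chisq_p(mydata, i, j), numeric(1))
  names(p) <- names(mydata)[others]
  p
})
names(results) <- names(mydata)

results$a   # p-values of column a against columns b and c
```

Each element of `results` is a named numeric vector, so the values for one column can be pulled out by name for further analysis.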

Ari B. Friedman

If you're sure you want to do this (I wouldn't, given the multiple-testing problem), work with lists:

Data <- data.frame(
    x=sample(letters[1:3],20,TRUE),
    y=sample(letters[1:3],20,TRUE),
    z=sample(letters[1:3],20,TRUE)
  )

# Make a nice list of indices
ids <- combn(names(Data),2,simplify=FALSE)

# use the appropriate apply
my.results <- lapply(ids,
      function(z) chisq.test(table(Data[,z]))
    )
# use some paste voodoo to give the results the names of the column indices
names(my.results) <- sapply(ids,paste,collapse="-")

# select all values for y :
my.results[grep("y",names(my.results))]

It's no harder than that. As the last line shows, you can easily get all tests for a specific column, so there is no need to build a separate list for each column. That just takes longer and uses more space, but gives the same information. You can write a small convenience function to extract the data you need:

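# The result names look like "x-y", so match col as a whole name
# at either end of the pair (col is treated as a regular expression)
extract <- function(col,l){
    l[grep(paste0("(^|-)",col,"(-|$)"),names(l))]
}
extract("y",my.results)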

This means you can even loop over the column names of your data frame and get a list of lists returned:

lapply(names(Data),extract,my.results)
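If each column's tests should end up in their own named entry, one possible variant is sketched below. It assumes column names like `site1` and `site10` (taken from the comments), and the helper `by_column` and the result `per.column` are hypothetical names. Instead of a regular expression, it checks membership in the original index pairs, so `site1` cannot accidentally pick up `site10`.

```r
# Hypothetical example data with names that are prefixes of each other
set.seed(42)
Data <- data.frame(
  site1  = sample(letters[1:3], 40, TRUE),
  site10 = sample(letters[1:3], 40, TRUE),
  site2  = sample(letters[1:3], 40, TRUE)
)

# Same construction as in the answer: one test per unordered pair of columns
ids <- combn(names(Data), 2, simplify = FALSE)
my.results <- lapply(ids, function(z) {
  suppressWarnings(chisq.test(table(Data[, z])))  # toy data triggers warnings
})
names(my.results) <- sapply(ids, paste, collapse = "-")

# Keep the tests whose pair contains col as a whole column name
by_column <- function(col, ids, results) {
  results[vapply(ids, function(pair) col %in% pair, logical(1))]
}

# One named list entry per column
per.column <- lapply(names(Data), by_column, ids = ids, results = my.results)
names(per.column) <- names(Data)

names(per.column$site1)
```

Every test is still computed only once; `per.column` just holds references to the same results grouped by column.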

I strongly suggest you get acquainted with working with lists; they're one of the most powerful and clean ways of doing things in R.

PS: Be aware that this saves the whole chisq.test object in your list. If you only need the chi-squared statistic or the p-value, extract those first.
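A minimal sketch of that extraction, assuming the same kind of example data as above: `p.value` and `statistic` are components of the object `chisq.test` returns. The final `p.adjust` step is my own addition as one possible response to the multiple-testing concern, and the Holm method is just an illustrative choice.

```r
# Example data and tests, built the same way as in the answer
set.seed(7)
Data <- data.frame(
  x = sample(letters[1:3], 40, TRUE),
  y = sample(letters[1:3], 40, TRUE),
  z = sample(letters[1:3], 40, TRUE)
)
ids <- combn(names(Data), 2, simplify = FALSE)
my.results <- lapply(ids, function(z) {
  suppressWarnings(chisq.test(table(Data[, z])))  # toy data triggers warnings
})
names(my.results) <- sapply(ids, paste, collapse = "-")

# Keep only the numbers of interest instead of the full test objects
p.values   <- vapply(my.results, function(res) res$p.value, numeric(1))
statistics <- vapply(my.results, function(res) unname(res$statistic), numeric(1))

# Optional: adjust the p-values for the number of tests performed
p.adjusted <- p.adjust(p.values, method = "holm")

p.values
```

The resulting vectors carry the pair names (`x-y`, `x-z`, `y-z`), so they can be filtered in the same way as the list.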

Joris Meys
  • But I need the results for each column separately, in a different list. – Ram Jul 20 '11 at 13:13
  • @Ram : It's not really a problem splitting up the list you get this way. If you loop over the chi2 for every column, then you calculate all values twice. – Joris Meys Jul 25 '11 at 18:13
  • Yes, that's what I need, as each column is considered to be independent. – Ram Jul 28 '11 at 18:25
  • @Ram : Of course, but chi2(A,B) == chi2(B,A), so why calculate it twice? It just takes twice the time. I added an example of how you can get out the columns that you need. – Joris Meys Jul 29 '11 at 09:06
  • Thank you very much for the support. I have 318 different columns named site1, site2, ..., site318. When I try to get site1, it returns every site that has a 1 in it, like 1, 10, 11, 121, ... How can I get only those for site 1? – Ram Jul 29 '11 at 22:24
  • @Ram : use the ^ and $ operators in the grep pattern as I've edited in the answer. They stand for the beginning and end of the string. See also `?regex` for a very powerful toolbox. – Joris Meys Jul 30 '11 at 00:44
  • The lapply function gives me a list of lists, but how do I separate the list for each column and store it in a different list? – Ram Aug 17 '11 at 02:26
  • @Ram : I'm not sure what you mean. Create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), and ask it as a new question. More chance of getting a good answer that way. – Joris Meys Aug 17 '11 at 11:10