3

I have a data frame containing independent counts from two observers of the same process.

obs.1 <- c(2,10,53,13,12,15,5)
obs.2 <- c(3,12,45,2,7,17,5)
df <- data.frame(obs.1,obs.2)

I want to use a chi-square test (chisq.test, from R's "stats" package) on each row to see whether there is a significant difference between obs.1 and obs.2. I would like to add the results (X-squared, p-value) to the df. I have a feeling the apply function is the correct way to implement this, but I haven't been successful.

doncarlos
  • Have you tried `cbind(df, t(apply(df, 1, function(x) {ch <- chisq.test(x); c(unname(ch$statistic), ch$p.value)})))` – akrun Jan 28 '15 at 12:57
  • @CathG I am using chisq as it is used in other similar examples. Kappa is for categorical data only? – doncarlos Jan 28 '15 at 13:27
  • @doncarlos If you have doubts about which test to use (or, in general, statistical questions), http://stats.stackexchange.com/ might be a better place to post the question – akrun Jan 28 '15 at 13:38
  • @akrun, after more intense thinking (...), I'll change my first idea of kappa to a Wilcoxon test (or t-test, depending on the number of points), mostly because kappa is indeed more appropriate for categorical data, so small changes in the values between observers can yield a bad kappa coefficient even when the difference is not actually significant. But I guess it really depends on the "nature of the data" – Cath Jan 28 '15 at 13:39
  • @CathG Thanks for your thoughts. Just to clarify; two observers are independently looking at the same process (objects passing) and count what they see. For each row the color of the objects is different and so I want to see if there are any statistical differences between the two observers / color combinations. – doncarlos Jan 28 '15 at 13:40
  • so I definitely would go for a pairwise test (and really definitely not a chi-square by row...) but as @akrun said, you can ask this question on stats.exchange. – Cath Jan 28 '15 at 13:42
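
The paired comparison suggested in the comments could be sketched roughly as follows. This only illustrates the call, assuming the rows of df are the paired units; whether a Wilcoxon signed-rank test suits these data is a statistical question. The choice of exact = FALSE is an assumption made here, because the differences contain ties and a zero, so the normal approximation is used.

```r
# Data from the question
obs.1 <- c(2, 10, 53, 13, 12, 15, 5)
obs.2 <- c(3, 12, 45, 2, 7, 17, 5)
df <- data.frame(obs.1, obs.2)

# One paired test across all rows (not one test per row).
# exact = FALSE requests the normal approximation (assumption, see above).
paired <- wilcox.test(df$obs.1, df$obs.2, paired = TRUE, exact = FALSE)

paired$statistic  # signed-rank statistic V
paired$p.value    # a single p-value for the overall observer difference
```

Note that this gives one result for the whole data frame rather than one per row, which is a different question from the row-wise chi-square test asked about.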

3 Answers

8

Here is another option using dplyr:

library(dplyr)

df %>%
  rowwise() %>% 
  mutate(
    test_stat = chisq.test(c(obs.1, obs.2))$statistic,
    p_val = chisq.test(c(obs.1, obs.2))$p.value
    )
davechilders
3

You can use apply with MARGIN = 1 to run chisq.test on each row. Extract the values using $statistic and $p.value, then cbind them to the dataset.

 df1 <- cbind(df, t(apply(df, 1, function(x) {
             ch <- chisq.test(x)
             c(unname(ch$statistic), ch$p.value)})))

 colnames(df1)[3:4] <- c('x-squared', 'p-value')
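
A variant of the same idea, as a minimal sketch: the helper returns a named vector, so the new columns get their names without a separate colnames() step. The helper name row_chisq and the column names x_squared and p_value are made up for this illustration, and suppressWarnings() is used on the assumption that the "Chi-squared approximation may be incorrect" warning for small counts is already known and not of interest here.

```r
# Data from the question
obs.1 <- c(2, 10, 53, 13, 12, 15, 5)
obs.2 <- c(3, 12, 45, 2, 7, 17, 5)
df <- data.frame(obs.1, obs.2)

# Hypothetical helper: run the test once per row and return both values.
# suppressWarnings() hides the small-count warning (an assumption about
# what the reader wants; drop it to see the warnings).
row_chisq <- function(x) {
  ch <- suppressWarnings(chisq.test(x))
  c(x_squared = unname(ch$statistic), p_value = ch$p.value)
}

# apply() returns a 2 x n matrix, so transpose before binding
df1 <- cbind(df, t(apply(df, 1, row_chisq)))
df1
```

Syntactic names such as x_squared are used instead of 'x-squared' so that the columns can be accessed with $ without backticks.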
akrun
2

There are a number of ways to do this. One is to use apply to go through each row (MARGIN = 1) and then extract whatever part of the output you want (I use lapply to climb through each list element).

xy <- data.frame(obs1 = c(3,12,45,2,7,17,5), obs2 = c(2,10,53,13,12,15,5))
result <- apply(X = xy, MARGIN = 1, FUN = chisq.test)

Warning message:
In FUN(newX[, i], ...) : Chi-squared approximation may be incorrect

# see where p-value is stored
str(chisq.test(xy[1, ]))

List of 9
 $ statistic: Named num 0.2
  ..- attr(*, "names")= chr "X-squared"
 $ parameter: Named num 1
  ..- attr(*, "names")= chr "df"
 $ p.value  : num 0.655 # thar she blows
 $ method   : chr "Chi-squared test for given probabilities"
 $ data.name: chr "xy[1, ]"
 $ observed : num [1:2] 3 2
 $ expected : num [1:2] 2.5 2.5
 $ residuals: num [1:2] 0.316 -0.316
 $ stdres   : num [1:2] 0.447 -0.447
 - attr(*, "class")= chr "htest"

Warning message:
In chisq.test(xy[1, ]) : Chi-squared approximation may be incorrect

unlist(lapply(result, "[", "p.value"), use.names = FALSE)

[1] 0.654720846 0.669815358 0.419020334 0.004508698 0.251349109 0.723673610 1.000000000
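
The same extraction can also be sketched with vapply, which checks that every element yields a single number. The names p_values and statistics, the new column names, and the use of suppressWarnings() are choices made for this illustration only.

```r
xy <- data.frame(obs1 = c(3, 12, 45, 2, 7, 17, 5),
                 obs2 = c(2, 10, 53, 13, 12, 15, 5))

# One htest object per row; the small-count warnings are suppressed here
result <- suppressWarnings(apply(X = xy, MARGIN = 1, FUN = chisq.test))

# Pull one number out of each htest object
p_values <- vapply(result, function(r) r$p.value, numeric(1),
                   USE.NAMES = FALSE)
statistics <- vapply(result, function(r) unname(r$statistic), numeric(1),
                     USE.NAMES = FALSE)

# Attach the results to the original data
xy$x_squared <- statistics
xy$p_value <- p_values
xy
```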
Roman Luštrik