I want to run a chi-square test on each row of a data frame and append the results (X-squared and p-value) as separate columns. Each row is a gene, with counts of mutated and normal (wild-type) samples for two separate groups. Here is the setup for an example dataset:

Genes<-c("GENE_A", "GENE_B","GENE_C")
Group1_Mut<-c(20,10,5)
Group1_WT<-c(40,50,55)
Group2_Mut<-c(10, 30, 10)
Group2_WT<-c(80, 60, 80)
main<-data.frame(Genes,Group1_Mut,Group1_WT,Group2_Mut,Group2_WT)

When I try to pass the first row as a matrix to the chi-square test I get this error:

chisq.test(matrix(main[1,2:5], nrow=2, 2,2))
Error in sum(x) : invalid 'type' (list) of argument

Any ideas how I could write a function that builds a 2x2 table from each row, runs the test, and appends the X-squared and p-value for each gene to the main table?

Note: I did see this other example on SO: chi square test for each row in data frame

but it didn't quite fit what I was trying to apply here.

user2900006

2 Answers

To see what the error is trying to communicate, compare your data with the type of data chisq.test is expecting:

dput(matrix(main[1,2:5,drop=T], nrow=2, 2,2))
# structure(list(20, 10, 40, 80), .Dim = c(2L, 2L))
dput(matrix(1:4, nrow=2, 2,2))
# structure(c(1L, 3L, 2L, 4L), .Dim = c(2L, 2L))

One remedy is to force your data into a numeric vector:

# nrow is named; the positional 2, 2 fill ncol = 2 and byrow = 2 (coerced to TRUE)
res <- chisq.test(matrix(as.numeric(main[1,2:5]), nrow=2, 2,2))
res
#   Pearson's Chi-squared test with Yates' continuity correction
# data:  matrix(as.numeric(main[1, 2:5]), nrow = 2, 2, 2)
# X-squared = 9.7656, df = 1, p-value = 0.001778

Now, if you want to add the results to each row, you first need to decide which results. The printed output is a prettied-up view of a list that holds several components:

str(unclass(res))
# List of 9
#  $ statistic: Named num 9.77
#   ..- attr(*, "names")= chr "X-squared"
#  $ parameter: Named int 1
#   ..- attr(*, "names")= chr "df"
#  $ p.value  : num 0.00178
#  $ method   : chr "Pearson's Chi-squared test with Yates' continuity correction"
#  $ data.name: chr "matrix(as.numeric(main[1, 2:5]), nrow = 2, 2, 2)"
#  $ observed : num [1:2, 1:2] 20 10 40 80
#  $ expected : num [1:2, 1:2] 12 18 48 72
#  $ residuals: num [1:2, 1:2] 2.309 -1.886 -1.155 0.943
#  $ stdres   : num [1:2, 1:2] 3.33 -3.33 -3.33 3.33

If you wanted to include (e.g.) the test statistic as a number, you might do:

chisq.statistic <- sapply(seq_len(nrow(main)), function(row) {
  chisq.test(matrix(as.numeric(main[row,2:5]), nrow=2, 2,2))$statistic
})
main$chisq.statistic <- chisq.statistic
main
#    Genes Group1_Mut Group1_WT Group2_Mut Group2_WT chisq.statistic
# 1 GENE_A         20        40         10        80      9.76562500
# 2 GENE_B         10        50         30        60      4.29687500
# 3 GENE_C          5        55         10        80      0.07716049
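
Since the question asks for both the X-squared value and the p-value, the same loop can return two components at once. The following is a minimal sketch, not the only way to do it; the names `both`, `main_with_tests`, and `chisq.p.value` are illustrative and do not come from the original post.

```r
# Example data from the question
main <- data.frame(
  Genes      = c("GENE_A", "GENE_B", "GENE_C"),
  Group1_Mut = c(20, 10, 5),
  Group1_WT  = c(40, 50, 55),
  Group2_Mut = c(10, 30, 10),
  Group2_WT  = c(80, 60, 80)
)

# Run the test once per row and keep two components of the result.
# Each call returns a named length-2 vector, so sapply() builds a
# 2 x nrow(main) matrix (one column per gene).
both <- sapply(seq_len(nrow(main)), function(row) {
  res <- chisq.test(matrix(as.numeric(main[row, 2:5]), nrow = 2, byrow = TRUE))
  c(chisq.statistic = unname(res$statistic), chisq.p.value = res$p.value)
})

# Transpose to one row per gene and bind the two new columns on.
main_with_tests <- cbind(main, t(both))
main_with_tests
```

Returning a named vector from the inner function is what gives the new columns their names after the transpose.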

Note that tools like dplyr and data.table may facilitate this. For example:

library(dplyr)
main %>%
  rowwise() %>%
  mutate(
    chisq.statistic = chisq.test(matrix(c(Group1_Mut, Group1_WT, Group2_Mut, Group2_WT), nrow = 2))$statistic
  )
# Source: local data frame [3 x 6]
# Groups: <by row>
# # A tibble: 3 × 6
#    Genes Group1_Mut Group1_WT Group2_Mut Group2_WT chisq.statistic
#   <fctr>      <dbl>     <dbl>      <dbl>     <dbl>           <dbl>
# 1 GENE_A         20        40         10        80      9.76562500
# 2 GENE_B         10        50         30        60      4.29687500
# 3 GENE_C          5        55         10        80      0.07716049

This example shows one thing you may wish to incorporate into whichever method you use: explicit naming of columns. A positional index such as "2:5" silently points at the wrong data if columns are added to or reordered in your input data frame.
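
Explicit column naming also works in base R. The sketch below wraps the per-row test in a helper that takes the four column names as arguments; `add_chisq` and its parameter names are hypothetical, made up for illustration, and the 2x2 layout (one group per row of the table) is an assumption that matches the earlier examples.

```r
# Hypothetical helper: runs a 2x2 chi-square test per row, looking the
# four count columns up by name rather than by position.
add_chisq <- function(df, g1_mut, g1_wt, g2_mut, g2_wt) {
  cols <- c(g1_mut, g1_wt, g2_mut, g2_wt)
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Map() walks the four columns in parallel, one test per row.
  tests <- Map(
    function(m1, w1, m2, w2) {
      chisq.test(matrix(c(m1, w1, m2, w2), nrow = 2, byrow = TRUE))
    },
    df[[g1_mut]], df[[g1_wt]], df[[g2_mut]], df[[g2_wt]]
  )
  df$chisq.statistic <- vapply(tests, function(x) unname(x$statistic), numeric(1))
  df$chisq.p.value   <- vapply(tests, function(x) x$p.value, numeric(1))
  df
}

main <- data.frame(
  Genes      = c("GENE_A", "GENE_B", "GENE_C"),
  Group1_Mut = c(20, 10, 5),
  Group1_WT  = c(40, 50, 55),
  Group2_Mut = c(10, 30, 10),
  Group2_WT  = c(80, 60, 80)
)

# Column order no longer matters: shuffle the columns and the result is the same.
shuffled <- main[, c(1, 5, 4, 3, 2)]
out <- add_chisq(shuffled, "Group1_Mut", "Group1_WT", "Group2_Mut", "Group2_WT")
out
```

Failing early on a missing column name is a design choice here; it turns a silent mis-index into an explicit error.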

r2evans

The oddity here is that you are not giving matrix a vector, you are giving it a data frame.

main[1,2:5]
  Group1_Mut Group1_WT Group2_Mut Group2_WT
1         20        40         10        80

A data frame is a list of columns, and every element in a matrix must have the same type, so your matrix ends up with type list: each cell is a one-element list rather than a number.

m <- matrix(main[1,2:5], nrow=2, byrow = TRUE)

class(m)
# "matrix"   (R >= 4.0 prints "matrix" "array")
typeof(m)
# "list"

class(m[1, 1])
# "list"

You need to unlist your data frame elements before calling matrix:

chisq.test(matrix(unlist(main[1, 2:5]), nrow = 2, byrow = TRUE))

This will yield what you desire.
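
To cover every row rather than just the first, one option is to convert the count columns to a numeric matrix once and then use apply() over its rows, which avoids the list problem entirely. This is a sketch under the same assumptions as the single-row call; the names `counts` and `p_values` are illustrative.

```r
main <- data.frame(
  Genes      = c("GENE_A", "GENE_B", "GENE_C"),
  Group1_Mut = c(20, 10, 5),
  Group1_WT  = c(40, 50, 55),
  Group2_Mut = c(10, 30, 10),
  Group2_WT  = c(80, 60, 80)
)

# as.matrix() on all-numeric columns gives a numeric matrix, so each
# row handed to the function below is already a plain numeric vector.
counts <- as.matrix(main[, c("Group1_Mut", "Group1_WT", "Group2_Mut", "Group2_WT")])

# MARGIN = 1 means "once per row"; reshape each row into a 2x2 table.
p_values <- apply(counts, 1, function(r) {
  chisq.test(matrix(r, nrow = 2, byrow = TRUE))$p.value
})

main$chisq.p.value <- p_values
main
```

Swapping `$p.value` for `$statistic` gives the X-squared value in the same way.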

Benjamin