To see what the error is trying to communicate, compare your data with the type of data chisq.test is expecting:
dput(matrix(main[1,2:5,drop=T], nrow=2, 2,2))
# structure(list(20, 10, 40, 80), .Dim = c(2L, 2L))
dput(matrix(1:4, nrow=2, 2,2))
# structure(c(1L, 3L, 2L, 4L), .Dim = c(2L, 2L))
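The difference shown by dput can also be checked directly. The following is a minimal sketch, assuming `main` looks like the data frame printed later in this answer (it is rebuilt here from that printed output; `m_list` and `m_num` are names made up for illustration). It spells out the positional `2, 2` from the calls above as `ncol = 2, byrow = TRUE`, which is what they mean to matrix().

```r
# Assumption: `main` is reconstructed from the printed output in this answer.
main <- data.frame(
  Genes      = c("GENE_A", "GENE_B", "GENE_C"),
  Group1_Mut = c(20, 10, 5),
  Group1_WT  = c(40, 50, 55),
  Group2_Mut = c(10, 30, 10),
  Group2_WT  = c(80, 60, 80)
)

# One row of a data frame is still a list, so this builds a list-matrix.
m_list <- matrix(main[1, 2:5, drop = TRUE], nrow = 2, ncol = 2, byrow = TRUE)

# Converting to a plain numeric vector first gives an ordinary numeric matrix.
m_num <- matrix(as.numeric(main[1, 2:5]), nrow = 2, ncol = 2, byrow = TRUE)

typeof(m_list)      # "list"
is.numeric(m_list)  # FALSE
is.numeric(m_num)   # TRUE
```

A matrix of mode "list" is not numeric, which is why chisq.test rejects it.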
One remedy is to force your data into a numeric vector:
res <- chisq.test(matrix(as.numeric(main[1,2:5]), nrow=2, 2,2))
res
# Pearson's Chi-squared test with Yates' continuity correction
# data: matrix(as.numeric(main[1, 2:5]), nrow = 2, 2, 2)
# X-squared = 9.7656, df = 1, p-value = 0.001778
Now, if you want to add the results to each row, you first need to decide which results. What gets printed is a prettied-up summary; the object itself is a list with several components:
str(unclass(res))
# List of 9
# $ statistic: Named num 9.77
# ..- attr(*, "names")= chr "X-squared"
# $ parameter: Named int 1
# ..- attr(*, "names")= chr "df"
# $ p.value : num 0.00178
# $ method : chr "Pearson's Chi-squared test with Yates' continuity correction"
# $ data.name: chr "matrix(as.numeric(main[1, 2:5]), nrow = 2, 2, 2)"
# $ observed : num [1:2, 1:2] 20 10 40 80
# $ expected : num [1:2, 1:2] 12 18 48 72
# $ residuals: num [1:2, 1:2] 2.309 -1.886 -1.155 0.943
# $ stdres : num [1:2, 1:2] 3.33 -3.33 -3.33 3.33
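Each of those components can be pulled out with `$`. Here is a small sketch, assuming the same counts as the first row (20, 40, 10, 80); the names `obs`, `stat`, and `pval` are only for illustration.

```r
# Observed counts for the first row, filled by row (Group1 on top, Group2 below).
obs <- matrix(c(20, 40, 10, 80), nrow = 2, ncol = 2, byrow = TRUE)
res <- chisq.test(obs)

# The statistic carries a name ("X-squared"); unname() leaves a bare number.
stat <- unname(res$statistic)
pval <- res$p.value

stat          # 9.765625
pval          # roughly 0.00178, matching the str() output
res$expected  # expected counts under independence
```

Any of these could be stored per row, depending on what you want to report.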
If you want to include, for example, the test statistic as a number, you might do:
chisq.statistic <- sapply(seq_len(nrow(main)), function(row) {
  chisq.test(matrix(as.numeric(main[row,2:5]), nrow=2, 2,2))$statistic
})
main$chisq.statistic <- chisq.statistic
main
#    Genes Group1_Mut Group1_WT Group2_Mut Group2_WT chisq.statistic
# 1 GENE_A         20        40         10        80      9.76562500
# 2 GENE_B         10        50         30        60      4.29687500
# 3 GENE_C          5        55         10        80      0.07716049
Note that tools like dplyr and data.table may facilitate this. For example:
library(dplyr)
main %>%
  rowwise() %>%
  mutate(
    chisq.statistic = chisq.test(matrix(c(Group1_Mut, Group1_WT, Group2_Mut, Group2_WT), nrow = 2))$statistic
  )
# Source: local data frame [3 x 6]
# Groups: <by row>
# # A tibble: 3 × 6
#    Genes Group1_Mut Group1_WT Group2_Mut Group2_WT chisq.statistic
#   <fctr>      <dbl>     <dbl>      <dbl>     <dbl>           <dbl>
# 1 GENE_A         20        40         10        80      9.76562500
# 2 GENE_B         10        50         30        60      4.29687500
# 3 GENE_C          5        55         10        80      0.07716049
This example shows one thing you may wish to incorporate into whichever method you use: explicit naming of columns. A positional index such as "2:5" will silently pick up the wrong columns if the order or number of columns in your input changes.
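The same idea works in base R without dplyr. This is one possible sketch, assuming `main` matches the printed data frame above (rebuilt here so the snippet runs on its own); `count_cols` is a made-up helper name, and vapply is swapped in for sapply so the result is guaranteed to be a plain numeric vector.

```r
# Assumption: `main` is reconstructed from the printed output in this answer.
main <- data.frame(
  Genes      = c("GENE_A", "GENE_B", "GENE_C"),
  Group1_Mut = c(20, 10, 5),
  Group1_WT  = c(40, 50, 55),
  Group2_Mut = c(10, 30, 10),
  Group2_WT  = c(80, 60, 80)
)

# Select the count columns by name rather than by position.
count_cols <- c("Group1_Mut", "Group1_WT", "Group2_Mut", "Group2_WT")

main$chisq.statistic <- vapply(seq_len(nrow(main)), function(i) {
  # Flatten the one-row data frame into a numeric vector.
  counts <- unlist(main[i, count_cols], use.names = FALSE)
  # One 2x2 table per gene: Group1 in the first row, Group2 in the second.
  unname(chisq.test(matrix(counts, nrow = 2, byrow = TRUE))$statistic)
}, numeric(1))

main
```

If a column is later added or reordered, the lookup by name still selects the same four counts.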