thank you for helping me. I know enough about R to be dangerous, but not enough to be good. I'm trying to create a chi square matrix (exactly like a correlation matrix, but with chi sq) from a dataframe. I've searched extensively (including this page: Chi-square p value matrix in r which didn't help) even trying a loop, and I'm not getting there. Here's what I have so far:
R Studio version 1.2.5033 on Windows 10
data - 14 columns of 3401 rows, I'm only using 9 columns for the chi sq, 8 variables are binary, 1 is categorical
I've gotten the chi sq results I need, but I want to combine it in a matrix. As it is a standard matrix where columns get progressively smaller, the columns are not the same length, so most bind commands won't work. I do have plyr, and I am trying to do it that way, but am failing. Here's the code:
CA <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 0)
Pos <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
Mon <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 1)
Sc <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 0)
ood <- c(1, 1, 1, 1, 0, 1, 1, 0, 0, 1)
Eco <- c(1, 2, 4, 6, 7, 3, 2, 5, 7, 7)
Orp <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0)
BC <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 1)
SA <- c(0, 0, 1, 1, 1, 1, 1, 0, 0, 1)
MV <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 0)
Ad <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
YR <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
KC <- c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
H <- c(1, 1, 1, 1, 1, 1, 1, 0, 0 ,1)
data <- data.frame(CA, Pos, Mon, Sc, ood, Eco, Orp, BC, SA, MV, Ad, YR, KC, H)
Or, here is the dput() data:
structure(list(CA = c(0, 0, 1, 1, 1, 1, 0, 1, 0), Pos = c(0,
0, 0, 0, 1, 1, 0, 1, 0), Mon = c(1, 0, 1, 1, 0, 1, 1, 1, 1),
Sc = c(0, NA, 0, 0, 1, 1, 1, 1, 0), ood = c(1, 1, 1, 0, 1,
0, 1, 1, 1), Eco = c(7, 6, 7, 0, 1, 6, 5, 1, 3), Orp = c(0,
0, 0, 0, 0, 0, 0, 0, 0), BC = c(1, 0, 1, 1, 1, 1, 1, 1, 1
), SA = c(0, 1, 0, 0, 0, 0, 0, 0, 0), MV = c(15, 13, 16,
12, 16, 14, 11, 18, 12), Ad = c(2, 2, 2, 3, 3, 2, 2, 2, 2
), YR = c(2, 2, 1, 1, 2, 2, 2, 1, 2), KC = c(2, 2, 1, 1,
1, 1, 1, 1, 1), H = c(0, 1, 1, 0, 0, 1, 1, 0, 1)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), spec = structure(list(
cols = list(CA = structure(list(), class = c("collector_double",
"collector")), Pos = structure(list(), class = c("collector_double",
"collector")), Mon = structure(list(), class = c("collector_double",
"collector")), Sc = structure(list(), class = c("collector_double",
"collector")), ood = structure(list(), class = c("collector_double",
"collector")), Eco = structure(list(), class = c("collector_double",
"collector")), Orp = structure(list(), class = c("collector_double",
"collector")), BC = structure(list(), class = c("collector_double",
"collector")), SA = structure(list(), class = c("collector_double",
"collector")), MV = structure(list(), class = c("collector_double",
"collector")), Ad = structure(list(), class = c("collector_double",
"collector")), YR = structure(list(), class = c("collector_double",
"collector")), KC = structure(list(), class = c("collector_double",
"collector")), H = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
library(tidyverse)
library(plyr)
library(reshape2)
chisqstat <- function(x, y){round(chisq.test(x, y)$statistic, 2)} #chi sq formula, pulling stat
chisqp <- function(x, y){round(chisq.test(x, y)$p.value, 4)} #chi sq formula pulling pvalue
chisqdata <- list(apply(data[c(2:7, 9, 14)], 2, chisqstat, y = data$CA), #chi sq statistic for all values with child abuse
apply(data[c(2:7, 9, 14)], 2, chisqp, y = data$CA), #chi sq pvalue for all values with child abuse
apply(data[c(3:7, 9, 14)], 2, chisqstat, y = data$Pos),
apply(data[c(3:7, 9, 14)], 2, chisqp, y = data$Pos),
apply(data[c(4:7, 9, 14)], 2, chisqstat, y = data$Mon),
apply(data[c(4:7, 9, 14)], 2, chisqp, y = data$Mon),
apply(data[c(5:7, 9, 14)], 2, chisqstat, y = data$Sc),
apply(data[c(5:7, 9, 14)], 2, chisqp, y = data$Sc),
apply(data[c(6:7, 9, 14)], 2, chisqstat, y = data$ood),
apply(data[c(6:7, 9, 14)], 2, chisqp, y = data$ood),
apply(data[c(7, 9, 14)], 2, chisqstat, y = data$Eco),
apply(data[c(7, 9, 14)], 2, chisqp, y = data$Eco),
apply(data[c(9, 14)], 2, chisqstat, y = data$Orp),
apply(data[c(9, 14)], 2, chisqp, y = data$Orp),
apply(data[c(14)], 2, chisqstat, y = data$SA),
apply(data[c(14)], 2, chisqp, y = data$SA))
do.call(rbind.fill.matrix(), chisqdata)
When I do each line as:
as.matrix(apply(data[c(2:7, 9, 14], 2, chisqstat, y = data$CA))
I get exactly what I want with row names so I'm wondering if I need to use as.matrix. I know I can do a full matrix where the info above and below the diagonal is the same, but I find that visually overwhelming. I want to get something like this (this is correlations not chi square):
CA Pos Mon Sc ood Eco
Pos 0.20
Mon -0.20 0.60
Sc 0.22 -0.22 -0.65
ood -0.22 0.22 0.65 -0.52
Eco 0.00 -0.18 -0.18 0.38 -0.58
Orp 0.41 0.00 -0.41 0.36 0.09 0.04
I'm sure I'm missing something obvious, so your help is greatly appreciated. Or maybe there is a much nicer way to do this in general, again, thank you.