I have a survey dataframe containing several questions (columns) coded as 1=agree/0=disagree. Respondents (rows) are categorized according to metrics "age" ("young","middle","old"), "region" ("East","Mid","West"), etc. There are around 30 categories in total (3 ages, 3 regions, 2 genders, 11 occupations, etc.). Within each metric, categories are non-overlapping and of different sizes.
This simulates a cut-down version of the dataset:
n <- 400
set.seed(1)
data <- data.frame(
  age    = sample(c('young', 'middle', 'old'), n, replace = TRUE),
  region = sample(c('East', 'Mid', 'West'), n, replace = TRUE),
  gender = sample(c('M', 'F'), n, replace = TRUE),
  Q15a   = sample(c(0, 1), n, replace = TRUE),
  Q15b   = sample(c(0, 1), n, replace = TRUE)
)
I can use a chi-square test to check whether the responses in, say, the West differ significantly from the total sample for Q15a:
# Observed counts in the West vs. proportions from the whole sample (no attach() needed)
chisq.test(table(subset(data, region == 'West')$Q15a), p = table(data$Q15a), rescale.p = TRUE)
I want to test all categories against the total sample for Q15a, and then for ~20 other questions. As there are around 30 tests per question, I want to find a way (efficient or otherwise) to automate this, but am struggling to see how to get R to do this itself or how to write a loop to cycle through the categories. I've searched[1], and got sidetracked into pairwise comparison testing with pairwise.prop.test(), but haven't found anything that really answers this yet.
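To make the goal concrete, here is a rough sketch of the kind of output I'm after: one row per question/category combination, built from nested lapply() calls over the simulated data. The helper name test_vs_total and the vectors metrics and questions are my own placeholders, not anything from a package, and I'm not sure this is the right or most efficient approach.

```r
# Simulated data, as above
n <- 400
set.seed(1)
data <- data.frame(
  age    = sample(c('young', 'middle', 'old'), n, replace = TRUE),
  region = sample(c('East', 'Mid', 'West'), n, replace = TRUE),
  gender = sample(c('M', 'F'), n, replace = TRUE),
  Q15a   = sample(c(0, 1), n, replace = TRUE),
  Q15b   = sample(c(0, 1), n, replace = TRUE)
)

# Hypothetical helper: test one category's answers against the total sample
test_vs_total <- function(df, metric, level, question) {
  # Fix the levels so both tables always have a cell for 0 and for 1
  total <- table(factor(df[[question]], levels = c(0, 1)))
  sub   <- table(factor(df[[question]][df[[metric]] == level], levels = c(0, 1)))
  res   <- chisq.test(sub, p = total, rescale.p = TRUE)
  data.frame(question = question, metric = metric, level = level,
             n = sum(sub), statistic = unname(res$statistic),
             p.value = res$p.value, stringsAsFactors = FALSE)
}

metrics   <- c('age', 'region', 'gender')  # assumed: names of the grouping columns
questions <- c('Q15a', 'Q15b')             # assumed: names of the question columns

# Loop over questions, then metrics, then the categories within each metric
results <- do.call(rbind, lapply(questions, function(q) {
  do.call(rbind, lapply(metrics, function(m) {
    lv <- sort(unique(as.character(data[[m]])))
    do.call(rbind, lapply(lv, function(l) test_vs_total(data, m, l, q)))
  }))
}))

print(results)
```

With the real data, metrics and questions would hold the actual column names, so the ~30 categories and ~20 questions would give a table of roughly 600 rows.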
[1] similar but not duplicate questions (both are column-wise tests):