
I want to test dependency between many categorical variables using the chi-squared test implemented in R. I have 14 variables, and it's very long to do 14*14 tests for all variables. As you know, the chi-squared test normally handles just two variables at a time, as in this case, where I test the dependency between TYPE_PEAU and SENSIBILITE.

> library(MASS)
> tbl = table(DATA_BASE$TYPE_PEAU, DATA_BASE$SENSIBILITE)
> chisq.test(tbl)

    Pearson's Chi-squared test

data:  tbl
X-squared = 5727.5, df = 12, p-value < 2.2e-16

Assume that I have 14 variables, how do I deal with them?

This is the dataset in question, which contains categorical variables. I hope it helps to resolve the problem:

> dput(DATA_BASE[1:50,15:18])
structure(list(TYPE_PEAU = structure(c(3L, 4L, 5L, 1L, 3L, 1L, 
1L, 1L, 3L, 1L, 1L, 1L, 4L, 3L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 1L, 
1L, 3L, 1L, 1L, 3L, 1L, 3L, 5L, 1L, 5L, 2L, 1L, 5L, 5L, 3L, 1L, 
3L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 3L, 1L), .Label = c("", 
"Grasse", "Mixte", "Normale", "Sèche"), class = "factor"), SENSIBILITE = structure(c(4L, 
4L, 4L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 4L, 4L, 1L, 3L, 1L, 
3L, 3L, 4L, 1L, 1L, 1L, 2L, 1L, 1L, 4L, 1L, 2L, 3L, 1L, 4L, 4L, 
1L, 3L, 4L, 4L, 1L, 4L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 1L, 4L, 
1L), .Label = c("", "Aucune", "Fréquente", "Occasionnelle"), class = "factor"), 
    IMPERFECTIONS = structure(c(3L, 4L, 3L, 1L, 2L, 1L, 1L, 1L, 
    4L, 1L, 1L, 1L, 3L, 3L, 1L, 2L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 
    4L, 1L, 1L, 3L, 1L, 3L, 2L, 1L, 4L, 3L, 1L, 3L, 3L, 3L, 1L, 
    2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 3L, 1L), .Label = c("", 
    "Fréquente", "Occasionnelle", "Rares"), class = "factor"), 
    BRILLANCE = structure(c(4L, 2L, 2L, 1L, 4L, 1L, 1L, 1L, 4L, 
    1L, 1L, 1L, 4L, 4L, 1L, 4L, 1L, 4L, 4L, 4L, 1L, 1L, 1L, 4L, 
    1L, 1L, 4L, 1L, 4L, 4L, 1L, 2L, 3L, 1L, 4L, 4L, 4L, 1L, 4L, 
    1L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 1L, 4L, 1L), .Label = c("", 
    "Aucune", "Partout", "Zone T"), class = "factor")), .Names = c("TYPE_PEAU", 
"SENSIBILITE", "IMPERFECTIONS", "BRILLANCE"), row.names = c(15L, 
22L, 33L, 40L, 48L, 54L, 59L, 65L, 74L, 78L, 87L, 89L, 104L, 
108L, 115L, 121L, 141L, 159L, 161L, 163L, 165L, 175L, 179L, 186L, 
196L, 202L, 211L, 222L, 231L, 265L, 272L, 290L, 300L, 318L, 325L, 
327L, 349L, 372L, 374L, 380L, 392L, 393L, 394L, 398L, 427L, 440L, 
449L, 450L, 456L, 470L), class = "data.frame")

Thank you in advance

Rprogrammer
  • This doesn't seem to be a specific programming question that's appropriate for Stack Overflow. If you have general questions about the appropriate use of various statistical methods, then you should ask such questions over at [stats.se] instead. You are more likely to get better answers there. Or maybe clarify what you mean by "it's very long" and "how do I deal with them" – MrFlick May 03 '18 at 14:10
  • Check [this](https://stackoverflow.com/questions/37392655/chi-squared-test-of-independence-on-all-combinations-of-columns-in-a-dataframe-i) – A. Suliman May 03 '18 at 14:11
  • @A.Suliman, thank you very much! This is what I need – Rprogrammer May 03 '18 at 14:25
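
For readers landing here, one possible way to run the test on every pair of columns is sketched below. It is a minimal sketch and not necessarily the approach taken in the linked question. The toy `DATA_BASE` is made-up data that only borrows the column names and labels from the question, and the `pairs` and `results` names are hypothetical. Note that only unordered pairs are needed, so 14 variables give choose(14, 2) = 91 tests rather than 14*14. The Holm adjustment at the end is an added suggestion for handling many tests at once, not something the question asked for.

```r
# Toy data standing in for DATA_BASE (values are made up for illustration)
set.seed(1)
n <- 200
DATA_BASE <- data.frame(
  TYPE_PEAU     = factor(sample(c("Grasse", "Mixte", "Normale", "Sèche"), n, replace = TRUE)),
  SENSIBILITE   = factor(sample(c("Aucune", "Fréquente", "Occasionnelle"), n, replace = TRUE)),
  IMPERFECTIONS = factor(sample(c("Fréquente", "Occasionnelle", "Rares"), n, replace = TRUE)),
  BRILLANCE     = factor(sample(c("Aucune", "Partout", "Zone T"), n, replace = TRUE))
)

# All unordered pairs of column names (6 pairs for 4 columns)
pairs <- combn(names(DATA_BASE), 2, simplify = FALSE)

# Run one chi-squared test per pair and keep the key numbers.
# chisq.test() warns when expected counts are small; the warnings are
# suppressed here only to keep the output short, so check them on real data.
results <- do.call(rbind, lapply(pairs, function(p) {
  tbl  <- table(DATA_BASE[[p[1]]], DATA_BASE[[p[2]]])
  test <- suppressWarnings(chisq.test(tbl))
  data.frame(
    var1      = p[1],
    var2      = p[2],
    statistic = unname(test$statistic),
    df        = unname(test$parameter),
    p.value   = test$p.value
  )
}))

# Adjust the p-values for multiple testing (assumption: Holm's method)
results$p.adjusted <- p.adjust(results$p.value, method = "holm")

print(results)
```

On the real data, the same loop should work by selecting the 14 categorical columns first, for example `names(DATA_BASE)[cols]` with `cols` holding their positions. Empty-string levels such as `""` in the question's factors would be treated as a category of their own unless they are recoded to `NA` beforehand.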
