
I want to test whether two qualitative (categorical) variables are dependent. Before running any test, I plotted them with geom_bar().

Bar Chart

To me it is quite evident that the dependent variable equals 3 more often when the factor variable is 1 than when it is 0, and that it equals 2 more often when the factor variable is 0 than when it is 1.

But if I run chisq.test or fisher.test, I get a p-value above 0.3, which I take to mean that the two qualitative variables are independent. I don't really understand why the tests are not significant. To perform them, I used the following code:

chisq.test(table(variable1,variable2))

where variable1 and variable2 are categorical variables

Thanks in advance for your help,

C

dcarlson
chlooo
    We really need to see the data. A significant difference is based on the sample size so looking at a bar chart of percentages does not help. Use `dput(variable1)` and `dput(variable2)` and paste the results into your question as a code sample. – dcarlson May 11 '21 at 14:01

1 Answer


Here's a detailed way to rebuild the contingency table from the percentages in your bar chart and test it:

#function borrowed from https://stackoverflow.com/a/32544987/4938484
#to maintain the right sum of entries when rounding
smart.round <- function(x) {
  y <- floor(x)
  indices <- tail(order(x-y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y
}

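#percentages read off your bar chart
#(rows: factor variable = 0, 1; columns: dependent variable = 1, 2, 3)
tab <- matrix(c(8.1, 51.4, 40.5, 3.7, 37.0, 59.3), ncol=3, byrow=TRUE)
#each row sums to 100%, so N is the number of observations per factor level
N <- 100 #change to your actual group size
#convert percentages to integer counts
tab <- smart.round(tab/100 * N)
rownames(tab) <- c("0", "1")
colnames(tab) <- c("1", "2", "3")
tab <- as.table(tab)
chisq.test(tab)
#with N = 100 per group this gives p-value = 0.03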
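
Since the result hinges on N, it may help to see how the p-value moves when the same percentages are paired with different group sizes. The snippet below is only a rough sketch: `p_for_n` is a hypothetical helper, the group sizes are made-up values rather than anything from your data, and it uses plain `round()` instead of `smart.round`, so a row total can be off by one.

```r
# Percentages read off the bar chart
# (rows: factor variable = 0, 1; columns: dependent variable = 1, 2, 3)
pct <- matrix(c(8.1, 51.4, 40.5,
                3.7, 37.0, 59.3), ncol = 3, byrow = TRUE)

# Hypothetical helper: chi-squared p-value if each factor level had n observations
p_for_n <- function(n) {
  counts <- round(pct / 100 * n)  # plain rounding; row totals may be off by one
  # small expected counts trigger an approximation warning, silenced here
  suppressWarnings(chisq.test(counts)$p.value)
}

group_sizes <- c(25, 50, 100, 200)  # assumed sizes, for illustration only
p_values <- sapply(group_sizes, p_for_n)
print(data.frame(n_per_group = group_sizes, p_value = signif(p_values, 2)))
```

The percentages stay the same in every row of the output; only the assumed number of observations changes, which is why the bar chart alone cannot tell you whether the test will be significant.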
Recap_Hessian