Using the correct test for independence

Question

I have two data frames in R, good and bad, which contain data for good users and bad users respectively.

The data frame good contains game_id, which is the id of a computer game, and number, which is how many times that game has been played.

For example, good$game_id gives 1 2 3 ... 20, so we have 20 games. Similarly, good$number gives 45214 1254 23 ... 8914, which is the number of times each game has been played. For example, game_id==1 has been played 45214 times in group good.

The same holds for bad.
The two groups also contain the same number of users.

So for head(good,20) we get

game_id  number
1  45214
2  1254
...
20  8914

I want to investigate if there is dependence between the number of times a fixed computergame has been played.

For game_id==1 I tried to use Pearson's chi-squared test of independence. In R I type chisq.test(good[1,2], bad[1,2]) to see whether there is independence between good and bad for game_id==1, but I get an error message: x and y must have same levels.

How can this problem be solved?
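A note on the failing call, with a minimal sketch of one possible alternative. When `chisq.test` is given two vectors, it cross-tabulates them as paired categorical observations, so two single numbers cannot form a contingency table. Passing a matrix of counts (group by game) avoids this. In the sketch below, the three-game data frames are truncated stand-ins: the good counts for games 1 to 3 come from the question, while every bad count is invented for illustration. Treating each play as an independent observation is also an assumption that may not hold if the same users play repeatedly.

```r
# Hypothetical data: the good counts for games 1-3 are from the question,
# the bad counts are invented for illustration only.
good <- data.frame(game_id = 1:3, number = c(45214, 1254, 23))
bad  <- data.frame(game_id = 1:3, number = c(30120, 2210, 41))

# Rows = group, columns = game; each cell is a play count.
plays <- rbind(good = good$number, bad = bad$number)
colnames(plays) <- good$game_id

# Tests whether plays are spread across games the same way in both groups.
res_all <- chisq.test(plays)

# For one fixed game: game 1 versus all other games, by group (2 x 2 table).
game1 <- cbind(game_1 = plays[, "1"], other = rowSums(plays) - plays[, "1"])
res_game1 <- chisq.test(game1)

res_all$p.value
res_game1$p.value
```

The first test asks about all games at once; the second isolates a single game by collapsing the rest into one "other" column.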

Please read up on how to create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and update your question. — Thomas K, Sep 28 '15 at 14:13
What do you mean by "dependence between the number of times a fixed computergame has been played"? It seems that in your example you are trying to compare two numbers and find some sort of dependency between them. I really don't think there is a statistical tool that can do that. Pearson's chi-squared test can be applied to a set of numbers, but not to two individual numbers. — Maksim Gayduk, Sep 28 '15 at 14:31
I want to investigate whether some computer games have an influence on whether a user ends up in the good or the bad group. For example, some games might cause users to be good, and other games might cause users to be bad. — Ole Petersen, Sep 28 '15 at 14:37
I think what you're trying to investigate, though probably not expressing correctly in your code, is whether a specific game is more likely to be played by a good user or a bad user. Therefore, for each game you need to compare the percentage of good users who played it with the percentage of bad users who played it. You need to use the total number of good users and the total number of bad users (I think you mentioned these are the same). So, for game 1 the "good" percentage is 45214 / #total good users, and the "bad" percentage is obtained in a similar way. — AntoniosK, Sep 28 '15 at 15:31
Yes, that is what I would like to do. So just to be clear: for a fixed game_id I use Pearson's chi-squared test: chisq.test(good percentage, bad percentage). For game_id 1 I get chisq.test(0.066,0.041), but R says "x and y must have at least 2 levels." — Ole Petersen, Sep 29 '15 at 07:06
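The comparison of percentages discussed in these comments could be sketched as follows. The tests need counts rather than percentages: `chisq.test(0.066, 0.041)` turns each single number into a factor with one level, which is why R complains about levels. The figures below are hypothetical; 1000 users per group is an assumption chosen only so that the proportions match the 0.066 and 0.041 quoted above, and `played` is assumed to count distinct users who played game 1, which is not the same thing as the number of plays.

```r
# Hypothetical numbers: 1000 users per group is an assumption, chosen so that
# the proportions match the 0.066 and 0.041 quoted in the comment above.
n_users <- c(good = 1000, bad = 1000)  # total users in each group
played  <- c(good = 66,   bad = 41)    # users who played game 1

# Two-sample test of equal proportions (takes counts, not percentages).
res_prop <- prop.test(x = played, n = n_users)

# Same comparison as a chi-squared test on the 2 x 2 table
# of played / did not play by group.
tab <- cbind(played = played, not_played = n_users - played)
res_chi <- chisq.test(tab)

res_prop$p.value
res_chi$p.value
```

With two groups, both calls apply the same continuity correction by default, so they should report the same p-value; `prop.test` additionally returns a confidence interval for the difference in proportions.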


0 Answers