I want to run a two-sample proportions test in R using a loop, split by Health Center and by Measure. Below is a link to an example of how my data is set up (the website would not allow me to upload an image of my dataset).

Basically, I want to compare the Health Center A rows that have Measure A using the prop.test function, and repeat this for all my health centers (29 of them) and measures (14 of them). I am not sure what code would loop over these so that it runs every proportion test I want, split the way I want.

Any help would be greatly appreciated!

I went ahead and deleted all the 0's. My code ran, but it did not split the data the way I wanted. I was hoping it would split by health center and then by measure, but instead it splits only by health center, so prop.test is analyzing 28 measures at once instead of just Health Center A with Measure A. See the linked output example.

Jessica W
    Please don't include pictures of data; include a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with data that can be copy/pasted into R. Also provide the desired output. This will make it easier to help you and possible to verify potential solutions. – MrFlick Sep 11 '17 at 20:07

1 Answer

You can perform the split and run prop.test on each group like this:

lapply(split(df, df$Health_Center), function(x) prop.test(as.matrix(cbind(x[,3], x[,4]-x[,3]))))

Output

$A1

        2-sample test for equality of proportions with continuity correction

data:  as.matrix(cbind(x[, 3], x[, 4] - x[, 3]))
X-squared = 1.713, df = 1, p-value = 0.1906
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.28136975  0.04846377
sample estimates:
   prop 1    prop 2 
0.2307692 0.3472222 


$A2

        2-sample test for equality of proportions with continuity correction

data:  as.matrix(cbind(x[, 3], x[, 4] - x[, 3]))
X-squared = 0.12192, df = 1, p-value = 0.727
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1789668  0.1114439
sample estimates:
   prop 1    prop 2 
0.4800000 0.5137615 

Input Data

df <- data.frame(Health_Center = c("A1", "A2", "A1", "A2"),
                 Measure = c("A", "B", "A", "B"),
                 Numerator = c(15, 48, 25, 56),
                 Denominator = c(65, 100, 72, 109),
                 stringsAsFactors = FALSE)
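
As a hedged illustration of where the reported proportions come from, the sketch below runs prop.test on health center A1 in two ways: with a two-column matrix of successes and failures (the form used above), and with the `x` (successes) and `n` (trials) arguments. The variable names `a1`, `m`, `res_matrix`, and `res_xn` are made up for this example. Both forms should give the same estimates, 15/65 and 25/72.

```r
# Same example data as in the answer
df <- data.frame(Health_Center = c("A1", "A2", "A1", "A2"),
                 Measure = c("A", "B", "A", "B"),
                 Numerator = c(15, 48, 25, 56),
                 Denominator = c(65, 100, 72, 109),
                 stringsAsFactors = FALSE)

# Keep only the two rows for health center A1
a1 <- df[df$Health_Center == "A1", ]

# Form 1: two-column matrix of successes and failures
m <- cbind(success = a1$Numerator,
           failure = a1$Denominator - a1$Numerator)
res_matrix <- prop.test(m)

# Form 2: counts of successes (x) and numbers of trials (n)
res_xn <- prop.test(x = a1$Numerator, n = a1$Denominator)

# The estimated proportions are Numerator / Denominator for each row
res_matrix$estimate
res_xn$estimate

# Both forms describe the same test
all.equal(res_matrix$p.value, res_xn$p.value)
```

The `x`/`n` form may be easier to read when the data already stores a numerator and a denominator, since no subtraction is needed.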
CPak
  • Question - where are the proportions coming from? How are they calculated? I was thinking it would be 15/65 = 0.23 and 25/72 = 0.35. – Jessica W Sep 12 '17 at 14:40
  • Ah, normally these data are stored as `response 1 - response 2` and not `response 1 - total response`. I'll try to update my answer – CPak Sep 12 '17 at 16:36
  • I've updated my answer. You should see the proportions you expect now. – CPak Sep 12 '17 at 17:10
  • Thank you! Can you explain the last part of the code? What is this part doing: " as.matrix(cbind(x[,3], x[,4] - x[,3])) ?" I am trying to run it with my full data set and I keep getting an "argument n is missing, with no default" error. – Jessica W Sep 12 '17 at 17:49
  • The breakdown: `x[,3]` is the 3rd column of the data (that is, response 1). `x[,4]` is the 4th column of the data (that is, total response). `x[,4] - x[,3]` is response 2 (that is, total - response 1). `cbind(x[,3], x[,4]-x[,3])` makes a 2-column array/matrix of response 1 and response 2. `as.matrix(...)` makes sure it's a matrix. Hope that helps. – CPak Sep 12 '17 at 18:30
  • It could be that `prop.test` is tripping on `0, 0` rows. Try filtering those rows out of your data first with `newdf <- df[df[,3] != 0, ]` – CPak Sep 12 '17 at 19:02
  • I added an image to my first question to show you what my actual data looks like. Here is the code I am currently using, and I am messing up somewhere: `lapply(split(Jdata, list(Jdata$Center, Jdata$Measure)), function(x) prop.test(as.matrix(cbind(x[,3] x[,4]-x[,3])), p = NULL, alternative = c("two.sided", "less", "greater"), conf.level = 0.95, correct = FALSE))` – Jessica W Sep 12 '17 at 19:03
  • You're splitting your data on `Center` and `Measure`. Try splitting only on `Center` – CPak Sep 12 '17 at 19:05
  • This is the error I get: "$ operator is invalid for atomic vectors" – Jessica W Sep 12 '17 at 19:11
  • Show output of `dput(Jdata)` – CPak Sep 12 '17 at 19:12
  • Or I get an error that says not enough data. – Jessica W Sep 12 '17 at 19:13
  • You'll need to tidy your data so that it's compatible with `prop.test` – CPak Sep 12 '17 at 19:13
  • I posted an image of what the output gave me when I asked the first question. – Jessica W Sep 12 '17 at 19:25
  • Do you have any suggestions as to what my data should look like for prop.test to run correctly? – Jessica W Sep 12 '17 at 19:26
  • It should look like `df`: a 4-column data frame with numeric values > 0 in the 3rd and 4th columns – CPak Sep 12 '17 at 19:27
  • I put an example of what I would like my output to be with my first question. I was hoping it would split by health center and then by measure so it was only analyzing two numerators and two denominators at a time. – Jessica W Sep 12 '17 at 19:44
  • I figured it out! Thank you so much for all your help! :) – Jessica W Sep 12 '17 at 20:15
  • Great! Consider accepting my answer (check mark) to the left, which lets the community know this answer worked for you – CPak Sep 12 '17 at 20:16
  • Sounds good! Will do! – Jessica W Sep 12 '17 at 20:35
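
The thread does not show the final working code, so here is one possible sketch of what the comments discuss: drop rows with zero counts, split by both health center and measure, and test only the groups that have exactly two rows. The extra `0, 0` row and the names `clean`, `groups`, `results`, and `summary_df` are hypothetical, added only for illustration. It also assumes each center/measure pair should contribute exactly two rows to compare.

```r
# Example data from the answer, plus one hypothetical 0/0 row
df <- data.frame(Health_Center = c("A1", "A2", "A1", "A2", "A1"),
                 Measure = c("A", "B", "A", "B", "B"),
                 Numerator = c(15, 48, 25, 56, 0),
                 Denominator = c(65, 100, 72, 109, 0),
                 stringsAsFactors = FALSE)

# Drop rows that prop.test cannot use (no trials recorded)
clean <- df[df$Denominator > 0, ]

# Split by center AND measure; drop = TRUE removes empty combinations
groups <- split(clean, list(clean$Health_Center, clean$Measure), drop = TRUE)

# Keep only groups with exactly two rows (two proportions to compare)
groups <- Filter(function(g) nrow(g) == 2, groups)

# Run the two-sample test in each group
results <- lapply(groups, function(g) {
  prop.test(x = g$Numerator, n = g$Denominator)
})

# Collect the key numbers into one data frame
summary_df <- data.frame(
  group = names(results),
  prop1 = vapply(results, function(r) r$estimate[[1]], numeric(1)),
  prop2 = vapply(results, function(r) r$estimate[[2]], numeric(1)),
  p_value = vapply(results, function(r) r$p.value, numeric(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
summary_df
```

Splitting on a list of two columns names each group `center.measure`, so each test sees only the two numerators and two denominators for one pair. Groups with more or fewer than two rows are skipped here rather than tested, which is a design choice you may want to change for your own data.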