1

I'm teaching a unit on simulation based inference (single proportion test) using the mosaic package in R, and I'm having trouble getting R to count the number of simulations in which the simulated proportion is greater than or equal to the observed data.

Specifically, I have a problem where 600 people are surveyed as to whether they plan to light fireworks on the Fourth of July, and 56% respond that they do. I want to test that this proportion is greater than a null hypothesis of p_0=0.5. Here's my current code:

library(mosaic)
set.seed(25)

# Set up a data frame of 600 responses, 56% of which are "Fireworks", remaining "No Fireworks"
FireworksData <- c(rep("Fireworks",600*.56),rep("No Fireworks",600*(0.44)))
Fireworks_df <- data.frame('FourthOfJulyPlans'=FireworksData)

# Simulation: Run 1000 resamples of the data, and calculate proportion that are "Fireworks"
Fireworks.Null <- do(1000)*(prop( ~ FourthOfJulyPlans, 
                                   data = resample(Fireworks_df),
                                   success="Fireworks")+(0.50-0.56))
# The +(0.50-0.56) at the end centers this distribution at p_0=0.5

#Count simulated proportions that meet or exceed 0.56, and calculate p-value
count(Fireworks.Null>=.56)
count(Fireworks.Null>=.56)/simulations

The problem is that count(Fireworks.Null>=.56) only finds the values that are greater than 0.56, but not equal: it returns that there are 2 cases that meet or exceed 0.56, but in fact there are 3 with this set.seed(). The third case is equal to 0.56. I have no idea why this is happening: when I run a simple test example, I get the correct answer:

testvector <- c(0.57,0.57,0.57,0.56,0.56,0.55,0.55)
count(testvector>=0.56)

returns 5 as expected.

Any help is much appreciated!

CarneSeca
  • 61
  • 4
  • the value is actually `"0.55999999999999994227"` if you print enough digits – dww Apr 10 '21 at 23:00
  • thanks, I'm reading up on floating pt arithmetic now. Based on the way the data was created, I believe the "correct" answer should be 0.56 (when resample selects "Fireworks" 336 times), and that this really is a floating point error. I see the all.equal() function exists to replace ==, is there a version of this for >=? – CarneSeca Apr 10 '21 at 23:05
  • you can use `x > y | isTRUE(all.equal(x,y))` or `(y - x) > .Machine$double.eps^0.5` – dww Apr 10 '21 at 23:06
  • That looks like it should work, but `count(Fireworks.Null > .56 | isTRUE(all.equal(Fireworks.Null,0.56)))` still reports 2, and just looking for equality with `count(isTRUE(all.equal(Fireworks.Null,0.56)))` reports 0. Using `round(Fireworks.Null,digits=8)` gets around the floating pt error, but that feels like cheating. – CarneSeca Apr 10 '21 at 23:15

0 Answers0