Finding p value of histogram data in R

Question

Based on the proportion of slopes from the randomisation, greater or less than the slope from the observed data, I would like to calculate the expected probability of getting the observed slope. The observed slope is -0.2717.

Any help would be greatly appreciated, I am a newbie.

histdata <- numeric(10000)
for (i in 1:10000) {
  # slope from regressing the shuffled response on tleave
  histdata[i] <- coef(lm(sample(tcons) ~ tleave))[[2]]
}
hist(histdata)
abline(v = -0.2717, lwd = 3, lty = 2)
box()

data3 <- -0.2717 > histdata

This ^^ gives me 9954 that are not greater than the original and 46 that are greater.

How do you want to calculate this p-value? It's easier to help you when you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can actually run your code to see what it's doing. — MrFlick, Nov 20 '17 at 15:12
Turned histdata into input data: data2 <- data.frame(histdata). It holds slope values from a loop, which I am trying to find the p value for. However, I can't run an anova on it currently. Not sure if I need to change the object — Barry Allen ' The Flash, Nov 20 '17 at 15:17
This comment doesn't make any sense to me. Still have no idea what you are doing. I don't understand the desired output. It's unclear what these different code chunks have to do with your goal. Are you trying to create one plot or multiple plots? — MrFlick, Nov 20 '17 at 15:32
Just looking to find the p value of "histdata" which contains 10000 slope values from a for-loop. Then plot this along with the mean of the 10000 slopes on a single dimension plot. — Barry Allen ' The Flash, Nov 20 '17 at 15:54
That doesn't make sense statistically. How do you calculate the p-value of 10000 numbers? What's the p-value of 1, 7, 12? In order to have a p-value there needs to be some model (distributional assumption) and some test statistic. Some hypothesis to test. And a p-value is going to be on a completely different scale than the observations themselves, so how you include them on the same plot isn't at all clear. — MrFlick, Nov 20 '17 at 16:04
My mistake. Based on the proportion of slopes from the randomisation, greater or less than the slope from the observed data, I would like to calculate the expected probability of getting the observed slope. The observed slope is -0.2717. I have edited the post to reflect this. — Barry Allen ' The Flash, Nov 20 '17 at 16:21

score 0 · Accepted Answer · answered Nov 20 '17 at 19:37

If you have the results of a randomization procedure in rand_vals and an observed value in obs_val, then the one-tailed p-value (quantifying support for the null hypothesis vs. the alternative hypothesis that the observed value is greater than the null value) is

mean(rand_vals >= obs_val)

Note that this is NOT ☢☣ (can't find a skull & crossbones emoji) the "probability of getting the observed slope". It is the probability of observing a value greater than or equal to the observed slope, if the null hypothesis is true.
In some cases it may be appropriate to include the observed value in the "randomization" set as well, i.e. mean(c(rand_vals, obs_val) >= obs_val); this won't make much difference if your randomization set is large.
A two-tailed p-value would be something like mean(abs(rand_vals) >= abs(obs_val))
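
As a rough end-to-end illustration of these formulas, here is a minimal sketch on simulated data. The values of tcons and tleave are made up (the asker's data were not posted), and n_perm, p_upper, p_lower, p_lower_incl and p_two are names chosen only for this example. When the observed slope is negative, as with -0.2717 in the question, the lower-tail comparison (<=) may be the one that matches the hypothesis.

```r
# Hypothetical data standing in for the asker's tcons and tleave (not from the post)
set.seed(42)
n <- 50
tleave <- rnorm(n)
tcons <- -0.3 * tleave + rnorm(n)

# Observed slope from the unshuffled data
obs_val <- coef(lm(tcons ~ tleave))[[2]]

# Null distribution: shuffle the response to break any association with tleave
n_perm <- 2000
rand_vals <- replicate(n_perm, coef(lm(sample(tcons) ~ tleave))[[2]])

# One-tailed p-values
p_upper <- mean(rand_vals >= obs_val)  # alternative: slope is greater than under the null
p_lower <- mean(rand_vals <= obs_val)  # alternative: slope is less than under the null

# Variant that counts the observed value as a member of the randomization set
p_lower_incl <- mean(c(rand_vals, obs_val) <= obs_val)

# Two-tailed p-value; comparing absolute values assumes a null distribution centred on zero
p_two <- mean(abs(rand_vals) >= abs(obs_val))

print(c(upper = p_upper, lower = p_lower, lower_incl = p_lower_incl, two_sided = p_two))
```

Including the observed value keeps the estimated p-value from being exactly zero, which can otherwise happen when no shuffled slope is as extreme as the observed one.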

Thank you!! Any chance you know how I would find the 95% confidence interval for this result? — Barry Allen ' The Flash, Nov 20 '17 at 23:27
Of the slopes, sorry for the confusion. I believe I have to fit the model, but I'm unsure. — Barry Allen ' The Flash, Nov 20 '17 at 23:30
