
I have a dataset of successes, probabilities, and sample sizes that I am running binomial tests on.

Here is a sample of the data (the actual dataset requires more than 100 binomial tests):

km      n_1 prey_pred p0_prey_pred
 <fct> <dbl>     <int>        <dbl>
 80       93        12       0.119 
 81     1541       103       0.0793
 83      316         5       0.0364
 84      721        44       0.0796
 89      866        58       0.131 

I normally run this (example for first row):

n=93
p0=0.119
successes=12

binom.test(successes, n, p0, alternative = "two.sided")

>   Exact binomial test

data:  12 and 93
number of successes = 12, number of trials = 93, p-value = 0.74822
alternative hypothesis: true probability of success is not equal to 0.119
95 percent confidence interval:
 0.068487201 0.214548325
sample estimates:
probability of success 
            0.12903226 

Is there a way to systematically run a binomial test on each row of data, and then store all the output (p-value, confidence intervals, probability of success) as separate columns?

I've tried the solution proposed here, but I am clearly m

Blundering Ecologist
  • General outline: Write a function that takes the inputs you want as a vector. Have it output the values you want as a vector. Use apply to run that function on every row. – Dason May 20 '20 at 20:15

2 Answers


You can define a function for this as suggested in the comments:

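# Run one exact binomial test and return the key results as a one-row data frame
my_binom <- function(x, n, p) {
  res <- binom.test(x, n, p)
  # p-value, both confidence limits, and the estimated probability of success
  out <- data.frame(res$p.value, res$conf.int[1], res$conf.int[2], res$estimate)
  names(out) <- c("p", "lower_ci", "upper_ci", "p_success")
  rownames(out) <- NULL
  return(out)
}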

Then you can apply it for each row

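# apply() coerces the data frame to a matrix, so pass only the numeric columns;
# a factor column such as km would otherwise turn every value into a string
num_cols <- c("prey_pred", "n_1", "p0_prey_pred")
do.call("rbind.data.frame", apply(df[, num_cols], 1, function(row_i) {
  my_binom(x = row_i[["prey_pred"]], n = row_i[["n_1"]], p = row_i[["p0_prey_pred"]])
}))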
  • "and then storing all the output (p-value, confidence intervals, probability of success) as separate columns" - you're missing some of the desired output – Dason May 20 '20 at 20:21
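A variant that keeps the original columns next to the test results might look like the sketch below. It assumes the sample data from the question, with `km` stored as a factor; `binom_row`, `tests`, and `out` are names made up for this illustration. `Map()` walks the three numeric columns in parallel, so the data frame is never coerced to a matrix.

```r
# Sample data from the question (km kept as a factor, as in the printed tibble)
dat <- data.frame(
  km           = factor(c(80, 81, 83, 84, 89)),
  n_1          = c(93, 1541, 316, 721, 866),
  prey_pred    = c(12L, 103L, 5L, 44L, 58L),
  p0_prey_pred = c(0.119, 0.0793, 0.0364, 0.0796, 0.131)
)

# Hypothetical helper: one binom.test() call, returned as a one-row data frame
binom_row <- function(x, n, p) {
  res <- binom.test(x, n, p, alternative = "two.sided")
  data.frame(p_value   = res$p.value,
             lower_ci  = res$conf.int[1],
             upper_ci  = res$conf.int[2],
             p_success = unname(res$estimate))
}

# One test per row: successes, trials, and null probability are matched by position
tests <- do.call(rbind, Map(binom_row, dat$prey_pred, dat$n_1, dat$p0_prey_pred))

# Bind the four result columns to the right of the original data
out <- cbind(dat, tests)
out
```

Because the helper returns a data frame, adding another statistic later only means adding one more column inside `binom_row`.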
1

Using apply.

res <- t(`colnames<-`(apply(dat, 1, FUN=function(x) {
  rr <- binom.test(x[3], x[2], x[4], "two.sided")
  with(rr, c(x, "2.5%"=conf.int[1], estimate=unname(estimate), 
             "97.5%"=conf.int[2], p.value=unname(p.value)))
}), dat$km))
res
#    km  n_1 prey_pred p0_prey_pred        2.5%   estimate      97.5%      p.value
# 80 80   93        12       0.1190 0.068487201 0.12903226 0.21454832 7.482160e-01
# 81 81 1541       103       0.0793 0.054881013 0.06683971 0.08047927 7.307921e-02
# 83 83  316         5       0.0364 0.005157062 0.01582278 0.03653685 4.960168e-02
# 84 84  721        44       0.0796 0.044688325 0.06102635 0.08106220 7.311463e-02
# 89 89  866        58       0.1310 0.051245893 0.06697460 0.08572304 1.656621e-09

Edit

If you have multiple column sets, in wide format (and for some reason want to stay there)

dat2 <- `colnames<-`(cbind(dat, dat[-1]), c("km", "n_1.1", "prey_pred.1", "p0_prey_pred.1", 
                                            "n_1.2", "prey_pred.2", "p0_prey_pred.2"))

dat2[1:3,]
#   km n_1.1 prey_pred.1 p0_prey_pred.1 n_1.2 prey_pred.2 p0_prey_pred.2
# 1 80    93          12         0.1190    93          12         0.1190
# 2 81  1541         103         0.0793  1541         103         0.0793
# 3 83   316           5         0.0364   316           5         0.0364

you may do:

res2 <- t(`colnames<-`(apply(dat2, 1, FUN=function(x) {
  rr1 <- binom.test(x[3], x[2], x[4], "two.sided")
  rr2 <- binom.test(x[6], x[5], x[7], "two.sided")
  rrr1 <- with(rr1, c("2.5%.1"=conf.int[1], estimate.1=unname(estimate), 
                      "97.5%.1"=conf.int[2], p.value.1=unname(p.value)))
  rrr2 <- with(rr2, c("2.5%.2"=conf.int[1], estimate.2=unname(estimate), 
                      "97.5%.2"=conf.int[2], p.value.2=unname(p.value)))
  c(x, rrr1, rrr2)
}), dat2$km))
res2
#    km n_1.1 prey_pred.1 p0_prey_pred.1 n_1.2 prey_pred.2 p0_prey_pred.2      2.5%.1
# 80 80    93          12         0.1190    93          12         0.1190 0.068487201
# 81 81  1541         103         0.0793  1541         103         0.0793 0.054881013
# 83 83   316           5         0.0364   316           5         0.0364 0.005157062
# 84 84   721          44         0.0796   721          44         0.0796 0.044688325
# 89 89   866          58         0.1310   866          58         0.1310 0.051245893
#    estimate.1    97.5%.1    p.value.1      2.5%.2 estimate.2    97.5%.2    p.value.2
# 80 0.12903226 0.21454832 7.482160e-01 0.068487201 0.12903226 0.21454832 7.482160e-01
# 81 0.06683971 0.08047927 7.307921e-02 0.054881013 0.06683971 0.08047927 7.307921e-02
# 83 0.01582278 0.03653685 4.960168e-02 0.005157062 0.01582278 0.03653685 4.960168e-02
# 84 0.06102635 0.08106220 7.311463e-02 0.044688325 0.06102635 0.08106220 7.311463e-02
# 89 0.06697460 0.08572304 1.656621e-09 0.051245893 0.06697460 0.08572304 1.656621e-09

One could nest this code more tightly, but I recommend keeping things simple so that others, and probably your future self, can still follow what's going on later.
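If the number of column sets grows, one way to avoid copying the `rr1`/`rr2` lines by hand is to loop over the suffixes. The following is only a sketch under the assumption that every set follows the `n_1.k`, `prey_pred.k`, `p0_prey_pred.k` naming used above; `binom_set` and `suffixes` are hypothetical names introduced here.

```r
dat <- read.table(text = "km n_1 prey_pred p0_prey_pred
80   93  12 0.119
81 1541 103 0.0793
83  316   5 0.0364
84  721  44 0.0796
89  866  58 0.131", header = TRUE)

# Same two-set wide layout as dat2 above
dat2 <- cbind(dat, dat[-1])
names(dat2) <- c("km", "n_1.1", "prey_pred.1", "p0_prey_pred.1",
                 "n_1.2", "prey_pred.2", "p0_prey_pred.2")

# Hypothetical helper: run the tests for one column set, identified by its suffix
binom_set <- function(d, suffix) {
  x <- d[[paste0("prey_pred.", suffix)]]
  n <- d[[paste0("n_1.", suffix)]]
  p <- d[[paste0("p0_prey_pred.", suffix)]]
  # mapply() returns one column per row of d, so transpose to one row per test
  res <- t(mapply(function(x, n, p) {
    rr <- binom.test(x, n, p, alternative = "two.sided")
    c(rr$conf.int[1], unname(rr$estimate), rr$conf.int[2], rr$p.value)
  }, x, n, p))
  colnames(res) <- paste0(c("2.5%.", "estimate.", "97.5%.", "p.value."), suffix)
  res
}

suffixes <- c("1", "2")
res2 <- cbind(dat2, do.call(cbind, lapply(suffixes, function(s) binom_set(dat2, s))))
res2
```

Each set gets its own suffix on the result columns, so the names stay unique however many sets are added to `suffixes`.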


Data:

dat <- read.table(text="km      n_1 prey_pred p0_prey_pred
 80       93        12       0.119 
 81     1541       103       0.0793
 83      316         5       0.0364
 84      721        44       0.0796
 89      866        58       0.131 ", header=TRUE)
jay.sf
  • I notice that your solution drops all the original columns. Is there a way to keep all the original columns and add these new columns? – Blundering Ecologist May 20 '20 at 20:50
  • Do you mean you want to add the results to your original data frame to each row? – jay.sf May 20 '20 at 20:52
  • Yes, that is what I meant. :) – Blundering Ecologist May 20 '20 at 20:54
  • OK, we then just add `x` i.e. the original data into the `with` :) – jay.sf May 20 '20 at 20:58
  • I have an extra question (feel free to not answer because you already answered my original question). If I wanted to run multiple `binom.test()` for different "successes" (i.e., even more columns) would I add this to above code like so: `...dat$km)),t(`colnames<-`(apply(dat, 1, function(x) { rr <- binom.test(x[6], x[5], x[7], "two.sided") with(rr, c(x, "2.5%_b"=conf.int[1], estimate_b=unname(estimate), "97.5%_b"=conf.int[2], p.value_b=unname(p.value))) }), dat$km))` – Blundering Ecologist May 20 '20 at 21:03
  • That sounds reasonable. However it also sounds like you have your data in wide format and could make things easier [reshaping it into long format](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) before calculations, using functions like `aggregate` or `by` that are designed to apply functions on subsets/strata. – jay.sf May 20 '20 at 21:11
  • I have it in wide format for a different reason, sadly. Might you be able to try the code I suggested above? When I add it to the code you suggested, it only keeps the second function instead of both. (I will make an edit to your code to show how I have it set up when the queue is done) – Blundering Ecologist May 20 '20 at 21:33
  • I understand, please see edit. The `apply` actually applies any `FUN=`ction row by row. Hence `x` is the whole row, to which we `c`oncatenate results `rr1` and `rr2`. Does that make sense for you? – jay.sf May 20 '20 at 21:46
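For readers who can switch layouts, the long-format route mentioned in the comments could be sketched roughly as follows. This is an illustration only: it stacks the two column sets by hand rather than using a dedicated reshaping function, and `long`, `set`, and the result column names are assumptions made for this example.

```r
dat <- read.table(text = "km n_1 prey_pred p0_prey_pred
80   93  12 0.119
81 1541 103 0.0793
83  316   5 0.0364
84  721  44 0.0796
89  866  58 0.131", header = TRUE)

# Two-set wide layout, as in the answer's edit
dat2 <- cbind(dat, dat[-1])
names(dat2) <- c("km", "n_1.1", "prey_pred.1", "p0_prey_pred.1",
                 "n_1.2", "prey_pred.2", "p0_prey_pred.2")

# Stack the column sets: one row per (km, set) combination
long <- do.call(rbind, lapply(1:2, function(s) {
  data.frame(km           = dat2$km,
             set          = s,
             n_1          = dat2[[paste0("n_1.", s)]],
             prey_pred    = dat2[[paste0("prey_pred.", s)]],
             p0_prey_pred = dat2[[paste0("p0_prey_pred.", s)]])
}))

# One test per row; each list element is a named numeric vector of length 4
tests <- Map(function(x, n, p) {
  rr <- binom.test(x, n, p, alternative = "two.sided")
  c(lower_ci = rr$conf.int[1], estimate = unname(rr$estimate),
    upper_ci = rr$conf.int[2], p_value = rr$p.value)
}, long$prey_pred, long$n_1, long$p0_prey_pred)

# Bind the results as four new columns
long <- cbind(long, do.call(rbind, tests))
long
```

In this layout a new column set only adds rows, so the testing code stays the same regardless of how many sets there are.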