I am trying to run binom.test on a data.table in which each row supplies its own X and N values. I saw this post, which uses a static N value, and tried to adapt it, but I get:

library(data.table)
dt = data.table(X=rbinom(100, 625, 1/5), N=rbinom(100, 625, 4/5))
dt[, P := binom.test(x=X, n=N)$p.value ]
# Error in binom.test(x = X, n = N) : incorrect length of 'x'

The post also mentions aggregating with by=X, but even then I get:

dt[, P := binom.test(x=X, n=N)$p.value, by=X ]
# Error in binom.test(x = X, n = N) : 'n' must be a positive integer >= 'x'

This happens even though N is always a positive integer greater than X. In any case, my goal is not to group by values of X; I want a binom.test p-value for every row.

dragon951

2 Answers

1

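binom.test is not vectorized over x and n: it expects a single success count and a single trial count per call. We can therefore group by row number, so that binom.test is applied to one row at a time.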

library(data.table)

dt[, P := binom.test(x=X, n=N)$p.value, by = seq_len(nrow(dt))]
# which is the same as
# dt[, P := binom.test(x=X, n=N)$p.value, by = 1:nrow(dt)]
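
As a rough illustration of why the row grouping matters, the sketch below contrasts the ungrouped call with the per-row call. Without `by`, the expression in `j` sees the whole X and N columns as length-100 vectors, which binom.test rejects; with one group per row, both are scalars. Grouping by X fails for a related reason: whenever a value of X repeats, N is a vector inside that group. The seed and the `res` variable are assumptions added only to make the example reproducible and self-contained.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(X = rbinom(100, 625, 1/5), N = rbinom(100, 625, 4/5))

# Without `by`, j receives the full columns, so binom.test() gets
# vectors of length 100 and raises an error; capture its message.
res <- tryCatch(
  dt[, binom.test(x = X, n = N)$p.value],
  error = function(e) conditionMessage(e)
)

# With one group per row, X and N are length-1 inside j.
dt[, P := binom.test(x = X, n = N)$p.value, by = seq_len(nrow(dt))]

# Cross-check the first row against a direct scalar call.
all.equal(dt$P[1], binom.test(dt$X[1], dt$N[1])$p.value)
```

Running `dt[, .N, by = X][N > 1]` on the same data lists the X values that occur more than once, which are the groups where `by = X` passes a vector `n` to binom.test.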
Ronak Shah
  • Do you have a good link to explain why data.table requires the group by operation to do the row operations? I couldn't find anything at rdatatable.gitlab.io – dragon951 Feb 18 '20 at 18:18
  • @dragon951 As we want `p.value` for every row here, we group them by row. I am not aware if this use case is explained anywhere. – Ronak Shah Feb 18 '20 at 23:39
1

We can use Map to loop over the corresponding elements of 'X' and 'N' in parallel, calling binom.test once per pair:

library(data.table)
dt[,  P := unlist(Map(function(x, y) binom.test(x = x, n = y)$p.value,  X, N))]
head(dt)
#     X   N            P
#1: 104 510 3.737474e-43
#2: 137 501 8.640380e-25
#3: 140 517 3.982312e-26
#4: 131 498 6.476382e-27
#5: 114 506 1.000591e-36
#6: 120 507 8.940756e-34

Or, without an anonymous function:

dt[, P := sapply(Map(binom.test, x = X, n = N), `[[`, "p.value")]
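
A closely related variant, offered only as a sketch: `mapply` combines the pairwise loop and the simplification to a numeric vector in one step, so no `unlist` or `sapply` is needed. The seed, the column name `P_mapply`, and the comparison column `P_byrow` are assumptions introduced for illustration.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(X = rbinom(100, 625, 1/5), N = rbinom(100, 625, 4/5))

# mapply() walks X and N in parallel and simplifies the scalar
# p-values into a plain numeric vector.
dt[, P_mapply := mapply(function(x, n) binom.test(x = x, n = n)$p.value, X, N)]

# Same computation via row-wise grouping, for comparison.
dt[, P_byrow := binom.test(x = X, n = N)$p.value, by = seq_len(nrow(dt))]

# Both approaches should agree row by row.
all.equal(dt$P_mapply, dt$P_byrow)
```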
akrun