How to calculate for each element in a vector the fraction of elements in another vector that is smaller?

Question

n<-100000   
aa<-rnorm(n)
bb<-rnorm(n)
system.time(lapply(aa, function(z){mean(bb<pnorm(z))}))

It takes too long to run this small code. Simply put, I have two vectors aa and bb. For each element of aa, say aa[i], I want the proportion of bb < aa[i]

I found this article and tried to use it to speed up. But it does not work. Speed comparison of sapply with a composite function

Any help will be appreciated!

Just a minor comment: Why not create `pnorm(z)` outside the function? That is, `aa <- pnorm(rnorm(n))`. — Bernd Weiss, May 19 '11 at 01:04

score 7 · Answer 1 · answered May 19 '11 at 02:06

7

You may be able to use the findInterval function:

n <- 25000
aa <- rnorm(n)
bb <- rnorm(n)
system.time(q1 <- lapply(aa, function(z){mean(bb<pnorm(z))}))
#   user  system elapsed
# 20.057   2.544  22.807
system.time(q2 <- findInterval(pnorm(aa), sort(bb))/n)
#   user  system elapsed
#  0.020   0.000   0.021
all.equal(as.vector(q1, "numeric"), q2)
# [1] TRUE

Note that findInterval returns indices, so I've divided the result by n. If you can sort pnorm(aa) before giving it to findInterval, it will be even faster.

answered May 19 '11 at 02:06

Andy

4,789
23
20

1

Fantastic! I had never encountered the findInterval function before. – Ian Fellows May 19 '11 at 03:07
3

@Ian What reminds me about http://unknownr.r-forge.r-project.org/. From author description: "Do you know how many functions there are in base R? How many of them do you know you don't know? Run `unk()` to discover your unknown unknowns. It's fast and it's fun!" – Marek May 19 '11 at 08:22

score 1 · Answer 2 · answered May 19 '11 at 00:39

I'm not meaning to be facetious but these are the types of problems that R is designed to solve without having to do every single calculation - ie, use statistics!

Assuming that the distributions are normal...

aa.new <- sample(aa, 1000)
bb.new <- sample(bb, 1000)

x <- lapply(aa.new, function(z){mean(bb.new<pnorm(z))})
x <- unlist(x)

mean(x)

You can be 99% certain that the proportion of bb < aa[i] falls between +/- 4% of mean(x).

For simple random sampling, 99% margin of error = 1.29/sqrt(n)

score 1 · Answer 3 · answered May 19 '11 at 03:58

If you only want the proportion ' < aa[i]' then you should just determine the number of bb less than than each value of aa and then divide by length:

bbs <- sort(bb)
zz <- findInterval(aa, bbs)
zz <- zz/length(aa)

It does what you say you want, while your code I fear does not.

How to calculate for each element in a vector the fraction of elements in another vector that is smaller?

3 Answers3