2

I have two sets of data

I had plotted two probability density functions. Now I want the area between the two probability density functions, which are in certain x range.

I tried to integrate the area, trapezoidal rule etc:

Calculating the area between a curve and a straight line without finding the function

Error calculating the area between two lines using "integrate"

How to measure area between 2 distribution curves in R / ggplot2

but all are in vain.

Here is the link to the data i am working on.

https://sheet.zoho.com/sheet/editor.do?doc=1ff030ea1af35f06f8303927d7ea62b3c4b04bdae021555e8cc43ed0569cb2aaceb26368f93db4d15ac66cf7662d9a7873e889e1763139a49ffd68e7843e0b44

dens.pre=density(TX/10)
dens.post=density(TX30/10)`
plot(dens.pre,col="green")
lines(dens.post,col="red")

locator()
#$x
#[1] 18.36246

#$y
#[1] 0.05632428

abline(v=18.3,col="red")

Finding the area between the two curves for X > 18.3.

Area between the curves: enter image description here

  • Perhaps `area.between.curves()` in the `geiger` package could be of some assistance? – tomasu Mar 25 '19 at 12:50
  • Hi, thanks for the reply @ThomasJohnFlaherty I tried it, but its not working in R version 3.0.0+, it's for lower version of R. I installed the package and it seems not to load. this is what i get lines(dens.post,col="red") > area.between.curves(x, dens.pre, dens.post, xrange = c(18.3,35)) Error: could not find function "area.between.curves" – chanakya chan Mar 25 '19 at 13:04
  • Possible duplicate of [Calculate Area Between 2 Curves](https://stackoverflow.com/questions/7868382/calculate-area-between-2-curves) – Cettt Mar 25 '19 at 13:04
  • 1
    chanakya chan, did you try the package? It depends on R *at least* 2.15, but since it was last updated to CRAN two months ago, I find it hard to believe that it was accepted if it does not support R-3. And I cannot find where [in its source](https://github.com/cran/geiger/blob/87d863b5d43675ef036f7202955993594d681c2c/R/disparity.R) it does anything version-specific. – r2evans Mar 25 '19 at 13:17
  • Hi @r2evans I tried it again, but i got null result, but as you can obviously see its not a null value. Here is the code i tried again. (> library("geiger", lib.loc="~/R/win-library/3.3") Loading required package: ape > geiger:::.area.between.curves(18.3, dens.post, dens.pre) [1] 0) – chanakya chan Mar 25 '19 at 13:27
  • 2
    Just because you get an unexpected result does not mean the package is broken or does not support up-to-date versions of R. We cannot help you, unfortunately, since we don't have any sample data. (It really helps to have a fully-reproducible question, including *sample* data generated programmatically or provided with `dput`. For good-question references, please see https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info.) – r2evans Mar 25 '19 at 13:38
  • Hi @r2evans, I uploaded and added the link to actual data I am working on, in the original post. Hope this helps me to get out of the woods. – chanakya chan Mar 25 '19 at 13:50

1 Answers1

1

With trapezoidal rule you could probably calculate it like this:

d0 <- dens.pre
d1 <- dens.post
f0 <- approxfun(d0$x, d0$y)
f1 <- approxfun(d1$x, d1$y)

# defining x range of the density overlap
ovrng <- c(18.3, min(max(d0$x), max(d1$x)))

# dividing it to sections (for example n=500)
i <- seq(min(ovrng), max(ovrng), length.out=500)

# calculating the distance between the density curves
h1 <- f0(i)-f1(i)
h2 <- f1(i)-f0(i)

#and using the formula for the area of a trapezoid we add up the areas
area1<-sum( (h1[-1]+h1[-length(h1)]) /2 *diff(i) *(h1[-1]>=0+0))     # for the regions where d1>d0
area2<-sum( (h2[-1]+h2[-length(h2)]) /2 *diff(i) *(h2[-1]>=0+0))     # for the regions where d1<d0
area_total <- area1 + area2 
area_total

Though, since you are interested only in the area where one curve remain below the other for the whole range, this can be shortened:

d0 <- dens.pre
d1 <- dens.post
f0 <- approxfun(d0$x, d0$y)
f1 <- approxfun(d1$x, d1$y)

# defining x range of the density overlap
ovrng <- c(18.3, min(max(d0$x), max(d1$x)))

# dividing it to sections (for example n=500)
i <- seq(min(ovrng), max(ovrng), length.out=500)

# calculating the distance between the density curves
h1 <- f1(i)-f0(i)

#and using the formula for the area of a trapezoid we add up the areas where d1>d0
area<-sum( (h1[-1]+h1[-length(h1)]) /2 *diff(i) *(h1[-1]>=0+0))     
area

#We can plot the region using
plot(d0, main="d0=black, d1=green")
lines(d1, col="green")
jj<-which(h>0 & seq_along(h) %% 5==0); j<-i[jj]; 
segments(j, f1(j), j, f1(j)-h[jj])

There are other (and more detailed) solutions here and here

Oka
  • 1,318
  • 6
  • 11
  • Hi, I tried the same procedure yesterday. I got the total area between the curves. But I need area between curves after an X value 18.3, which is the area between the curves after the redline as shown in the above fig. – chanakya chan Mar 25 '19 at 14:05
  • Hi, Thank you for the comment. I corrected the code so it will calculate after 18.3. – Oka Mar 25 '19 at 14:33
  • Thank you very much@-Oka – chanakya chan Mar 25 '19 at 14:35
  • @chanakya chan : Is it acceptable / sufficient solution? – Oka Mar 25 '19 at 14:36
  • Ok, I´ll check. It remains below 1 with my toy data, but I´ll check with your dataset – Oka Mar 25 '19 at 14:49
  • @– Oka It worked perfectly fine, thank you very much. It's my mistake in taking the code. The only thing remaining is shading the area we calculated. If you also add code to it, the whole problem would be solved :D, thank you very much. – chanakya chan Mar 25 '19 at 14:55
  • @chanakya chan No problem. I added the code to plot the area to the lower codeblock. – Oka Mar 25 '19 at 15:00