I have a dataframe (gdata) with x (as "r") and y (as "km") coordinates of a function. When I plot it like this:

    plot(x = gdata$r, y = gdata$km, type = "l")

I get the graph of the function.

Now I want to calculate the area under the curve from x = 0 to x = 0.6. When I look for appropriate packages, I only find ones that calculate the AUC of a ROC curve. Is there a way to calculate the AUC of an ordinary function?

Hashriama

2 Answers

The area under the curve (AUC) of a given set of data points can be computed using numerical integration:

Let data be your data frame containing x and y values. You can get the area under the curve from the lower limit x0 = 0 to the upper limit x1 = 0.6 by integrating the function that linearly interpolates your data.

This is a numerical approximation rather than an exact result, because we only have a finite number of data points. For y = sqrt(x) with the 20 rows used here, we get 0.3033 instead of the true value of 0.3098. With 200 rows in data, the result improves to auc = 0.3096.

library(tidyverse)

data <-
  tibble(x = seq(0, 2, length.out = 20)) %>%
  mutate(y = sqrt(x))
data
#> # A tibble: 20 × 2
#>        x     y
#>    <dbl> <dbl>
#>  1 0     0    
#>  2 0.105 0.324
#>  3 0.211 0.459
#>  4 0.316 0.562
#>  5 0.421 0.649
#>  6 0.526 0.725
#>  7 0.632 0.795
#>  8 0.737 0.858
#>  9 0.842 0.918
#> 10 0.947 0.973
#> 11 1.05  1.03 
#> 12 1.16  1.08 
#> 13 1.26  1.12 
#> 14 1.37  1.17 
#> 15 1.47  1.21 
#> 16 1.58  1.26 
#> 17 1.68  1.30 
#> 18 1.79  1.34 
#> 19 1.89  1.38 
#> 20 2     1.41

qplot(x, y, data = data)

integrate(approxfun(data$x, data$y), 0, 0.6)
#> 0.3033307 with absolute error < 8.8e-05

Created on 2021-10-03 by the reprex package (v2.0.1)

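The absolute error returned by integrate is correct only if the true function is exactly linear between every two adjacent data points, as we assumed. It measures the numerical error of integrating the interpolating function, not the error introduced by the interpolation itself.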
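Integrating the linear interpolation is the same thing as applying the trapezoidal rule to the data points, so the result can be cross-checked in base R without any packages. The following is only a minimal sketch: `trapz_auc` and `auc_for_n` are hypothetical helper names made up for this illustration, and the code assumes the integration limits lie inside the range of x.

```r
# Trapezoidal AUC between `from` and `to` for tabulated points (x, y).
# `trapz_auc` is a hypothetical helper, not part of any package.
trapz_auc <- function(x, y, from = min(x), to = max(x)) {
  stopifnot(length(x) == length(y),
            from >= min(x), to <= max(x), from < to)
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  # Keep the grid points strictly inside the limits and add the limits.
  xs <- c(from, x[x > from & x < to], to)
  # Linear interpolation supplies y at the limits and reproduces y on the grid.
  ys <- approx(x, y, xout = xs)$y
  # Sum of trapezoid areas: width times mean of the two heights.
  sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)
}

# Closed-form integral of sqrt(x) on [0, 0.6] for comparison.
auc_exact <- 2 / 3 * 0.6^1.5

# Same toy function as above, sampled with n rows on [0, 2].
auc_for_n <- function(n) {
  x <- seq(0, 2, length.out = n)
  trapz_auc(x, sqrt(x), from = 0, to = 0.6)
}

auc_20 <- auc_for_n(20)
auc_200 <- auc_for_n(200)
round(c(n20 = auc_20, n200 = auc_200, exact = auc_exact), 4)
```

The two approximations should land close to the 0.3033 and 0.3096 quoted above. Both stay below the exact value, because sqrt(x) is concave and every trapezoid therefore lies under the curve.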

danlooo

I used the package MESS to solve the problem:

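# Toy example: y = x^2 sampled on a grid of width 0.1
library(MESS)
x <- seq(0, 3, by = 0.1)
y <- x^2
# Area under the curve between x = 0.1 and x = 2, using spline interpolation
auc(x, y, from = 0.1, to = 2, type = "spline")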

The analytical result is 7999/3000, which is approximately 2.666333.

The R script above gives 2.66632 using the spline approximation and 2.6695 using the linear approximation (type = "linear").
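If you would rather not add a package dependency, a similar comparison can be sketched with base R alone by combining integrate with approxfun (linear) or splinefun (cubic spline). This is an illustrative alternative on the same toy data, not a description of what MESS does internally, and the variable names are made up for the example.

```r
# Same toy data as above.
x <- seq(0, 3, by = 0.1)
y <- x^2

# Integrate the linear interpolant and the cubic spline interpolant.
auc_linear <- integrate(approxfun(x, y), 0.1, 2)$value
auc_spline <- integrate(splinefun(x, y), 0.1, 2)$value

# Closed-form integral of x^2 on [0.1, 2], i.e. 7999/3000.
auc_exact <- (2^3 - 0.1^3) / 3

c(linear = auc_linear, spline = auc_spline, exact = auc_exact)
```

The linear version overestimates slightly here because x^2 is convex, so each straight segment lies above the curve, while the spline follows the curvature and ends up much closer to the analytical value.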

Diego