
I'm trying to calculate the area under a curve, but only where the curve falls below a certain threshold, and I'm unsure how to do that. I've seen this question, but it doesn't quite answer what I'm looking for.

Here is some example data...

library(tidyverse)  # provides %>%, as_tibble() and ggplot()

test_df <- structure(list(time = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23), balance = c(27, 
-45, -118, -190, -263, -343, -424, -1024, -434, -533, -613, -694, 
-775, -355, -436, -516, -597, -77, -158, -239, -319, -400, -472, 
-545)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"
)) %>% as_tibble()

ggplot(test_df, aes(time, balance))+
  geom_smooth(se = FALSE)+
  geom_hline(yintercept = -400)

I'd like to calculate the AUC for the trend line, but only for when it is below a certain threshold (-400, for example).

So I can extract the values for the smoothed line...

test_plot <- ggplot(test_df, aes(time, balance))+
  geom_smooth(se = FALSE)+
  geom_hline(yintercept = -400)

ggp_data <- ggplot_build(test_plot)$data[[1]]

and use something like this to get an AUC value

MESS::auc(ggp_data$x, ggp_data$y)

My questions are:

  1. How to only calculate below -400?
  2. How to interpret the value?
  3. What units would it be in?
  4. If my x axis is in hours, is there a way to turn the value into an hour value?

Thanks!

Jeff

1 Answer


To calculate the area relative to a threshold, shift the y-values so that the threshold becomes the new zero line, i.e. subtract the threshold from y. For a negative threshold such as -400 this amounts to adding 400:

MESS::auc(ggp_data$x, ggp_data$y+400)

However, this integrates over the whole range from 0 to 23 and therefore also includes the parts of the curve that are above -400. To get the AUC only for the part below your threshold, you have to find the x-values where your smoothed line intersects the horizontal line at -400. Inspecting your values by eye gives the following approximate intersection points:

 x1 <- 4.45 
 x2 <- 15.45 
 x3 <- 21.35
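
Reading the intersections off the plot works, but they can also be computed. Below is a minimal sketch, assuming straight-line interpolation between neighbouring points of the curve. `crossings()` is a hypothetical helper written for this illustration, not a function from MESS or ggplot2, and the toy data are made up so the result can be checked by hand:

```r
# Hypothetical helper: x-values where a piecewise-linear curve crosses a threshold.
crossings <- function(x, y, threshold) {
  d <- y - threshold                      # signed distance to the threshold
  i <- which(d[-length(d)] * d[-1] < 0)   # segments where the sign flips
  # linear interpolation inside each flagged segment
  x[i] - d[i] * (x[i + 1] - x[i]) / (d[i + 1] - d[i])
}

# Toy data (made up for illustration)
x <- 0:4
y <- c(0, -2, -2, 0, -2)
crossings(x, y, threshold = -1)
# [1] 0.5 2.5 3.5
```

With the data from the question the call would be crossings(ggp_data$x, ggp_data$y, -400). Points that lie exactly on the threshold are not reported by this sketch.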

The curve is below -400 between x1 and x2, and again between x3 and max(x). Calculate the AUC for each of these two intervals and add the values together:

AUC1 <- MESS::auc(ggp_data$x, ggp_data$y+400, from = x1, to = x2)
AUC2 <- MESS::auc(ggp_data$x, ggp_data$y+400, from = x3, to = max(ggp_data$x))

AUC.total <- AUC1 + AUC2

> AUC.total
[1] -1747.352

Note that the value is negative because the area lies below the shifted zero line. There are no "negative areas", so you can take the absolute value, AUC.total = 1747.352, to proceed. The unit of this value is the unit of your y-axis multiplied by the unit of your x-axis, so without information on your y-axis one cannot clearly interpret it.
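
The same kind of area can also be approximated without choosing the intersection points by hand. The sketch below resamples the curve on a fine grid, sets everything above the threshold to zero, and applies the trapezoid rule. `area_below()` is a hypothetical name, the grid size `n` is an arbitrary choice, and the toy data are made up so the result can be verified on paper:

```r
# Hypothetical helper: signed area between a curve and a threshold,
# counting only the stretches where the curve is below the threshold.
area_below <- function(x, y, threshold, n = 10001) {
  fine  <- approx(x, y, n = n)            # linear resampling on an even grid
  depth <- pmin(fine$y - threshold, 0)    # 0 wherever the curve is above
  # trapezoid rule on the clipped curve
  sum(diff(fine$x) * (head(depth, -1) + tail(depth, -1)) / 2)
}

# Toy data (made up): below -1 on [0.5, 2.5] and on [3.5, 4]
x <- 0:4
y <- c(0, -2, -2, 0, -2)
area_below(x, y, threshold = -1)   # about -1.75 for this toy data
```

For the question's data, area_below(ggp_data$x, ggp_data$y, -400) should come out close to AUC.total; small differences are expected because x1, x2 and x3 were read off the plot by eye.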

Noah
  • That's amazing, thank you! Regarding the y-axis, is that the units of AUC? I'm hoping to get a number in units of the x-axis. For example, the total number of minutes below -400. – Jeff Aug 12 '22 at 07:19
  • If you just want the total minutes your value is below -400, you can calculate it as (x2 - x1) + (max(x) - x3). Since your x-axis is in minutes, so are the differences between these values. From inspecting the graph we know that the value is below -400 between x1 and x2 and again between x3 and max(x). Therefore, the sum of these differences equals the total time below -400. – Noah Aug 12 '22 at 07:24
  • Thanks Noah. For auc, should this also work? (seems pretty close): filtered <- ggp_data %>% select(x, y) %>% filter(y < -400); MESS::auc(filtered$x, filtered$y + 400) – Jeff Aug 12 '22 at 08:30
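
The total time below the threshold discussed in the comments can be sketched in code as well. This assumes straight lines between neighbouring points, so that a segment crossing the threshold contributes only the fraction of its width that lies below it. `time_below()` is a hypothetical helper, not part of MESS, and the toy data are made up:

```r
# Hypothetical helper: total x-extent over which a piecewise-linear curve
# lies below a threshold.
time_below <- function(x, y, threshold) {
  d    <- y - threshold
  d0   <- head(d, -1)                     # distance at the left end of each segment
  d1   <- tail(d, -1)                     # distance at the right end
  neg  <- pmax(-d0, 0) + pmax(-d1, 0)     # how far the ends dip below the threshold
  tot  <- abs(d0) + abs(d1)
  frac <- ifelse(tot == 0, 0, neg / tot)  # share of the segment below the threshold
  sum(diff(x) * frac)
}

# Toy data (made up): below -1 on [0.5, 2.5] and on [3.5, 4]
x <- 0:4
y <- c(0, -2, -2, 0, -2)
time_below(x, y, threshold = -1)
# [1] 2.5
```

The result is in the units of the x-axis. Regarding the filtering idea in the last comment: after dropping the rows above -400, a trapezoid-style integral joins the remaining points on either side of each gap with a straight line and integrates across it, which is probably why the result is close to, but not the same as, the interval-by-interval calculation.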