1

I want to calculate the integral of a sequence vector. Since there's no function available, I use the trapezoidal method1.

iglTzm <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1))) / 2

The first element of the sequence should be the zero point, so the principle is: if the values of the sequence are predominantly below the first value, the integral should be negative, otherwise positive, or 0.

Consider matrix m1:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    6    7    8    8    6    8   10
[2,]    9    9    8    9    9    8    9
[3,]    9   10   10    9    9    9    9
[4,]    9    8    8    8    6    8    9
[5,]   10   10   10    9   10    8    0
[6,]    9    8    9   10    9    9    9

Integration with these raw values will most likely lead to inconsistent values:

> setNames(apply(m1, 1, iglTzm, 0:6), 1:6)
  1   2   3   4   5   6 
 15   2  -2   7 -52   0 

So I adjust the sequences (rows) on their first value (column 1), in order to set the right signs, and get matrix m2:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    1    2    2    0    2    4
[2,]    0    0   -1    0    0   -1    0
[3,]    0    1    1    0    0    0    0
[4,]    0   -1   -1   -1   -3   -1    0
[5,]    0    0    0   -1    0   -2  -10
[6,]    0   -1    0    1    0    0    0

Logically that doesn't change anything about the values iglTzm() throws, because the diff() is the same:

> setNames(apply(m2, 1, iglTzm, 0:6), 1:6)
  1   2   3   4   5   6 
 15   2  -2   7 -52   0 

Anyway, because I can't simply scale or invert it, I haven't had a brilliant idea yet how to adapt the function to get the right signs, which are assumingly:

#  1   2   3   4   5   6 
# 15  -2   2  -7 -52   0

Does anyone know how to adapt iglTzm() to get the integrals with the correct sign?

The plot of m2 should illustrate the principle a bit more:

enter image description here


data

m1 <- matrix(c(6, 7, 8, 8, 6, 8, 10,
                9, 9, 8, 9, 9, 8, 9,
                9, 10, 10, 9, 9, 9, 9,
                9, 8, 8, 8, 6, 8, 9, 
                10, 10, 10, 9, 10, 8, 0, 
                9, 8, 9, 10, 9, 9, 9), 6, byrow=TRUE)

m2 <- t(apply(m1, 1, function(x) scale(x, center=x[1], scale=FALSE)))

# plot
par(mfrow=c(2, 3))
lapply(1:nrow(m2), function(x) {
  plot(m2[x, ], type="l", main=x)
  abline(h=m2[x, 1], col="red", lty=2)
})
Community
  • 1
  • 1
jay.sf
  • 60,139
  • 8
  • 53
  • 110

1 Answers1

1

Firstly, there is another small but more important issue, although after fixing it your question would still remain valid. What I mean is that the order of x and y as arguments of your function should be reversed due to how you use the function in apply.

But that is not enough and now we come back to your question. For that, let us recall the usual integration: ʃf(x)dx (with limits from a to b) would integrate the area below f, which is what your function already successfully does. Now what you want is to adjust its level. But if we integrate from a to b, that is the same as ʃ(f(x)-f(a))dx = ʃf(x)dx - (b-a)f(a), which leads to

iglTzm <- function(y, x) sum(diff(x) * (head(y, -1) + tail(y, -1))) / 2 - y[1] * (max(x) - min(x))
setNames(apply(m1, 1, iglTzm, 0:6), 1:6)
#  1  2  3  4  5  6 
#  9 -2  2 -7 -8  0 

It happens that only two absolute values are different from the version where x and y are reversed. Bet let's look at the first function: should it be 9 or 15? We have that 2*2/2 + 1*2 + 1*2/2 + 2*4/2 = 9, so indeed we want to reverse x and y.

Another way to write the function would be

iglTzm <- function(y, x) sum(diff(x) * (head(y - y[1], -1) + tail(y - y[1], -1))) / 2
setNames(apply(m1, 1, iglTzm, 0:6), 1:6)
#  1  2  3  4  5  6 
#  9 -2  2 -7 -8  0 

Edit: by reversing I only meant the order in the function definition or how you use it in apply; the function itself in terms of y (function values) and x (grid values) is fine.

Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
  • Hola! I have noticed the strange values but i have not really realized the error. But I believe the order now is better and can be intuitively used with `apply` or other functions with similar architecture. I'm sure the mix-up happened because I was using `integrate()` before. And now - regardless to your already great answer - in this point I have an intuition problem: `integrate(function(x) x, 0, 1)` and `iglTzm(0:1, 0:1)`yield the same, whereas `iglTzm(0:10, 0:1)` is different, but shouldn't it be equal? We integrate in all cases from 0 to 1, and all slopes are equal. Hope this makes sense. – jay.sf Jan 11 '19 at 19:28
  • @jay.sf, `iglTzm(0:10, 0:1)` is wrong (due to recycling you get no warnings/errors). You have 11 function values 0, 1, ..., 10, whereas you give only two grid points 0, 1. So where does the function take value 3, 6, 10? Now `iglTzm(0:10, seq(0, 1, length = 11))` would be correct integration from 0 to 1, but the slopes aren't the same. The slope of 0:1 is 1, while the slope of 0:10, is 10. (Due to recycling, `iglTzm(0:10, 0:1)` is the same as `iglTzm(0:10, 0:10)`.) – Julius Vainora Jan 11 '19 at 19:33
  • Ah recycling! Ok, but when I `plot(0:1)` and `plot(0:10)` both slopes should be the same, and also both integrals from 0 to 1 are the same, aren't they? Maybe I'm lacking something. – jay.sf Jan 11 '19 at 19:48
  • 1
    @jay.sf, `plot(0:1)` corresponds to f(x)=x for the range of `x` as [0,1], while `plot(0:10)` the same function with the range [0,10]. The integral of the first one then is 1*1/2=1, while the second one 10*10/2=50 (they are just two triangles). There seems to be some confusion between `x` and `y` that you pass to `iglTzm`, `x` and `y` in terms of f(x)=y as actual functions, and possible what a slope is (e.g., by rescaling the plot the slope may look different, but it isn't). – Julius Vainora Jan 11 '19 at 20:08
  • Yeah, I think your words cut the knot in my head. I really appreciate your help, thanks! – jay.sf Jan 11 '19 at 20:25
  • @jay.sf, no problem, I'm glad it helps. (I also have a typo: 1*1/2=1/2, not 1.) – Julius Vainora Jan 11 '19 at 20:32
  • I've noticed that ;) – jay.sf Jan 11 '19 at 20:35