80

I have a list of 100 numbers giving the height on the Y axis, and the X axis runs from 1 to 100 with a constant step of 5. I need to calculate the area enclosed between the curve through the (x, y) points and the X axis, using rectangles and SciPy. Do I have to find the function for this curve or not? Almost all the examples I have read use a specific equation for the Y values. In my case there is no equation, just data from a list. The classic solution is to add up the Y values and multiply by the X step. Any idea how to do this with SciPy?

Also, can anyone recommend a book that focuses on numerical (finite element) methods using SciPy and NumPy?

nbrooks
user1640255

3 Answers

93

The numpy and scipy libraries include the composite trapezoidal rule (numpy.trapz, available as numpy.trapezoid from NumPy 2.0 onward) and the composite Simpson's rule (scipy.integrate.simpson).

Here's a simple example. In both trapz and simpson, the argument dx=5 indicates that the spacing of the data along the x axis is 5 units.

import numpy as np
from scipy.integrate import simpson
from numpy import trapz


# The y values.  A numpy array is used here,
# but a python list could also be used.
y = np.array([5, 20, 4, 18, 19, 18, 7, 4])

# Compute the area using the composite trapezoidal rule.
area = trapz(y, dx=5)
print("area =", area)

# Compute the area using the composite Simpson's rule.
area = simpson(y, dx=5)
print("area =", area)

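Output (the Simpson value comes from an older SciPy release; recent releases treat an even number of samples differently and may print a slightly different number):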

area = 452.5
area = 460.0
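Since the question asks specifically about rectangles, here is a minimal plain-Python sketch of the left and right rectangle sums on the same y values, next to the trapezoidal rule. The helper names (left_rectangles, right_rectangles, trapezoids) are made up for this illustration and are not NumPy or SciPy functions.

```python
# Same sample data as above, with a constant spacing of 5 along x.
y = [5, 20, 4, 18, 19, 18, 7, 4]
dx = 5

def left_rectangles(y, dx):
    # Each rectangle takes its height from the left end of its interval.
    return dx * sum(y[:-1])

def right_rectangles(y, dx):
    # Each rectangle takes its height from the right end of its interval.
    return dx * sum(y[1:])

def trapezoids(y, dx):
    # Composite trapezoidal rule: end points count half, interior points fully.
    return dx * (sum(y) - 0.5 * (y[0] + y[-1]))

print("left rectangles  =", left_rectangles(y, dx))   # 455
print("right rectangles =", right_rectangles(y, dx))  # 450
print("trapezoids       =", trapezoids(y, dx))        # 452.5
```

The trapezoidal result is the average of the two rectangle sums, which is why it matches the 452.5 printed by trapz.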
Warren Weckesser
  • That's great! Both answers helped me understand and resolve the questions I had. I would like to ask something related: do you recommend using arrays rather than lists? Does it help the user, or the logic and speed of the algorithm? – user1640255 Nov 10 '12 at 17:44
  • The first thing the `trapz` and `simps` functions do is convert the `y` argument into a numpy array, so it doesn't really matter. You might look at your code that generates the `y` values, and see if that would benefit from the use of additional numpy or scipy functions. If so, `y` would already be an array when you passed it to `simps`. – Warren Weckesser Nov 10 '12 at 20:58
  • Which of these two methods is more accurate? – Farid Alijani Nov 01 '19 at 06:51
  • Both are accurate – vashista Mar 16 '23 at 10:21
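On the accuracy question, one way to get a feel for it is to sample a function whose area is known exactly and compare the errors. The sketch below is plain Python, with hypothetical helper names (trapezoid, simpson_odd), and integrates sin(x) over [0, pi], where the exact area is 2. For smooth data like this, Simpson's rule usually gives the smaller error; for noisy measurements the difference can be negligible.

```python
import math

def trapezoid(y, h):
    # Composite trapezoidal rule for equally spaced samples.
    return h * (sum(y) - 0.5 * (y[0] + y[-1]))

def simpson_odd(y, h):
    # Composite Simpson's rule; assumes an odd number of samples,
    # i.e. an even number of intervals.
    if len(y) % 2 == 0:
        raise ValueError("need an odd number of samples")
    return h / 3.0 * (y[0] + y[-1] + 4 * sum(y[1:-1:2]) + 2 * sum(y[2:-1:2]))

n = 10                      # number of intervals (even)
h = math.pi / n             # spacing between samples
y = [math.sin(i * h) for i in range(n + 1)]

exact = 2.0                 # integral of sin(x) from 0 to pi
trap_error = abs(trapezoid(y, h) - exact)
simp_error = abs(simpson_odd(y, h) - exact)
print("trapezoid error:", trap_error)
print("Simpson error:  ", simp_error)
```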
27

If you have sklearn installed, a simple alternative is to use sklearn.metrics.auc.

This computes the area under the curve using the trapezoidal rule, given arbitrary x and y arrays.

import numpy as np
from sklearn.metrics import auc

dx = 5
xx = np.arange(1,100,dx)
yy = np.arange(1,100,dx)

print('computed AUC using sklearn.metrics.auc: {}'.format(auc(xx,yy)))
print('computed AUC using np.trapz: {}'.format(np.trapz(yy, dx = dx)))

Both output the same area: 4607.5

The advantage of sklearn.metrics.auc is that it accepts an arbitrarily spaced x array. Just make sure x is monotonic (for example, sorted in ascending order); unsorted x values do not give a meaningful area.
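To make the point about spacing and ordering concrete, here is a small plain-Python sketch of the trapezoidal rule for unevenly spaced points. The function name trapezoid_xy is hypothetical; it sorts the (x, y) pairs first so that the input order does not matter. np.trapz also accepts the x coordinates through its x argument, but it does not sort them for you.

```python
def trapezoid_xy(x, y):
    # Pair each x with its y and sort by x so the segments run left to right.
    pairs = sorted(zip(x, y))
    area = 0.0
    # Add the trapezoid between each pair of neighbouring points.
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        area += 0.5 * (x1 - x0) * (y0 + y1)
    return area

# Unevenly spaced, deliberately unsorted samples of y = 2x.
x = [3, 0, 6, 1]
y = [6, 0, 12, 2]
print(trapezoid_xy(x, y))  # 36.0, the exact area under y = 2x on [0, 6]
```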

khuang834
24

You can use Simpson's rule or the trapezium rule to calculate the area under a graph, given a table of y values at regular intervals.

A Python function that applies Simpson's rule:

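def integrate(y_vals, h):
    """Simpson's rule for equally spaced y values (spacing h)."""
    n = len(y_vals) - 1  # number of intervals
    if n < 1:
        return 0.0
    tail = 0.0
    if n % 2 == 1:
        # Simpson's rule needs an even number of intervals, so cover the
        # last interval with one trapezoid and use Simpson on the rest.
        tail = 0.5 * h * (y_vals[-2] + y_vals[-1])
        y_vals = y_vals[:-1]
        if n == 1:
            return tail
    total = y_vals[0] + y_vals[-1]
    for i, y in enumerate(y_vals[1:-1], start=1):
        # Interior weights alternate: 4 at odd indices, 2 at even indices.
        total += (2 if i % 2 == 0 else 4) * y
    return total * (h / 3.0) + tail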

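h is the spacing (the gap along x) between consecutive y values, and y_vals is a sequence of y values. Simpson's rule assumes an even number of intervals, that is, an odd number of y values.

Example (in the same file as the function above):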
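For reference, this is the textbook form of the composite Simpson's rule for n + 1 equally spaced values y_0, ..., y_n with spacing h, where n is even. The symbols y_i and n are introduced here for the formula and do not appear in the code.

```latex
\int_{x_0}^{x_n} f(x)\,dx \;\approx\; \frac{h}{3}\left[\, y_0
  + 4\sum_{\substack{i=1 \\ i\ \mathrm{odd}}}^{n-1} y_i
  + 2\sum_{\substack{i=2 \\ i\ \mathrm{even}}}^{n-2} y_i
  + y_n \right]
```

The end points get weight 1, odd-indexed interior points get weight 4, and even-indexed interior points get weight 2.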

y_values = [13, 45.3, 12, 1, 476, 0]
interval = 1.2
area = integrate(y_values, interval)
print("The area is", area)
Will Richardson
  • I'm not sure. It could be really tricky to find the equation of a line, especially if you don't know what type of curve it is (exponential, parabola, etc.) – Will Richardson Nov 10 '12 at 08:16
  • Thank you, I really appreciate your help. Is y_vals an array, or can it be my Y data list (H[i])? Is it better to use arrays rather than a list? Do you recommend changing my list to an array? And about h, "h is the x-interval between y values": the wiki example says """f=function, a=initial value, b=end value, n=number of intervals of size h, n must be even""" h = float(b - a) / n. Is that the same h, so the distance between each step? – user1640255 Nov 10 '12 at 08:43
  • Yes, `h` is the interval between each step. `y_vals` can be anything that can be iterated in a `for` loop. I just always use arrays because they are easy to use. – Will Richardson Nov 10 '12 at 08:48
  • So y_vals can be a list or an array defined in an earlier part of the algorithm? In my case the list is defined as H. Do I have to add a for loop around the def integrate? – user1640255 Nov 10 '12 at 09:02
  • What if the data is not equally spaced out? – CMCDragonkai Oct 11 '15 at 15:17
  • What if it is a time series? I mean, the x-axis is time instead of int values – vicemagui Aug 02 '18 at 00:28
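For the time-series case raised in the last comment, one possible approach is to convert the timestamps to elapsed seconds and then apply the trapezoidal rule with the actual gaps between samples. The sketch below uses only the standard library; the function name integrate_time_series and the sample readings are invented for illustration.

```python
from datetime import datetime

def integrate_time_series(times, values):
    # Convert each timestamp to seconds elapsed since the first sample.
    seconds = [(t - times[0]).total_seconds() for t in times]
    area = 0.0
    # Trapezoidal rule with the real (possibly uneven) time gaps.
    for i in range(1, len(seconds)):
        area += 0.5 * (seconds[i] - seconds[i - 1]) * (values[i] + values[i - 1])
    return area

# Hypothetical power readings in watts, sorted by time, unevenly spaced.
times = [
    datetime(2018, 8, 2, 10, 0, 0),
    datetime(2018, 8, 2, 10, 0, 30),
    datetime(2018, 8, 2, 10, 2, 0),
]
power_watts = [100.0, 200.0, 200.0]

# The result has units of (value unit) x seconds, here watt-seconds (joules).
print(integrate_time_series(times, power_watts))  # 22500.0
```

The unit of the area depends on the time unit chosen for the conversion, so dividing the elapsed seconds by 3600 first would give watt-hours instead.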