Assume an n-dimensional array of observations that is reshaped into a 2-D array, with each row being one observation set. With this reshaped array, np.polyfit can compute 2nd-order fit coefficients for the entire ndarray in a single vectorized call:

fit = np.polynomial.polynomial.polyfit(X, Y, 2)

where Y has shape (304000, 21) and X is a vector. This results in a (304000, 3) array of coefficients, fit.

Using an iterator, it is possible to call np.polyval(fit, X) for each row, but this is inefficient when a vectorized approach may exist. Could the fit result be applied to the entire observation array without iterating? If so, how?
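
For concreteness, the per-row loop in question might look like the sketch below (array sizes are shrunk, and the layout is assumed: one row of quadratic coefficients per observation set, all evaluated at the same sample points):

import numpy as np

# Illustrative stand-ins for the shapes described above (assumed layout).
X = np.linspace(0.0, 1.0, 21)      # sample points shared by every observation set
fit = np.random.rand(1000, 3)      # one (c0, c1, c2) coefficient row per set

# The iterative evaluation this question is trying to avoid:
evaluated = np.empty((fit.shape[0], X.size))
for i, coeffs in enumerate(fit):
    evaluated[i] = np.polynomial.polynomial.polyval(X, coeffs)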

This is along the lines of this SO question.

Jzl5325
    FYI, do not call `np.polyval` on the result from `np.polynomial.polynomial.polyfit`. Use `np.polynomial.polynomial.polyval`. See http://stackoverflow.com/a/18767992/1730674 . – askewchan Nov 25 '13 at 21:14
  • a more elegant but still slow approach is to use the polyfit with `np.apply_along_axis`... [one example here](http://stackoverflow.com/a/16315330/832621) – Saullo G. P. Castro Nov 25 '13 at 21:15
  • @askewchan Absolutely! Laziness on my end with the full call path. @Saullo Castro - as you suggest `np.apply_along_axis` is no faster than an [i,j] iterator. I am wondering if a truly vectorized (at the C level) approach exists. – Jzl5325 Nov 25 '13 at 21:28
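
One way the np.apply_along_axis suggestion from the comments could be written for the evaluation step (a sketch with assumed shapes; as noted above, it is no faster than an explicit loop, since the Python-level function is still called once per column):

import numpy as np

X = np.linspace(0.0, 1.0, 21)     # assumed sample points
Y = np.random.rand(300, 21)       # assumed observation sets, one per row

coeffs = np.polynomial.polynomial.polyfit(X, Y.T, 2)     # shape (3, 300)

# Evaluate each column of coefficients at X, one call per column.
vals = np.apply_along_axis(
    lambda c: np.polynomial.polynomial.polyval(X, c), 0, coeffs)   # shape (21, 300)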

2 Answers

np.polynomial.polynomial.polyval takes multidimensional coefficient arrays:

>>> x = np.random.rand(100)
>>> y = np.random.rand(100, 25)
>>> fit = np.polynomial.polynomial.polyfit(x, y, 2)
>>> fit.shape # 25 columns of 3 polynomial coefficients
(3L, 25L)
>>> xx = np.random.rand(50)
>>> interpol = np.polynomial.polynomial.polyval(xx, fit)
>>> interpol.shape # 25 rows, each with 50 evaluations of the polynomial
(25L, 50L)

And of course:

>>> np.all([np.allclose(np.polynomial.polynomial.polyval(xx, fit[:, j]),
...                     interpol[j]) for j in range(25)])
True
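
Applied to the shapes described in the question (assuming the layout there: coefficients along the last axis of fit, and X the shared sample points), transposing puts the degree on the first axis so everything is evaluated in one call:

import numpy as np

X = np.random.rand(21)                      # stand-in for the question's X
fit_rows = np.random.rand(304000, 3)        # stand-in: one coefficient row per observation set

vals = np.polynomial.polynomial.polyval(X, fit_rows.T)   # shape (304000, 21)
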
Jaime

np.polynomial.polynomial.polyval is a perfectly fine (and convenient) approach to efficiently evaluating polynomial fits.

However, if raw speed is what you are after, constructing the powers of x yourself and using plain NumPy matrix multiplication is noticeably faster (roughly 4x in the timings below).

Setup

Using a setup similar to the one above, we'll create 100 different second-degree polynomial fits.

>>> num_samples = 100000
>>> num_lines = 100
>>> x = np.random.randint(0,100,num_samples)
>>> y = np.random.randint(0,100,(num_samples, num_lines))
>>> fit = np.polynomial.polynomial.polyfit(x, y, deg=2)  # ascending coefficient order, consistent with polyval below
>>> xx = np.random.randint(0,100,num_samples*10)

Numpy's polyval Function

res1 = np.polynomial.polynomial.polyval(xx, fit)

Basic Matrix Multiplication

# Build the matrix of powers [xx**0, xx**1, xx**2] and combine it with the coefficients.
inputs = np.array([np.power(xx, d) for d in range(len(fit))])
res2 = fit.T.dot(inputs)
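
As an aside not in the original answer, the same power matrix can be built with np.vander; increasing=True matches the ascending coefficient order of np.polynomial.polynomial.polyfit (this reuses xx and fit from the setup above):

# Equivalent construction of the powers of xx via a Vandermonde matrix.
inputs_vander = np.vander(xx, N=len(fit), increasing=True).T   # shape (3, len(xx))
res2_vander = fit.T.dot(inputs_vander)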

Timing the functions

Using the same parameters above...

%timeit _ = np.polynomial.polynomial.polyval(xx, fit)
1 loop, best of 3: 247 ms per loop

%timeit inputs = np.array([np.power(xx, d) for d in range(len(fit))]);_ = fit.T.dot(inputs)
10 loops, best of 3: 72.8 ms per loop
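
As a sanity check (not part of the original answer), the two paths evaluate the same polynomials and should agree numerically:

np.allclose(res1, res2)   # expected to be True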

To beat a dead horse...

[Figure: timing comparison of np.polynomial.polynomial.polyval versus the matrix-multiplication approach over repeated runs]

Mean speedup of roughly 3.61x; the fluctuations between runs are probably due to background processes on the machine.

Ryan M