5

I am trying to write a function that maps 2d-ndarray to 2d-ndarray. The rows of the input array can be processed independently and there shall be a 1-to-1 correspondence between rows of the input and rows of the output. For each row of the input, the polynomial expansion of a given order for the row shall be computed (see docstring for an example). The current implementation works; however it requires an explicit loop over the rows and duplication of rows in the "powerMatrix"). Is it possible to get the same result with a single call to numpy.power? Btw.: the order of the entries in the result's rows doesn't matter to me.

import numpy
def polynomialFeatures(x, order):
    """ Generate polynomial features of given order for data x.

    For each row of ndarray x, the polynomial expansions are computed, i.e
    for row [x1, x2] and order 2, the following row of the result matrix is
    computed: [1, x1, x1**2, x2, x1*x2, x1**2*x2, x2**2, x1*x2**2, x1**2*x2**2]

    Parameters
    ----------
    x : array-like
        2-D array; for each of its rows, the polynomial features are created

    order : int
        The order of the polynomial features

    Returns
    -------
    out : ndarray
        2-D array of shape (x.shape[0], (order+1)**x.shape[1]) containing the 
        polynomial features computed for the rows of the array x

    Examples
    --------
    >>> polynomialFeatures([[1, 2, 3], [-1, -2, -3]], 2)
    array([[  1   3   9   2   6  18   4  12  36   1   3   9   2   6  18   4  12  
             36   1   3   9   2   6  18   4  12  36]
           [  1  -3   9  -2   6 -18   4 -12  36  -1   3  -9   2  -6  18  -4  12 
            -36   1  -3   9  -2   6 -18   4 -12  36]])
    """
    x = numpy.asarray(x)
    # TODO: Avoid duplication of rows
    powerMatrix = numpy.array([range(order+1)] * x.shape[1]).T
    # TODO: Avoid explicit loop, and use numpy's broadcasting
    F = []
    for i in range(x.shape[0]):
        X = numpy.power(x[i], powerMatrix).T
        F.append(numpy.multiply.reduce(cartesian(X), axis=1))

    return numpy.array(F)

print numpy.all(polynomialFeatures([[1, 2, 3], [-1, -2, -3]], 2) ==
                numpy.array([[1,   3,   9,   2,   6,  18,   4,  12,  36,   1,
                              3,   9,   2,   6,  18,   4,  12,  36,   1,   3, 
                              9,   2,   6,  18,   4,  12,  36],
                             [1,  -3,   9,  -2,   6, -18,   4, -12,  36,  -1,
                              3,  -9,   2,  -6,  18,  -4,  12, -36,   1,  -3,
                              9,  -2,   6, -18,   4, -12,  36]]))

Thanks, Jan

EDIT: The missing function cartesian is defined here: Using numpy to build an array of all combinations of two arrays

Community
  • 1
  • 1
frisbee
  • 73
  • 1
  • 6

1 Answers1

3

The basic idea is to move the dimension (in your case, dimension 0, the number of rows) that's irrelevant to the calculation "out of the way" into a higher dimension and then automatically broadcast over it.

I'm not sure what your cartesian method is doing, but here's a solution that uses np.indices to generate indexing tuples over the X matrix:

import numpy as np
def polynomial_features(x, order):
    x = np.asarray(x).T[np.newaxis]
    n = x.shape[1]
    power_matrix = np.tile(np.arange(order + 1), (n, 1)).T[..., np.newaxis]
    X = np.power(x, power_matrix)
    I = np.indices((order + 1, ) * n).reshape((n, (order + 1) ** n)).T
    F = np.product(np.diagonal(X[I], 0, 1, 2), axis=2)
    return F.T
ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • +1. [Deleted a silly comment of mine -- I had passed `[1,2,3]` to your function instead of `[[1,2,3]]`, which of course gave equally silly results.] – DSM Jul 30 '12 at 20:02
  • This is a really elegant solution! I would like to use it in a little project I am working on if you don't mind: https://github.com/dreamwalkerrr/mledu. (will attribute this algorithm to you linking to this answer if that is okay?). Also, could you please let me know what would be the best way to exclude terms raised to the 0th power? – dreamwalker Nov 21 '13 at 15:40
  • guess I can just always delete the first column, this is quite a neat piece of code! thx a lot! :) – dreamwalker Nov 21 '13 at 16:16
  • @dreamwalker yeah, that'd probably be easiest. Feel free to use the code however you'd like and credit me if you feel like it :) – ecatmur Nov 21 '13 at 17:37