Cannot understand with sklearn's PolynomialFeatures

Question

I need help with sklearn's PolynomialFeatures. It works as I expect with one feature, but when I add multiple features, the output array contains extra values besides the inputs raised to the powers up to the degree. For example, for this array,

X=np.array([[230.1,37.8,69.2]])

when I try to

X_poly=poly.fit_transform(X)

It outputs

[[ 1.00000000e+00 2.30100000e+02 3.78000000e+01 6.92000000e+01
5.29460100e+04 8.69778000e+03 1.59229200e+04 1.42884000e+03
2.61576000e+03 4.78864000e+03]]

Here, what are 8.69778000e+03, 1.59229200e+04 and 2.61576000e+03?

dim · Accepted Answer · 2018-08-21T05:56:55.137

35

If you have features [a, b, c], the default polynomial features (the default degree in sklearn is 2) are 1, a, b, c, a^2, b^2, c^2, ab, bc, ca. sklearn returns them in the order [1, a, b, c, a^2, ab, ac, b^2, bc, c^2].

2.61576000e+03 is 37.8 x 69.2 = 2615.76 (2615.76 = 2.61576000 x 10^3), i.e. the bc term.
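As a rough illustration of where each output column comes from, the sketch below enumerates every monomial up to degree 2 using only the standard library. The helper name `poly_terms` and the `a*b` naming are made up for this example; it reproduces the column order seen in the question's output but is not scikit-learn's own code.

```python
from itertools import combinations_with_replacement
from math import prod

def poly_terms(names, values, degree=2):
    """Return (name, value) pairs for every monomial up to `degree`.

    Hypothetical helper for illustration; it is not part of scikit-learn.
    """
    terms = []
    for d in range(degree + 1):
        # Each combination picks d feature indices, repeats allowed,
        # e.g. (0, 0) -> a*a and (0, 1) -> a*b. The empty pick is the bias 1.
        for idx in combinations_with_replacement(range(len(values)), d):
            name = "*".join(names[i] for i in idx) or "1"
            terms.append((name, prod(values[i] for i in idx)))
    return terms

terms = poly_terms(["a", "b", "c"], [230.1, 37.8, 69.2])
for name, value in terms:
    print(f"{name:>4} = {value:.2f}")
```

The three values asked about in the question show up as the a*b, a*c and b*c entries.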

Put simply, PolynomialFeatures lets you create new features. There is a good reference here. Of course there are also disadvantages (such as overfitting) to using PolynomialFeatures (see here).

Edit:
We have to be careful when using polynomial features. The formula for the number of polynomial features is N(n,d)=C(n+d,d), where n is the number of features, d is the degree of the polynomial, and C is the binomial coefficient (combination). In our case the number is C(3+2,2)=5!/((5-2)! 2!)=10, but when the number of features or the degree is high, there are far too many polynomial features. For example:

N(100,2)=5151
N(100,5)=96560646

So in this case you may need to apply regularization to penalize some of the weights. It is quite possible that the algorithm will start to suffer from the curse of dimensionality (here is also a very nice discussion).
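The counts above can be checked with a few lines of standard-library Python. This is only a sketch of the formula N(n,d)=C(n+d,d); the function name `n_poly_features` is invented for the example.

```python
from math import comb

def n_poly_features(n_features, degree):
    """Number of polynomial terms (bias included) for the given input size."""
    return comb(n_features + degree, degree)

for n, d in [(3, 2), (100, 2), (100, 5)]:
    print(f"N({n},{d}) = {n_poly_features(n, d)}")
```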

edited Aug 21 '18 at 05:56

answered Aug 18 '18 at 07:38


2

Why does it give ab,bc,ca? – TechieBoy101 Aug 18 '18 at 07:46
@TechieBoy101: It's polynomial features, not monomial features. There's nothing restricting it to only one variable at a time. – user2357112 Aug 18 '18 at 07:49
1

@TechieBoy101, The default `PolynomialFeatures` in `sklearn` includes all polynomial combinations. You can add `interaction_only=True` to exclude the powers like `a^2, b^2, c^2`. Of course you can exclude the interaction if your model performs better - the `PolynomialFeatures` are a simple way to derive new features (in some artificial manner). – dim Aug 18 '18 at 08:04
3

The polynomial features formula is incorrect, although the location for `bc` is correct. See `poly.get_feature_names(['a','b','c'])`, which will give `['1', 'a', 'b', 'c', 'a^2', 'a b', 'a c', 'b^2', 'b c', 'c^2']`. – Niko Föhr Nov 13 '20 at 14:22
@dim when we add the additional features by raising data to a power, don't we introduce multicollinearity? – Medan Jul 28 '21 at 20:35
In PolynomialFeatures what does fit() and transform() do exactly? Because even though I read the documentation I don't understand it, I try to give you an analogy with what fit() and transform() do in StandardScaler, but it doesn't seem to make sense; since the fit() and transform() methods do different things in both cases. – JEAN LEONARDO Feb 08 '22 at 05:44
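To make the `interaction_only=True` option mentioned in the comments concrete, here is a minimal sketch using the array from the question. It assumes scikit-learn and NumPy are installed; the variable names are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[230.1, 37.8, 69.2]])

# interaction_only=True drops the pure powers (a^2, b^2, c^2) and keeps
# the bias, the original features, and the pairwise products.
inter = PolynomialFeatures(degree=2, interaction_only=True)
X_inter = inter.fit_transform(X)

print(X_inter.shape)  # one row, seven columns: 1, a, b, c, ab, ac, bc
print(X_inter)
```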

Prasad Ostwal · Answer 2 · 2019-02-04T14:12:48.920

PolynomialFeatures generates a new matrix with all polynomial combinations of features with given degree.

Like [a] will be converted into [1,a,a^2] for degree 2.

You can visualize input being transformed into matrix generated by PolynomialFeatures.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

a = np.array([1,2,3,4,5])
a = a[:,np.newaxis]  # reshape to a column: 5 samples, 1 feature
poly = PolynomialFeatures(degree=2)
a_poly = poly.fit_transform(a)
print(a_poly)

Output:

 [[ 1.  1.  1.]
 [ 1.  2.  4.]
 [ 1.  3.  9.]
 [ 1.  4. 16.]
 [ 1.  5. 25.]]

You can see that each row of the generated matrix has the form [1,a,a^2].

To observe polynomial features on a scatter plot, let's use the numbers 1-100.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures

# Making the numbers 1-100 (np.arange excludes the stop value, hence 101)
a = np.arange(1,101,1)
a = a[:,np.newaxis]

# Scaling data to 0 mean and 1 standard deviation, so it can be observed easily
scaler = StandardScaler()
a = scaler.fit_transform(a)

# Applying PolynomialFeatures
poly = PolynomialFeatures(degree=2)
a_poly = poly.fit_transform(a)

# Flattening the polynomial feature matrix (creating a 1D array), so it can be plotted
a_poly = a_poly.flatten()
# Creating an index array of the same size as a_poly (for plotting)
xarr = np.arange(1,a_poly.size+1,1)

# Plotting
plt.scatter(xarr,a_poly)
plt.title("Degree 2 Polynomial")
plt.show()

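Output:

[Figure: scatter plot of the flattened degree-2 features]

Changing to degree=3, we get:

[Figure: scatter plot of the flattened degree-3 features]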

Niko Föhr · Answer 3 · 2023-02-16T06:50:26.407

6

The general way to check the features is with poly.get_feature_names_out() (named get_feature_names() before scikit-learn 1.0; the old name was removed in 1.2). In this case, it would be

>>> poly.get_feature_names_out(['a','b','c'])
    array(['1', 'a', 'b', 'c', 'a^2', 'a b', 'a c', 'b^2', 'b c', 'c^2'], dtype=object)

and 8.69778000e+03, 1.59229200e+04 and 2.61576000e+03 would correspond to the a*b, a*c and b*c terms, respectively.

edited Feb 16 '23 at 06:50

answered Nov 13 '20 at 14:33


score 2 · Answer 4 · answered Aug 18 '18 at 07:33

You have 3-dimensional data and the following code generates all poly features of degree 2:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[230.1,37.8,69.2]])
poly = PolynomialFeatures()  # degree defaults to 2
X_poly = poly.fit_transform(X)
X_poly
#array([[  1.00000000e+00,   2.30100000e+02,   3.78000000e+01,
#      6.92000000e+01,   5.29460100e+04,   8.69778000e+03,
#      1.59229200e+04,   1.42884000e+03,   2.61576000e+03,
#      4.78864000e+03]])

This can also be generated with the following code:

a, b, c = 230.1, 37.8, 69.2 # 3-dimensional data
np.array([[1,a,b,c,a**2,a*b,c*a,b**2,b*c,c**2]]) # all possible degree-2 polynomial features
# array([[  1.00000000e+00,   2.30100000e+02,   3.78000000e+01,
#       6.92000000e+01,   5.29460100e+04,   8.69778000e+03,
#       1.59229200e+04,   1.42884000e+03,   2.61576000e+03,
#       4.78864000e+03]])

What about when we have an array of shape `(11, 1)` , how then would be all possible features? — Anoushiravan R, Sep 26 '22 at 19:07
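For the `(11, 1)` case raised in the comment above, a small sketch may help: each row is treated as one sample with a single feature, so degree 2 should give the three columns [1, x, x^2] per row. The input values here are arbitrary placeholders, and the snippet assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# 11 samples with one feature each (values chosen only for illustration)
x = np.arange(11, dtype=float).reshape(-1, 1)

x_poly = PolynomialFeatures(degree=2).fit_transform(x)

print(x_poly.shape)  # rows stay samples; columns become 1, x, x^2
print(x_poly[:3])
```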

score 1 · Answer 5 · answered Jun 24 '20 at 16:37

According to scikit-learn's 0.23 docs (and as far back as 0.15), PolynomialFeatures will

[generate] a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].
