After applying KernelPCA to my data and passing it to a classifier (SVC), I'm getting the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
and this warning while performing KernelPCA:
RuntimeWarning: invalid value encountered in sqrt
  X_transformed = self.alphas_ * np.sqrt(self.lambdas_)
Looking at the transformed data, I've found several nan values.
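The line quoted in the warning hints at one way such values could arise: if any entry of lambdas_ is negative, even by a tiny amount, np.sqrt returns nan for it and the whole corresponding column of the product becomes nan. The sketch below only illustrates that arithmetic with NumPy. The arrays alphas and lambdas are made-up stand-ins for self.alphas_ and self.lambdas_, not values taken from the actual data, and whether this is what happens here is an assumption.

```python
import warnings
import numpy as np

# Hypothetical values, chosen only to illustrate the arithmetic.
alphas = np.array([[0.5, 0.1, 0.2],
                   [0.3, 0.4, 0.6]])     # stand-in for self.alphas_
lambdas = np.array([2.0, 0.5, -1e-12])   # stand-in for self.lambdas_

with warnings.catch_warnings():
    # np.sqrt of a negative number emits the RuntimeWarning quoted above;
    # it is silenced here so the sketch runs quietly.
    warnings.simplefilter("ignore", RuntimeWarning)
    X_transformed = alphas * np.sqrt(lambdas)

# Flag every column that contains at least one nan.
nan_columns = np.isnan(X_transformed).any(axis=0)
print(nan_columns.tolist())
```

Only the column paired with the negative entry is affected; the other columns stay finite.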
It makes no difference which kernel I'm using. I tried cosine, rbf and linear.
But what's interesting:
- My original data only contains values between 0 and 1 (no inf or nan); it's scaled with MinMaxScaler.
- Applying standard PCA works, which I thought to be the same as KernelPCA with a linear kernel.
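A minimal sketch of the equivalence assumed in the second point, using only NumPy on a small synthetic matrix (a stand-in, not the real data): the eigenvalues of the Gram matrix of the centered data should match the squared singular values that standard PCA works with. It does not call scikit-learn's KernelPCA; it only mirrors the underlying linear algebra, and the shapes and sparsity level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 20 samples, 50 features,
# values in [0, 1], roughly 90% of the entries set to zero.
X = rng.random((20, 50)) * (rng.random((20, 50)) < 0.1)

# Center each feature, as PCA does.
X_centered = X - X.mean(axis=0)

# Standard PCA view: singular values of the centered data (descending).
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# Linear-kernel view: eigenvalues of the Gram matrix of the centered data.
gram = X_centered @ X_centered.T
eigenvalues = np.linalg.eigvalsh(gram)[::-1]  # reorder to descending

# The spectra agree: each eigenvalue equals a squared singular value.
spectra_match = np.allclose(eigenvalues, singular_values ** 2)
print(spectra_match)

# Centering removes one degree of freedom, so the smallest eigenvalue is
# zero in exact arithmetic; in floating point it is only close to zero.
smallest_is_tiny = abs(eigenvalues[-1]) < 1e-8 * eigenvalues[0]
print(smallest_is_tiny)
```

Both checks should print True. The second one matters because an eigenvalue that is zero in theory can come out as a tiny positive or tiny negative number after round-off, and the kernel formulation takes the square root of those eigenvalues.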
Some more facts:
- My data is high-dimensional (> 8000 features) and mostly sparse.
- I'm using the newest version of scikit-learn, 0.18.2.
Any idea what the reason could be and how to overcome this?