My question is closely related to "math domain error while using PCA".

I get the following error:

  File "$path$\Python\Python36\lib\site-packages\sklearn\decomposition\pca.py", line 88, in _assess_dimension_
    (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)
ValueError: math domain error

which refers to this line of code:

pa += log((spectrum[i] - spectrum[j]) * (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)

After looking closer, I found that the problem is caused by this part of the expression:

(spectrum[i] - spectrum[j])

which evaluates to 0 if the two values are equal. The whole product is then 0, so the code calls log(0), which raises this exception.
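To make the failure concrete, here is a minimal sketch that evaluates a single summand of the quoted line with plain `math.log`. The helper name `pair_term` and the eigenvalue lists are made up for illustration, and passing the same list as both `spectrum` and `spectrum_` is a simplifying assumption, not a copy of what scikit-learn does.

```python
import math


def pair_term(spectrum, spectrum_, i, j, n_samples):
    """One summand of the quoted line (hypothetical helper for illustration)."""
    return math.log((spectrum[i] - spectrum[j]) *
                    (1. / spectrum_[j] - 1. / spectrum_[i])) + math.log(n_samples)


# Made-up eigenvalues: the second list has a tie at indices 1 and 2.
distinct = [4.0, 2.0, 1.0]
tied = [4.0, 2.0, 2.0]

# Distinct values: the product inside log() is positive, so this works.
print(pair_term(distinct, distinct, 1, 2, n_samples=100))

# Equal values: the product is exactly 0.0, and math.log(0.0) raises ValueError.
try:
    pair_term(tied, tied, 1, 2, n_samples=100)
except ValueError as exc:
    print("ValueError:", exc)
```

The exact wording of the exception message depends on the Python version, but the exception type is `ValueError` in either case.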

Now my question: is the fact that this error can occur a sign that my data is bad, or should the implementation handle this case? If the implementation should handle it, how would you recommend handling it properly? The linked question already has an answer, but its author doesn't seem confident that it is correct, and it hasn't received any feedback.

I created an issue on the scikit-learn GitHub repo containing steps to reproduce the error.

Yannic Bürgmann
  • You should post this on the scikit-learn GitHub issues page. The PCA documentation mentions that when `n_components == 'mle'`, [Minka’s MLE](https://pdfs.semanticscholar.org/cbaa/eb023b8a07ee05a617791f7740a176a1de1b.pdf) is used to guess the dimension, so maybe that implementation has a problem. – Vivek Kumar Nov 29 '17 at 08:57
  • I know that Minka's MLE is used with this configuration. I want to find out whether there is an implementation problem with this algorithm or whether my data is just not suitable for it. I opened an issue on GitHub as well and referred to it. – Yannic Bürgmann Nov 29 '17 at 10:41

2 Answers

This is caused by an open issue in scikit-learn, as confirmed here.

Yannic Bürgmann

A fix for this issue was introduced in scikit-learn 0.23.0, so simply upgrade to that version or later.

Release Notes for scikit-learn 0.23

[MRG+1] Adress decomposition.PCA mle option problem #16224
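To check whether an installed scikit-learn is new enough for the fix mentioned above, a small sketch along these lines may help. The helper `has_mle_fix` is hypothetical, and it compares only the major and minor version numbers, so it treats pre-releases such as `0.23.dev0` as if they were 0.23.

```python
import re


def has_mle_fix(version: str) -> bool:
    """Return True if `version` is 0.23 or later (hypothetical helper).

    Only the leading "major.minor" part is compared; suffixes such as
    ".dev0" or "rc1" are ignored, which is an assumption of this sketch.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 23)


try:
    import sklearn
    print(sklearn.__version__, has_mle_fix(sklearn.__version__))
except ImportError:
    print("scikit-learn is not installed")
```

If the check returns `False`, upgrading with `pip install --upgrade scikit-learn` should bring in the fix.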

rosyaniv