
I am trying to implement a K-Nearest Neighbours classification model on a dataset of shape (60000, 32, 32) on my system (16 GB RAM, i5 8th-gen processor, 256 GB hard disk). Although I have normalized the data, predictions are still taking an enormous amount of time because of the size of the data. Is there any way to utilize multiple cores of my system, or to increase the RAM allocated to Jupyter Notebook, so as to save on computation time and speed up the calculations?

Sarvagya Dubey
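
A minimal sketch of the setup described in the question, assuming scikit-learn's `KNeighborsClassifier`; the random arrays and the names `X_train`/`y_train` are placeholders standing in for the real data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((60000, 32, 32), dtype=np.float32)  # stand-in for the image data
y_train = rng.integers(0, 10, size=60000)                # stand-in for the labels

# scikit-learn estimators expect 2-D input, so each 32x32 image is flattened
# into a 1024-dimensional vector; values here are already in [0, 1].
X_train = X_train.reshape(X_train.shape[0], -1)

knn = KNeighborsClassifier(n_neighbors=5)  # single-core, default settings
knn.fit(X_train, y_train)                  # fitting KNN is cheap: it mostly stores the data

X_test = rng.random((100, 32, 32), dtype=np.float32).reshape(100, -1)
predictions = knn.predict(X_test)          # the neighbour search at predict time is the slow part
```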
  • Not all parts of `scikit-learn` support parallel processing, but if you use something which has an `n_jobs` parameter, that is what you can try setting to 6. – tevemadar Jan 13 '20 at 18:06
  • There is a way of increasing the Jupyter Notebook memory limit; check out this question: [Jupyter notebook memory limit](https://stackoverflow.com/questions/57948003/jupyter-notebook-memory-limit) – Manuel Jan 13 '20 at 18:12
  • To speed up computation you could also set the `algorithm` parameter of KNN to `'ball_tree'` or `'kd_tree'`, so you get faster nearest-neighbour queries. You could also try some feature selection or feature engineering, as 60000 1024-dimensional samples is pretty big and the data will probably have a lower-dimensional representation of quite good quality. Are the samples 32x32 images? What is the size of the raw data? – dhasson Jan 14 '20 at 13:43 (a combined sketch of these suggestions follows the comments)
  • I did set the algorithm to `kd_tree` owing to the huge dimensionality, but it still got stuck. Yes, indeed it's the MNIST 32x32 dataset. – Sarvagya Dubey Jan 14 '20 at 15:19
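
A hedged sketch combining the suggestions from the comments: a tree-based `algorithm`, parallel neighbour search via `n_jobs`, and PCA to shrink the 1024 features before the search. The 50-component PCA and `n_jobs=-1` are illustrative choices rather than values from the thread, and random arrays again stand in for the real data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train = rng.random((60000, 1024), dtype=np.float32)  # flattened 32x32 images (placeholder)
y_train = rng.integers(0, 10, size=60000)              # placeholder labels

model = make_pipeline(
    PCA(n_components=50),              # illustrative: reduce 1024 features to 50
    KNeighborsClassifier(
        n_neighbors=5,
        algorithm="kd_tree",           # or "ball_tree", as suggested above
        n_jobs=-1,                     # use all available cores (or e.g. 6)
    ),
)
model.fit(X_train, y_train)

X_test = rng.random((100, 1024), dtype=np.float32)
predictions = model.predict(X_test)
```

Tree-based searches tend to pay off only after the dimensionality has been reduced, so the PCA step and the `algorithm` choice work together here; raising the Jupyter memory limit (the linked question) is a separate, configuration-level change.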

0 Answers