
I am trying to train a MultinomialNB classifier on a huge data set (features as well as targets, about 75k x 130k). I am aware that this classifier fits a distinct set of parameters for each target, so memory usage is expected to explode.
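
A minimal sketch of the training step, assuming a sparse CSR feature matrix and one integer label per sample; the number of distinct targets and the random data are placeholders for illustration, not the real set:

    import numpy as np
    import scipy.sparse as sp
    from sklearn.naive_bayes import MultinomialNB

    n_samples, n_features = 75000, 130000  # rough sizes of the data
    n_classes = 5000                        # placeholder, just for illustration

    # sparse count features, one integer class label per row
    X = sp.random(n_samples, n_features, density=1e-4, format='csr')
    y = np.random.randint(0, n_classes, size=n_samples)

    mb_classifier = MultinomialNB()
    # partial_fit needs the full class list up front; internally it binarizes y
    # into a dense (n_samples, n_classes) array (the np.zeros call in the
    # traceback below), which is where the big allocation happens
    mb_classifier.partial_fit(X, y, list(set(y)))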

However, the process won't allocate more than about 20GB of RAM even though the machine has about 640GB.

I have tried to set the memory lock limit and tried to run as root (which I have to do to adjust these limits), but it won't work. The process dies with the following traceback:

    Traceback (most recent call last):
      File "test_classifiers.py", line 202, in <module>
        train_mb()
      File "test_classifiers.py", line 168, in train_mb
        mb_classifier.partial_fit(X, y, list(set(y)))
      File "/usr/local/lib/python3.5/dist-packages/sklearn/naive_bayes.py", line 539, in partial_fit
        Y = label_binarize(y, classes=self.classes_)
      File "/usr/local/lib/python3.5/dist-packages/sklearn/preprocessing/label.py", line 657, in label_binarize
        Y = Y.toarray()
      File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py", line 1024, in toarray
        out = self._process_toarray_args(order, out)
      File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/base.py", line 1186, in _process_toarray_args
        return np.zeros(self.shape, dtype=self.dtype, order=order)
    MemoryError

    resource.setrlimit(resource.RLIMIT_MEMLOCK, (-1, -1))

and

    resource.setrlimit(resource.RLIMIT_MEMLOCK, (resource.RLIM_INFINITY, resource.RLIM_INFINITY))

Both have been tried, but neither changes anything. Any ideas? Could it be related to the fact that only one CPU can be used with this classifier?
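
For reference, a self-contained version of the two calls above, with the import and a read-back of the limits actually in effect (RLIMIT_AS, which caps ordinary allocations such as the np.zeros in the traceback, is printed as well):

    import resource

    # both forms tried; -1 and RLIM_INFINITY mean the same thing here
    resource.setrlimit(resource.RLIMIT_MEMLOCK, (-1, -1))
    resource.setrlimit(resource.RLIMIT_MEMLOCK,
                       (resource.RLIM_INFINITY, resource.RLIM_INFINITY))

    # read the limits back to confirm what the process actually got
    # (raising the hard limits is why this has to run as root)
    print("RLIMIT_MEMLOCK:", resource.getrlimit(resource.RLIMIT_MEMLOCK))
    print("RLIMIT_AS:", resource.getrlimit(resource.RLIMIT_AS))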

