
I am using Windows 10 on a machine with 256 GB of RAM and a 2 TB hard drive. I write my code in PyCharm Community Edition, and I am using the Python version shown below:

Python 3.9.7 [MSC v.1929 64 bit (AMD64)] on win32

My data has 1.2 million rows and roughly 7,000 columns. When I try to fit the processed data to a scikit-learn Random Forest model, I get the following error:

joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Program Files\Python39\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Program Files\Python39\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\joblib\parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\joblib\parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\utils\fixes.py", line 209, in __call__
    return self.function(*args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _fit_and_score
    X_test, y_test = _safe_split(estimator, X, y, test, train)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\utils\metaestimators.py", line 286, in _safe_split
    X_subset = _safe_indexing(X, indices)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\utils\__init__.py", line 377, in _safe_indexing
    return _array_indexing(X, indices, indices_dtype, axis=axis)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\utils\__init__.py", line 201, in _array_indexing
    return array[key] if axis == 0 else array[:, key]
  File "C:\Program Files\Python39\lib\site-packages\numpy\core\memmap.py", line 331, in __getitem__
    res = super(memmap, self).__getitem__(index)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 721. MiB for an array with shape (120000, 7000) and data type uint8
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\...\main.py", line 45, in <module>
    model.fit(X, y)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\model_selection\_search.py", line 891, in fit
    self._run_search(evaluate_candidates)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\model_selection\_search.py", line 1766, in _run_search
    evaluate_candidates(
  File "C:\Program Files\Python39\lib\site-packages\sklearn\model_selection\_search.py", line 838, in evaluate_candidates
    out = parallel(
  File "C:\Program Files\Python39\lib\site-packages\joblib\parallel.py", line 1054, in __call__
    self.retrieve()
  File "C:\Program Files\Python39\lib\site-packages\joblib\parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Program Files\Python39\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 445, in result
    return self.__get_result()
  File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 390, in __get_result
    raise self._exception
numpy.core._exceptions.MemoryError: Unable to allocate 721. MiB for an array with shape (120000, 7000) and data type uint8
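
For context, the traceback goes through `sklearn.model_selection._search`, because `model` is a hyper-parameter search wrapper around the forest. My training code looks roughly like the sketch below (the parameter grid and most settings are simplified placeholders; the data shape, the `uint8` dtype, and `n_jobs=-1` match my actual run, and `cv=10` is consistent with the 120000-row fold in the error):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data with the same shape and dtype as my real dataset.
X = np.random.randint(0, 2, size=(1_200_000, 7_000), dtype=np.uint8)
y = np.random.randint(0, 2, size=1_200_000)

# The grid values are stand-ins; the point is that the search fans
# out over all 64 cores via n_jobs=-1.
model = GridSearchCV(
    RandomForestClassifier(),
    param_grid={"n_estimators": [100, 200], "max_depth": [10, None]},
    cv=10,   # a 10-fold split of 1.2M rows gives 120000-row test folds
    n_jobs=-1,
)
model.fit(X, y)  # raises the MemoryError shown above
```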

I tried some of the solutions suggested here:

  1. Increasing PyCharm's console memory, as described in "Increase output buffer when running or debugging in PyCharm".
  2. Increasing the paging file size, as described in "Unable to allocate array with shape and data type".

But these did not work. How can I solve this issue when I have plenty of RAM and hard drive space? I tried running the code in PyCharm, Jupyter, and Google Colab, and also ran the script directly from the command line, but it produces the same error every time.
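
For what it's worth, the failing allocation is tiny compared to 256 GB of RAM. A standalone sanity check like the sketch below (shape and dtype taken straight from the error message) shows how little memory the single array asks for; if this succeeds in a fresh interpreter, the limit being hit during training is presumably the combined footprint of all the parallel workers rather than this one array:

```python
import numpy as np

# Shape and dtype copied from the error message.
shape, dtype = (120_000, 7_000), np.uint8

# How much memory the failing allocation requests.
nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
print(f"Requested: {nbytes / 1024**2:.0f} MiB")

# Try the same allocation directly, outside of joblib/loky.
arr = np.zeros(shape, dtype=dtype)
print("Standalone allocation succeeded")
```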

Vance Pyton
  • Maybe https://stackoverflow.com/a/20952691/16744221 could help you. I haven't tested it, but it seems like you will not be able to use a generator, since a random forest has to be trained on all the data at once. – Elger Jun 09 '22 at 23:12
  • That array doesn't look that big. I suspect there are many more arrays that chew up memory. – hpaulj Jun 10 '22 at 00:57
  • @hpaulj I am using all 64 cores via the `n_jobs=-1` parameter. Maybe that is why? (A sketch of capping `n_jobs` follows these comments.) – Vance Pyton Jun 10 '22 at 01:35
  • I haven't worked with large multicore computers, so can't help. – hpaulj Jun 10 '22 at 01:38
  • Maybe upgrading to Linux is an option? Or running outside an IDE, directly with Python. – Mark Setchell Jun 14 '22 at 05:42
  • @MarkSetchell unfortunately upgrading to Linux is not an option; I have to work on Windows. I tried running it from the `python` command line, but it resulted in the same error. – Vance Pyton Jun 14 '22 at 16:09
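
Following up on the `n_jobs` comment above: one change worth trying is capping the worker pool instead of using all 64 cores, since each loky worker materializes its own in-memory copy of the fold it is given, so the total footprint grows with `n_jobs`. A minimal sketch of that change (the value 8 is arbitrary, not a recommendation):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Cap the number of search workers; each loky worker copies the
# train/test fold it receives, so memory use scales with n_jobs.
model = GridSearchCV(
    RandomForestClassifier(n_jobs=2),  # inner tree-building parallelism, also capped
    param_grid={"n_estimators": [100, 200]},
    cv=10,
    n_jobs=8,  # arbitrary cap instead of -1 (all 64 cores)
)
```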

0 Answers