I am using Windows 10. My system has 256 GB of RAM and a 2 TB hard drive. I am coding in PyCharm Community Edition, with the Python version shown below:
Python 3.9.7 [MSC v.1929 64 bit (AMD64)] on win32
My data has about 1.2 million rows and roughly 7,000 columns. When I try to fit the processed data with a scikit-learn Random Forest model, I get the following error:
```
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
    r = call_item()
  File "C:\Program Files\Python39\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Program Files\Python39\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\joblib\parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\joblib\parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\utils\fixes.py", line 209, in __call__
    return self.function(*args, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _fit_and_score
    X_test, y_test = _safe_split(estimator, X, y, test, train)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\utils\metaestimators.py", line 286, in _safe_split
    X_subset = _safe_indexing(X, indices)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\utils\__init__.py", line 377, in _safe_indexing
    return _array_indexing(X, indices, indices_dtype, axis=axis)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\utils\__init__.py", line 201, in _array_indexing
    return array[key] if axis == 0 else array[:, key]
  File "C:\Program Files\Python39\lib\site-packages\numpy\core\memmap.py", line 331, in __getitem__
    res = super(memmap, self).__getitem__(index)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 721. MiB for an array with shape (120000, 7000) and data type uint8
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\...\main.py", line 45, in <module>
    model.fit(X, y)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\model_selection\_search.py", line 891, in fit
    self._run_search(evaluate_candidates)
  File "C:\Program Files\Python39\lib\site-packages\sklearn\model_selection\_search.py", line 1766, in _run_search
    evaluate_candidates(
  File "C:\Program Files\Python39\lib\site-packages\sklearn\model_selection\_search.py", line 838, in evaluate_candidates
    out = parallel(
  File "C:\Program Files\Python39\lib\site-packages\joblib\parallel.py", line 1054, in __call__
    self.retrieve()
  File "C:\Program Files\Python39\lib\site-packages\joblib\parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Program Files\Python39\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 445, in result
    return self.__get_result()
  File "C:\Program Files\Python39\lib\concurrent\futures\_base.py", line 390, in __get_result
    raise self._exception
numpy.core._exceptions.MemoryError: Unable to allocate 721. MiB for an array with shape (120000, 7000) and data type uint8
```
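For context, the traceback shows the error comes from inside a cross-validated hyperparameter search run through joblib's loky workers. My script boils down to something like the sketch below. This is illustrative, not my exact code: the file names and the parameter grid are placeholders, and cv=10 is only inferred from the traceback (a 120,000-row test fold is one tenth of 1.2 million rows):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative data loading; the real features are uint8, ~1.2M x ~7000.
X = np.load("features.npy")
y = np.load("labels.npy")

# Placeholder grid; my real one is different.
param_grid = {"n_estimators": [100, 200], "max_depth": [None, 20]}

model = GridSearchCV(
    RandomForestClassifier(),
    param_grid,
    cv=10,       # guess: a 120,000-row test fold is 1/10 of 1.2M rows
    n_jobs=-1,   # joblib spawns the loky worker processes in the traceback
)
model.fit(X, y)  # this is the call that dies with the MemoryError
```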
I have already tried some of the solutions suggested in related questions:
- Increasing PyCharm's console memory (Increase output buffer when running or debugging in PyCharm)
- Increasing the paging file size (Unable to allocate array with shape and data type); a snippet for inspecting the current limits follows this list
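To verify whether the paging-file change actually took effect, the current limits can be inspected from Python with psutil (a third-party package, installable with pip install psutil):

```python
import psutil

vm = psutil.virtual_memory()
sm = psutil.swap_memory()  # on Windows this roughly tracks the page file

print(f"RAM total:      {vm.total / 2**30:.1f} GiB")
print(f"RAM available:  {vm.available / 2**30:.1f} GiB")
print(f"Pagefile total: {sm.total / 2**30:.1f} GiB ({sm.percent}% used)")
```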
Neither change helped: the script fails with the same error whether I run it from PyCharm, Jupyter, Google Colab, or directly from the command line. How can I solve this when the machine has plenty of RAM and plenty of disk space?
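Finally, a direct allocation of the same shape is a quick way to test whether the limit is total memory or something specific to the joblib worker processes. If this succeeds in a fresh interpreter, the ~800 MiB array itself clearly fits in RAM:

```python
import numpy as np

# Same shape and dtype as the failing allocation (~800 MiB of uint8).
a = np.zeros((120000, 7000), dtype=np.uint8)
a[:] = 1  # touch the pages so the memory is actually committed
print(f"{a.nbytes / 2**20:.0f} MiB allocated")
```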