
I know MemoryError-related questions have been asked before, for example here, here, here, here, or here. The suggested solutions are always to switch to Python 3 and/or 64-bit Windows, or, in the case of faulty code, to fix the code. However, I am already on Python 3 and 64-bit Windows. I can also see in the Windows Task Manager that several GB of my 64 GB of RAM are still available when Python throws the MemoryError.

I have about 15 date-indexed pandas DataFrames, each with 14,000 rows, on average 5,000 columns of float data, and about 40-50% NaN values, that I read in from the hard drive. I cannot simply drop the NaNs because different columns have NaNs at different dates. The MemoryError happens when I try to concatenate the frames with pd.concat(), so it is not a matter of faulty code or a runaway loop. If I leave some of the DataFrames out of the concatenation, the MemoryError does not happen during the concatenation, but it then happens when I try to run a scikit-learn decision tree analysis on the concatenated data.
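
For context, the pattern looks roughly like this (the file names and the read call are placeholders, not my actual code):

    import pandas as pd

    # Placeholder file names -- the real frames are read from the hard drive.
    paths = ["frame_{}.csv".format(i) for i in range(15)]

    # Each frame: ~14,000 date-indexed rows, ~5,000 float columns, 40-50% NaN.
    frames = [pd.read_csv(p, index_col=0, parse_dates=True) for p in paths]

    combined = pd.concat(frames)  # <- MemoryError is raised here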

My question is: how can I get Python to use all of the available memory and not throw a MemoryError?

Edit: screenshots added. IPython interpreter screenshot (I don't even have Python 2 installed): [screenshot]

System information screenshot: [screenshot]

Saeed
  • When you're talking about RAM in your PC, that's physical memory. But Python runs out of virtual memory. – Thomas Weller Mar 07 '18 at 22:11
  • @ThomasWeller, I added the screenshots. Frankly, I don't care what type of memory it runs out of. I just need to prevent it :-) – Saeed Mar 08 '18 at 00:00
  • 4
    If you don't care, you'll measure the wrong values. In the screenshot it says: available virtual memory 25 GB. If you load 15 tables*14000 row*5000 columns of 4 byte floats into memory, that's ~4.2 GB at least, not considering any overhead. Now, if you combine the first tabel with the second, it will need 2*280=560 MB. Then combine it with the next table: 840 MB, then 1.1 Gb etc ... until the last table is 4.2 GB. Sum that up and it will be ~8 GB plus the original 4.2 GB, so it's 12 GB. Panda can easily have a 100% overhead, so that's 24 GB (I know other libraries that have 700% overhead). – Thomas Weller Mar 08 '18 at 07:30
  • @ThomasWeller, thanks. That was enlightening. Will it fix the issue if I increase the virtual memory size from Windows performance management? – Saeed Mar 08 '18 at 17:16
  • Just suggesting some things you can try. One thing that can help with reading the data is to preallocate your pandas DataFrame with all of the needed memory and then insert the data at the correct positions instead of using concat; is it possible that pd.concat is creating/duplicating/copying memory temporarily? You could also call garbage collection before your scikit-learn decision tree functions. (A rough sketch of both ideas follows these comments.) – justengel Aug 01 '18 at 14:47
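
A rough sketch of the preallocation and garbage-collection ideas from the last comment, assuming the 15 frames are already in a list named frames (the float32 dtype and the fill logic are illustrative assumptions, not tested code):

    import gc

    import numpy as np
    import pandas as pd

    # Assumed: 'frames' holds the ~15 DataFrames read from disk.
    # Build the union of all row and column labels up front.
    full_index = frames[0].index
    full_columns = frames[0].columns
    for f in frames[1:]:
        full_index = full_index.union(f.index)
        full_columns = full_columns.union(f.columns)

    # Preallocate a single NaN-filled frame; float32 halves the footprint
    # of the default float64.
    combined = pd.DataFrame(np.nan, index=full_index,
                            columns=full_columns, dtype="float32")

    # Copy each frame into its slot instead of calling pd.concat().
    # Casting to float32 first keeps pandas from upcasting the target.
    for f in frames:
        combined.loc[f.index, f.columns] = f.astype("float32")

    # Release the originals and force a collection before the
    # scikit-learn step.
    del frames
    gc.collect()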

3 Answers


Here are some links, with snippets of information from each; I hope they help.

How to give programs more RAM

Increase your pagefile. The pagefile is a section of the hard drive that is used as an overflow area for RAM and is also called virtual memory. Although it is not as fast as RAM, because it is physically located on a hard drive, increasing it can sometimes improve program performance. To access it, open your Control Panel, click "System," then "Advanced system settings," and then "Settings" in the "Performance" tab.

How do you set the memory usage for python programs

If you want to limit the Python VM's memory usage, you can try this:

1. On Linux, use the ulimit command to limit the memory usage of Python.
2. Use the resource module to limit the program's memory usage from within Python.

If you want to speed up your program by giving your application more memory, you could try this:

1. threading, multiprocessing
2. PyPy
3. Psyco (Python 2.5 only)
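
For the resource-module route, a minimal sketch (the 4 GB cap is an arbitrary example, not a recommended value):

    import resource  # Unix/Linux only; this module is not available on Windows

    # Read the current address-space limits, then cap the soft limit at 4 GB.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (4 * 1024 ** 3, hard))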

Assign memory application

1. Right-click My Computer.
2. Go to Properties > Advanced > Performance > Settings > Advanced.
3. Under Virtual Memory, check the paging file size. (If memory serves, the rule of thumb is that up to 1/2 of total virtual memory may be used; if you use more, the paging file fights between the application and the OS.)

Hope this helps.

Allocate memory process server

Because some programs need more memory than others to work, you can use the Task Manager to allocate extra memory to specific processes in order to increase performance.

Others

http://stackoverflow.com/questions/1760025/limit-python-vm-memory

http://stackoverflow.com/questions/2308091/how-to-limit-python-heap-size

xavigisbeg
Elodin

My PC has 8 GB of memory and runs Windows 10 x64, with Python 3 installed. I was getting this kind of exception while my Python script was reading CSV files. Fortunately, once the pagefile's initial and maximum size were increased, the issue was resolved.

For instructions on how to increase the pagefile values, you can take a look at this SO answer: here


What I would suggest is to increase your pagefile. I had the same problem and increasing my pagefile worked.

You can do that if you open your Control Panel. Click "System," then "Advanced system settings," and then "Settings" in the "Performance" tab. By default, Windows gives only 0.5 GB for the pagefile. I increased my pagefile to 16 GB and my code worked like a charm.

Hope this helps.

circuito