
I need to produce very large matrices (Markov chains) for scientific purposes. I perform calculations whose results I put in a list of 20301 elements (= one row of my matrix). I need all of that data in memory to proceed to the next Markov step, but I can store it elsewhere (e.g. a file) if needed, even if that slows down my Markov chain walk-through. My computer (scientific lab): dual Xeon, 6 cores/12 threads each, 12 GB memory, OS: Win64.

Traceback (most recent call last):
  File "my_file.py", line 247, in <module>
    ListTemp.append(calculus)
MemoryError

Example of a calculation result: 9.233747520008198e-102 (yes, it's over 1/9000)

The error is raised while appending to the list; the last element successfully stored is at index 19766:

ListTemp[19766]
1.4509421012263216e-103

If I go one index further:

Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    ListTemp[19767]
IndexError: list index out of range

So the list hit the MemoryError on loop iteration 19767.

Questions:

  1. Is there a memory limit on a list? Is it a per-list limit or a global per-script limit?

  2. How can I bypass those limits? Any possibilities in mind?

  3. Will it help to use NumPy or 64-bit Python? What are the memory limits with them? What about other languages?


4 Answers


First off, see How Big can a Python Array Get? and Numpy, problem with long arrays

Second, the only real limit comes from the amount of memory you have and how your system stores memory references. There is no per-list limit, so Python will go until it runs out of memory. Two possibilities:

  1. If you are running on an older OS or one that forces processes to use a limited amount of memory, you may need to increase the amount of memory the Python process has access to.
  2. Break the list apart using chunking. For example, do the first 1000 elements of the list, pickle and save them to disk, and then do the next 1000. To work with them, unpickle one chunk at a time so that you don't run out of memory (see the sketch below). This is essentially the same technique that databases use to work with more data than will fit in RAM.
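
As an illustration of that chunking idea, here is a minimal sketch under some assumptions: the values are plain floats, `compute_value` is a hypothetical stand-in for the real per-element calculation, and the `chunk_N.pkl` filenames are arbitrary.

    import pickle

    CHUNK_SIZE = 1000          # elements kept in memory at once

    def compute_value(i):
        # stand-in for the real calculation producing one element
        return 1.0 / (i + 1)

    # Phase 1: compute, writing each full chunk to disk instead of
    # keeping the whole 20301-element list in memory.
    chunk, n_chunks = [], 0
    for i in range(20301):
        chunk.append(compute_value(i))
        if len(chunk) == CHUNK_SIZE:
            with open('chunk_%d.pkl' % n_chunks, 'wb') as f:
                pickle.dump(chunk, f)
            chunk, n_chunks = [], n_chunks + 1
    if chunk:                  # flush the last, partial chunk
        with open('chunk_%d.pkl' % n_chunks, 'wb') as f:
            pickle.dump(chunk, f)
        n_chunks += 1

    # Phase 2: process the results one chunk at a time, so only
    # CHUNK_SIZE values are ever held in RAM.
    for k in range(n_chunks):
        with open('chunk_%d.pkl' % k, 'rb') as f:
            for value in pickle.load(f):
                pass           # use the value here
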
Gordon Seidoh Worley
  • [source](http://stackoverflow.com/questions/855191/how-big-can-a-python-array-get/855455#855455) _Therefore the maximum size of a python list on a 32 bit system is 536,870,912 elements._ Okay, but in my case I'm far below this value; my memory is only ~1.3 GB for this process (near the limit of a 32-bit Python process?). What will the limit be in 64-bit Python? How much will chunking slow down my Markov chain walk-through? Thanks to all for your answers and advice. Nice place to learn! – Taupi Apr 04 '11 at 11:35
  • A 32-bit process has a theoretical limit of 4 GB of memory, though if your OS is also 32-bit it will obviously be less, since the OS will take up some of that memory. Chunking will slow you down, but in some cases you have to accept a slow-down just to finish processing. What are you storing in that list? Maybe that would help explain what's happening. – Gordon Seidoh Worley Apr 04 '11 at 19:58

The MemoryError exception that you are seeing is the direct result of running out of available RAM. It could be caused either by the 2 GB per-process limit that Windows imposes on 32-bit programs, or by a lack of available RAM on your computer.

You should be able to go beyond the 2 GB limit by using a 64-bit copy of Python, provided you are using a 64-bit copy of Windows.

The IndexError is raised because Python hit the MemoryError before the entire list was built. Again, this is a memory issue.

To get around this problem you could try to use a 64-bit copy of Python or, better still, find a way to write your results to a file. To this end, look at NumPy's memory-mapped arrays.

You should be able to run your entire set of calculations into one of these arrays, as the actual data will be written to disk and only a small portion of it held in memory.
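
As a rough sketch of how that could look with numpy.memmap, assuming a square 20301 x 20301 matrix of float64 values; `compute_row` and the filename are hypothetical placeholders for the real calculation:

    import numpy as np

    N = 20301                  # row length taken from the question

    def compute_row(i):
        # stand-in for the real calculation producing one matrix row
        return np.full(N, 1.0 / N)

    # Disk-backed array: the data lives in 'markov_matrix.dat' and only
    # the pages currently being accessed are held in RAM.
    matrix = np.memmap('markov_matrix.dat', dtype='float64',
                       mode='w+', shape=(N, N))

    for i in range(N):
        matrix[i, :] = compute_row(i)   # written through to the file
    matrix.flush()                      # make sure everything is on disk

The backing file here would be roughly 3.3 GB, but at any moment only the pages currently being written are resident in memory.
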

thomas
  • Hello Thomas, I think you are right; I must be exceeding the 2 GB per-program limit imposed by my 32-bit Python on Windows 7 64-bit. What will the limit be for a 64-bit Python program? Thanks – Taupi Apr 04 '11 at 11:46
  • Significantly more than the physical memory in your machine, since you're squaring the size of the address space by moving to 64 bits :) – ncoghlan Apr 04 '11 at 14:12
  • @ncoghlan is right. If you follow the link in the second row, the limit is mentioned: a whopping 8 TB. If you hit that limit then you're definitely doing something wrong :) – thomas Apr 04 '11 at 14:42

There is no memory limit imposed by Python. However, you will get a MemoryError if you run out of RAM. You say you have 20301 elements in the list. This seems too small to cause a memory error for simple data types (e.g. int), but if each element itself is an object that takes up a lot of memory, you may well be running out of memory.

The IndexError, however, is probably raised because your ListTemp has only 19767 elements (indexed 0 to 19766) and you are trying to access past the last element.

It is hard to say what you can do to avoid hitting the limit without knowing exactly what you are trying to do. Using numpy might help. It looks like you are storing a huge amount of data; it may be that you don't need to store all of it at every stage, but it is impossible to say without more detail.
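
For a rough sense of the footprint involved, the sketch below compares a plain Python list of floats with the same data stored as a NumPy array; the sizes are approximate and specific to 64-bit CPython:

    import sys
    import numpy as np

    values = [9.233747520008198e-102] * 20301    # plain Python floats

    # Upper-bound estimate: the list's pointer array plus what 20301
    # distinct float objects would occupy (each is ~24 bytes in CPython).
    list_bytes = sys.getsizeof(values) + 20301 * sys.getsizeof(values[0])

    # The same data as a NumPy float64 array is just 8 bytes per element.
    array_bytes = np.asarray(values, dtype='float64').nbytes

    print("list: %d bytes, numpy array: %d bytes" % (list_bytes, array_bytes))

A list of 20301 floats costs roughly 32 bytes per element (an 8-byte pointer plus a 24-byte float object) versus 8 bytes per element in a float64 array, so switching to numpy alone cuts the footprint by about a factor of four.
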

MAK
  • Python, like any other program, uses the entire virtual memory, not just the physical RAM. The poster can increase the swap memory available (which is a file in Windows and a partition in Linux). – Ricardo Magalhães Cruz Jan 05 '16 at 16:00
  • You can also use a swap file on Linux. Taken from the Arch Wiki: `# touch /swapfile` `# fallocate -l 512M /swapfile` `# dd if=/dev/zero of=/swapfile bs=1M count=512` `# chmod 600 /swapfile` `# mkswap /swapfile` `# swapon /swapfile` `# echo "/swapfile none swap defaults 0 0" >> /etc/fstab` – copeland3300 Sep 14 '16 at 19:04

If you want to circumvent this problem you could also use shelve. You would create files sized to what your machine can handle, and only load them into RAM when necessary, basically writing to the hard disk and pulling the information back in pieces so you can process it.

Create a binary file and check whether the information is already in it; if it is, load it into a local variable, otherwise write the data you need.

    import shelve

    Data = shelve.open('File01')
    for i in range(100):
        Matrix_Shelve = 'Matrix' + str(i)              # key for this chunk
        if Matrix_Shelve in Data:                      # already on disk?
            Matrix_local = Data[Matrix_Shelve]         # load it into RAM
        else:
            Data[Matrix_Shelve] = 'something for later'
    Data.close()

Hope it doesn't sound too archaic.