
I'm doing a spelling bee program in Python using pygame, and it works fine, but I have only been testing it with 7 words.
I'm worried that, if used with 300 words, it might fill up the memory. Note that there are 2 arrays: one holds the default list of words, and the other holds the randomized words.

2 Answers


You really do not need to worry. Python is not such a memory hog as to cause issues with a mere 600 words.

With a bit of care, you can measure memory requirements directly. The sys.getsizeof() function lets you measure the direct memory requirements of a given Python object (only direct memory, not anything that it references!). You could use this to measure individual strings:

>>> import sys
>>> sys.getsizeof("Hello!")
55
>>> sys.getsizeof("memoryfootprint")
64

Exact sizes depend on the Python version and your OS. A Python string object needs a base amount of memory for book-keeping information, and then 1, 2 or 4 bytes per character, depending on the highest Unicode code point in the string. For ASCII, that's just one byte per letter. Python 3.7 on my Mac OS X system uses 49 bytes for the bookkeeping portion.
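You can see those per-character widths in action by comparing strings built from code points of different sizes; the exact bookkeeping overhead varies by Python version, so the sketch below compares a 1-character string with a 101-character string so the fixed overhead cancels out:

```python
import sys

# ASCII fits in 1 byte/char, the euro sign (U+20AC) needs 2 bytes/char,
# and an emoji outside the Basic Multilingual Plane needs 4 bytes/char.
for ch, label in [("a", "ASCII"), ("\u20ac", "BMP"), ("\U0001F600", "astral")]:
    per_char = (sys.getsizeof(ch * 101) - sys.getsizeof(ch)) // 100
    print(f"{label}: {per_char} byte(s) per character")
```

For a word list of plain English words, you are in the 1-byte-per-character case throughout.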

Getting the size of a Python list object means you get just the list object memory requirements, not anything that's stored 'in' the list. You can repeatedly add the same object to a list and you'd not get copies, because Python uses references for everything, including list contents. Take that into account.
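A quick sketch of that point: a list of 1000 references to one string only costs you the list's pointer slots, not 1000 string copies:

```python
import sys

word = "spelling"
many = [word] * 1000   # 1000 references to the SAME string object
assert all(w is word for w in many)

# the list only stores pointers; the single string is not duplicated
print(sys.getsizeof(many))                       # size of the list object itself
print(sum(sys.getsizeof(w) for w in set(many)))  # only one unique string to count
```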

So let's load 300 random words, and create two lists, to see what the memory needs will be:

>>> import random
>>> words = list(map(str.strip, open('/usr/share/dict/words')))  # big file of words, present on many computers
>>> words = random.sample(words, 300)  # just use 300
>>> words[:10]
['fourer', 'tampon', 'Minyadidae', 'digallic', 'euploid', 'Mograbi', 'sketchbook', 'annul', 'ambilogy', 'outtalent']
>>> import statistics
>>> statistics.mean(map(len, words))
9.346666666666666
>>> statistics.median(map(len, words))
9.0
>>> statistics.mode(map(len, words))
10
>>> sys.getsizeof(words)
2464
>>> sum(sys.getsizeof(word) for word in words)
17504

That's one list, with 300 unique words averaging just over 9 characters, and it required 2464 bytes for the list and 17504 bytes for the words themselves. That's not even 20KB.

But, you say, you have 2 lists. The second list will not hold copies of your words, though, just more references to the existing string objects, so it'll only take another 2464 bytes, about 2KB.
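You can verify that sharing directly. A randomized list made with random.sample() (or a copy you shuffle yourself) contains the very same string objects as the original list, which is why only the second list's own slots cost extra memory:

```python
import random
import sys

words = ["apple", "banana", "cherry", "damson"]  # stand-in for the default list
shuffled = random.sample(words, len(words))      # the "randomized" list

# every element of shuffled IS one of the original objects, not a copy
assert all(any(s is w for w in words) for s in shuffled)
print(sys.getsizeof(shuffled))  # just the second list object's own memory
```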

For 300 random English words, in two lists, your total memory requirements are around 20KB of memory.

On an 8GB machine, you will not have any problems. Note that I loaded the whole words file into memory in one go, then cut it back to 300 random words. Here is how much memory that whole initial list requires:

>>> words = list(map(str.strip, open('/usr/share/dict/words')))
>>> len(words)
235886
>>> sum(sys.getsizeof(word) for word in words)
13815637
>>> sys.getsizeof(words)
2007112

That's about 15MB of memory, for close to 236 thousand words.
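The 15MB figure is just the two measurements above added together and converted to MiB:

```python
# size of all the word strings + size of the list object, in MiB
total = 13815637 + 2007112
print(f"{total / 2**20:.1f} MiB")  # 15.1 MiB
```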

If you are worried about larger programs with more objects, you can also use the tracemalloc module to get statistics about memory use:

import tracemalloc

last = None
def display_memory_change(msg):
    global last
    snap = tracemalloc.take_snapshot()
    statdiff, last = snap.compare_to(last, 'filename', True), snap
    tot = sum(s.size for s in statdiff)
    change = sum(s.size_diff for s in statdiff)
    print('{:>20} (Tot: {:6.1f} MiB, Inc: {:6.1f} MiB)'.format(
        msg, tot / 2 ** 20, change / 2 ** 20))


# at the start, get a baseline
tracemalloc.start()
last = tracemalloc.take_snapshot()

# create objects, run more code, etc.

display_memory_change("Some message as to what has been done")

# run some more code.

display_memory_change("Show some more statistics")

Using the above code to measure reading all those words:

tracemalloc.start()
last = tracemalloc.take_snapshot()
display_memory_change("Baseline")

words = list(map(str.strip, open('/usr/share/dict/words')))

display_memory_change("Loaded words list")

gives an output of

            Baseline (Tot:    0.0 MiB, Inc:    0.0 MiB)
   Loaded words list (Tot:   15.1 MiB, Inc:   15.1 MiB)

confirming my sys.getsizeof() measurements.

Martijn Pieters

One good way to find out is to try it.

You can put a line midway through your program to print out how much memory it is using:

import os
import psutil  # third-party package: pip install psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss)  # resident set size, in bytes

Try running your program with different numbers of words and plotting the results:

[graph plotting total memory vs. number of words]

Then you can predict how many words it would take to use up all your memory.
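If you'd rather avoid the third-party psutil dependency, a rough sketch of the same idea using only the standard library's resource module (Unix-like systems only) might look like this; note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS:

```python
import resource
import sys

def peak_rss_bytes():
    """Peak resident set size of this process, normalized to bytes."""
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return usage if sys.platform == "darwin" else usage * 1024

# simulate loading a word list, then check the footprint
words = ["word%d" % i for i in range(300)]
print(peak_rss_bytes())
```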

A few other points to keep in mind:

  • If you are using 32 bit Python, your total memory will be limited by the 32 bit address space to about 4 GB.
  • Your computer likely uses the disk to increase the virtual memory beyond the RAM size. So, even if you only have 1 GB RAM, you might find you can use 3 GB of memory in your program.
  • For small lists of words like you are using, you will almost never run out of memory unless your program has a bug. In my experience, a MemoryError is almost always because I made a mistake.
Owen
  • OS memory allocation is not a good way to measure this, as allocation happens in chunks and Python uses a heap model (meaning it'll request larger blocks). – Martijn Pieters Nov 24 '18 at 17:25
  • Instead, use [`tracemalloc` snapshots](https://docs.python.org/3/library/tracemalloc.html). – Martijn Pieters Nov 24 '18 at 17:26
  • So how can I measure it? – Gabriel Mation Nov 24 '18 at 17:26
  • @MartijnPieters That's a good point. Of course, if you are close to using your whole RAM, OS memory usage will be a decent approximation. – Owen Nov 24 '18 at 17:26
  • See [Python: Cannot replicate a test on memory usage](//stackoverflow.com/a/51031010) for a concrete discussion of a blogpost that attempted to measure Python memory use (incorrectly), the [linked gist](https://gist.github.com/mjpieters/1d9ce2c84b858ef7cb7192311e49bb49#file-test-pickle-tracemalloc-py) includes how you could use `tracemalloc` to measure memory use. – Martijn Pieters Nov 24 '18 at 17:32
  • I ran the code above to see what I could get, and I have a question: does the number displayed represent bytes? – Gabriel Mation Nov 24 '18 at 17:33