
I'm doing a spelling bee program in Python using pygame, and it works fine, but I have only been testing it with 7 words.
I'm worried that, if used with 300 words, it might fill up the memory. Note that there are 2 arrays: one holds the default list of words, and the other holds the randomized words.

2 Answers


You really do not need to worry. Python is not such a memory hog as to cause issues with a mere 600 words.

With a bit of care, you can measure memory requirements directly. The sys.getsizeof() function lets you measure the direct memory requirements of a given Python object (only direct memory, not anything that it references!). You could use this to measure individual strings:

>>> import sys
>>> sys.getsizeof("Hello!")
55
>>> sys.getsizeof("memoryfootprint")
64

Exact sizes depend on the Python version and your OS. A Python string object needs a base amount of memory for book-keeping information, and then 1, 2 or 4 bytes per character, depending on the highest Unicode code point in the string. For ASCII, that's just one byte per letter. Python 3.7 on my Mac OS X system uses 49 bytes for the bookkeeping portion.
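You can see those per-character widths in action by comparing strings built from code points of different sizes; the exact bookkeeping overhead varies by Python version, so the sketch below compares a 1-character string with a 101-character string so the fixed overhead cancels out:

```python
import sys

# ASCII fits in 1 byte/char, the euro sign (U+20AC) needs 2 bytes/char,
# and an emoji outside the Basic Multilingual Plane needs 4 bytes/char.
for ch, label in [("a", "ASCII"), ("\u20ac", "BMP"), ("\U0001F600", "astral")]:
    per_char = (sys.getsizeof(ch * 101) - sys.getsizeof(ch)) // 100
    print(f"{label}: {per_char} byte(s) per character")
```

For a word list of plain English words, you are in the 1-byte-per-character case throughout.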

Getting the size of a Python list object means you get just the list object memory requirements, not anything that's stored 'in' the list. You can repeatedly add the same object to a list and you'd not get copies, because Python uses references for everything, including list contents. Take that into account.
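A quick sketch of that point: a list of 1000 references to one string only costs you the list's pointer slots, not 1000 string copies:

```python
import sys

word = "spelling"
many = [word] * 1000   # 1000 references to the SAME string object
assert all(w is word for w in many)

# the list only stores pointers; the single string is not duplicated
print(sys.getsizeof(many))                       # size of the list object itself
print(sum(sys.getsizeof(w) for w in set(many)))  # only one unique string to count
```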

So let's load 300 random words, and create two lists, to see what the memory needs will be:

>>> import random
>>> words = list(map(str.strip, open('/usr/share/dict/words')))  # big file of words, present on many computers
>>> words = random.sample(words, 300)  # just use 300
>>> words[:10]
['fourer', 'tampon', 'Minyadidae', 'digallic', 'euploid', 'Mograbi', 'sketchbook', 'annul', 'ambilogy', 'outtalent']
>>> import statistics
>>> statistics.mean(map(len, words))
9.346666666666666
>>> statistics.median(map(len, words))
9.0
>>> statistics.mode(map(len, words))
10
>>> sys.getsizeof(words)
2464
>>> sum(sys.getsizeof(word) for word in words)
17504

That's one list, with 300 unique words averaging just over 9 characters, and it required 2464 bytes for the list and 17504 bytes for the words themselves. That's not even 20KB.

But, you say, you have 2 lists. The second list will not hold copies of your words, though, just more references to the existing string objects, so it'll only take another 2464 bytes, about 2KB.
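You can verify that sharing directly. A randomized list made with random.sample() (or a copy you shuffle yourself) contains the very same string objects as the original list, which is why only the second list's own slots cost extra memory:

```python
import random
import sys

words = ["apple", "banana", "cherry", "damson"]  # stand-in for the default list
shuffled = random.sample(words, len(words))      # the "randomized" list

# every element of shuffled IS one of the original objects, not a copy
assert all(any(s is w for w in words) for s in shuffled)
print(sys.getsizeof(shuffled))  # just the second list object's own memory
```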

For 300 random English words, in two lists, your total memory requirements are around 20KB of memory.

On an 8GB machine, you will not have any problems. Note that I loaded the whole words file into memory in one go, then cut it back to 300 random words. Here is how much memory that whole initial list requires:

>>> words = list(map(str.strip, open('/usr/share/dict/words')))
>>> len(words)
235886
>>> sum(sys.getsizeof(word) for word in words)
13815637
>>> sys.getsizeof(words)
2007112

That's about 15MB of memory, for close to 236 thousand words.
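The 15MB figure is just the two measurements above added together and converted to MiB:

```python
# size of all the word strings + size of the list object, in MiB
total = 13815637 + 2007112
print(f"{total / 2**20:.1f} MiB")  # 15.1 MiB
```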

If you are worried about larger programs with more objects, you can also use the tracemalloc module to get statistics about memory use:

import tracemalloc

last = None
def display_memory_change(msg):
    global last
    snap = tracemalloc.take_snapshot()
    statdiff, last = snap.compare_to(last, 'filename', True), snap
    tot = sum(s.size for s in statdiff)
    change = sum(s.size_diff for s in statdiff)
    print('{:>20} (Tot: {:6.1f} MiB, Inc: {:6.1f} MiB)'.format(
        msg, tot / 2 ** 20, change / 2 ** 20))


# at the start, get a baseline
tracemalloc.start()
last = tracemalloc.take_snapshot()

# create objects, run more code, etc.

display_memory_change("Some message as to what has been done")

# run some more code.

display_memory_change("Show some more statistics")

Using the above code to measure reading all those words:

tracemalloc.start()
last = tracemalloc.take_snapshot()
display_memory_change("Baseline")

words = list(map(str.strip, open('/usr/share/dict/words')))

display_memory_change("Loaded words list")

gives an output of

            Baseline (Tot:    0.0 MiB, Inc:    0.0 MiB)
   Loaded words list (Tot:   15.1 MiB, Inc:   15.1 MiB)

confirming my sys.getsizeof() measurements.

Martijn Pieters

One good way to find out is to try it.

You can put a line midway through your program to print out how much memory it is using:

import os
import psutil  # third-party package: pip install psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss)  # resident set size, in bytes

Try running your program with different numbers of words and plotting the results:

[graph plotting total memory vs. number of words]

Then you can predict how many words it would take to use up all your memory.
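If you'd rather avoid the third-party psutil dependency, a rough sketch of the same idea using only the standard library's resource module (Unix-like systems only) might look like this; note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS:

```python
import resource
import sys

def peak_rss_bytes():
    """Peak resident set size of this process, normalized to bytes."""
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return usage if sys.platform == "darwin" else usage * 1024

# simulate loading a word list, then check the footprint
words = ["word%d" % i for i in range(300)]
print(peak_rss_bytes())
```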

A few other points to keep in mind:

  • If you are using 32 bit Python, your total memory will be limited by the 32 bit address space to about 4 GB.
  • Your computer likely uses the disk to increase the virtual memory beyond the RAM size. So, even if you only have 1 GB RAM, you might find you can use 3 GB of memory in your program.
  • For small lists of words like you are using, you will almost never run out of memory unless your program has a bug. In my experience, a MemoryError is almost always because I made a mistake.
Owen
  • OS memory allocation is not a good way to measure this, as allocation happens in chunks and Python uses a heap model (meaning it'll request larger blocks). – Martijn Pieters Nov 24 '18 at 17:25
  • Instead, use [`tracemalloc` snapshots](https://docs.python.org/3/library/tracemalloc.html). – Martijn Pieters Nov 24 '18 at 17:26
  • So how can I measure it? – Gabriel Mation Nov 24 '18 at 17:26
  • @MartijnPieters That's a good point. Of course, if you are close to using your whole RAM, OS memory usage will be a decent approximation. – Owen Nov 24 '18 at 17:26
  • See [Python: Cannot replicate a test on memory usage](//stackoverflow.com/a/51031010) for a concrete discussion of a blogpost that attempted to measure Python memory use (incorrectly), the [linked gist](https://gist.github.com/mjpieters/1d9ce2c84b858ef7cb7192311e49bb49#file-test-pickle-tracemalloc-py) includes how you could use `tracemalloc` to measure memory use. – Martijn Pieters Nov 24 '18 at 17:32
  • I ran the code above to see what I could get, and I have a question: does the number displayed represent bytes? – Gabriel Mation Nov 24 '18 at 17:33