
This is the code I have to count word frequencies:

import collections
import codecs
import io
from collections import Counter
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    words = infh.read().split()
    with open('Counts2.txt', 'wb') as f:
        for word, count in Counter(words).most_common(100000000):
            f.write(u'{} {}\n'.format(word, count).encode('utf-8')) 

When I try to read a big file (4 GB), I get this error:

Traceback (most recent call last):
  File "counter.py", line 7, in <module>
    words =infh.read().split()
  File "/usr/lib/python2.7/codecs.py", line 296, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError

I am using Ubuntu 12.04 with 8 GB RAM and an Intel Core i7. How can I fix this error?


2 Answers


This is the Pythonic way to process a file line by line:

with open(...) as fh:
    for line in fh:
        pass

This takes care of opening and closing the file, even if an exception is raised in the inner block. It also treats the file object fh as an iterable, which uses buffered I/O and manages memory for you, so you don't have to worry about large files.
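Applied to the word count from the question, that looks roughly like the sketch below (same file names as in the question; the file is never read in one piece, although the Counter itself still has to fit in memory):

import io
from collections import Counter

counts = Counter()
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    for line in infh:                 # buffered, one line at a time
        counts.update(line.split())   # count the words on this line

with open('Counts2.txt', 'wb') as f:
    for word, count in counts.most_common():
        f.write(u'{} {}\n'.format(word, count).encode('utf-8'))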

Michael Foukarakis
  • What if all the words are on a single line? – Jayanth Koushik Feb 11 '14 at 12:18
  • It should be trivial to either: a) convert it to one word per line via your shell, or b) read from the file in chunks (i.e. manually manage memory) and process accordingly; see the sketch after these comments. – Michael Foukarakis Feb 11 '14 at 12:19
  • @MichaelFoukarakis the error is at /usr/lib/python2.7/codecs.py, line 296, in decode: (result, consumed) = self._buffer_decode(data, self.errors, final) MemoryError –  Feb 11 '14 at 12:28
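For completeness, option (b) from the comment above might look like this sketch: read the decoded text in fixed-size chunks and carry any word that is split across a chunk boundary over to the next chunk (the 1 MB chunk size is an arbitrary choice):

import io
from collections import Counter

counts = Counter()
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    leftover = u''
    while True:
        chunk = infh.read(1024 * 1024)     # up to 1 MB of decoded text
        if not chunk:
            break
        chunk = leftover + chunk
        words = chunk.split()
        if not chunk[-1].isspace() and words:
            leftover = words.pop()         # last word may be cut off mid-chunk
        else:
            leftover = u''
        counts.update(words)
    if leftover:                           # count the final carried-over word
        counts[leftover] += 1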

How about readline() instead of read()?

http://docs.python.org/2/tutorial/inputoutput.html
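For example (a sketch using the file from the question; readline() returns one line at a time and an empty string at end of file):

import io
from collections import Counter

counts = Counter()
with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    while True:
        line = infh.readline()    # u'' signals end of file
        if not line:
            break
        counts.update(line.split())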

user2814648