This is the code I have to count word frequencies:
import collections
import codecs
import io
from collections import Counter

with io.open('Combine.txt', 'r', encoding='utf8') as infh:
    words =infh.read().split()

with open('Counts2.txt', 'wb') as f:
    for word, count in Counter(words).most_common(100000000):
        f.write(u'{} {}\n'.format(word, count).encode('utf-8'))
When I try to read a big file (4 GB), I get this error:
Traceback (most recent call last):
  File "counter.py", line 7, in <module>
    words =infh.read().split()
  File "/usr/lib/python2.7/codecs.py", line 296, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError
I am using Ubuntu 12.04 with 8 GB RAM and an Intel Core i7. How can I fix this error?
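For reference, the traceback points at infh.read(), which decodes the entire 4 GB file into one string before split() builds a second full copy as a list. One direction that may avoid this is to update the Counter one line at a time. The following is only a minimal sketch: the helper names count_words and write_counts are hypothetical, and it assumes the file contains newlines (a single 4 GB line would still be read in one piece) and that the set of distinct words fits in memory.

```python
import io
from collections import Counter


def count_words(path, encoding='utf8'):
    """Count whitespace-separated words, reading one line at a time."""
    counts = Counter()
    with io.open(path, 'r', encoding=encoding) as infh:
        for line in infh:
            # Only the current line is held in memory, not the whole file.
            counts.update(line.split())
    return counts


def write_counts(counts, out_path):
    """Write 'word count' lines, most frequent words first."""
    with io.open(out_path, 'w', encoding='utf8') as f:
        for word, count in counts.most_common():
            f.write(u'{} {}\n'.format(word, count))
```

With the file names from the question, this would be called as write_counts(count_words('Combine.txt'), 'Counts2.txt'). Memory use then depends on the number of distinct words rather than on the file size.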