
This is the Python code I'm using. I have a 5 GB file which I need to split into around 10-12 files according to line numbers, but this code gives a memory error. Can someone tell me what is wrong with it?

from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

n = 386972

with open('reviewsNew.txt','rb') as f:
    for i, g in enumerate(grouper(n, f, fillvalue=''), 1):
        with open('small_file_{0}'.format(i * n), 'w') as fout:
            fout.writelines(g)
Kanika Rawat
  • http://stackoverflow.com/questions/6475328/read-large-text-files-in-python-line-by-line-without-loading-it-in-to-memory Similar question – be_good_do_good Jul 24 '16 at 18:16
  • @be_good_do_good it's not the same. In my code, after 3-4 iterations I get a memory error. I don't know why; I am reading the file line by line :( – Kanika Rawat Jul 24 '16 at 18:23
  • Maybe not the issue, but you are opening the input file as binary while not saving the output the same way. – Mikael Rousson Jul 24 '16 at 18:42
  • @MikaelRousson when I open my file in text mode, only 139 lines are read and it stops, but when I open it in binary at least the whole file gets read :P Is it a problem to open it in binary and save it as a text file? – Kanika Rawat Jul 25 '16 at 04:55

1 Answer


Just use groupby, so you don't need to create 386972 iterators:

from itertools import groupby

n = 386972
with open('reviewsNew.txt','rb') as f:
    # group consecutive lines by index // n, so each group is one chunk of n lines
    for idx, lines in groupby(enumerate(f), lambda (idx, _): idx // n):
        with open('small_file_{0}'.format(idx * n), 'wb') as fout:
            fout.writelines(l for _, l in lines)
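
For completeness, here is a minimal sketch of the same groupby approach written for Python 3, where tuple unpacking in lambda arguments is no longer allowed; it assumes the same filename and chunk size as in the question:

from itertools import groupby

n = 386972

with open('reviewsNew.txt', 'rb') as f:
    # enumerate yields increasing line indices, so index // n changes
    # exactly once every n lines and groupby emits one group per chunk
    for idx, lines in groupby(enumerate(f), key=lambda pair: pair[0] // n):
        with open('small_file_{0}'.format(idx * n), 'wb') as fout:
            fout.writelines(line for _, line in lines)

Since the grouping key only ever increases, each group is consumed once and only one chunk of lines is held in memory at a time.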
Daniel