Input: a large sequence file (around 12 GB) that uses ~ as the delimiter. I want to insert a new line after every 10th occurrence of the delimiter.

I tried this:

with open ("file.txt") as f:
    for line in f:
        x = line.count("~")
        y = line.split("~")
        s = ['Ç'.join(x) for x in [y[i:i + 10] for i in xrange(0, len(y), 10)]]
with open ("output.txt","w") as outfile:
    outfile.write("~\n".join(s))

I get a memory error at line.split('~').

I also tried y = [line.split('~') for line in f], but I get the same error. How can I handle this?

mortalis
akhil

1 Answer

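for line in f:

reads the file one line at a time, but if the file contains no newline characters, the whole file counts as a single line and is loaded into your RAM.

In Python 2 you can also request the line iterator explicitly with xreadlines (deprecated; it behaves the same as iterating over the file object):

for line in f.xreadlines():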
Yevgeniy Shchemelev
  • After changing to for line in f.xreadlines(): I am still getting a memory error on the line y = line.split("~"). – akhil Jul 10 '17 at 07:47
  • Check the size of a line. If your file doesn't have proper end-of-line characters, the whole file will be read as one line. In that case you have to read the file in binary mode using a buffer. See the example in the following question: https://stackoverflow.com/questions/1035340/reading-binary-file-and-looping-over-each-byte – Yevgeniy Shchemelev Jul 10 '17 at 07:51
  • I tried binary mode as well, but I get the same memory error. – akhil Jul 10 '17 at 08:59
  • outfile.write("~\n".join(s)) will also build the whole output in memory. Try writing it piece by piece instead: for item in s: outfile.write(item+'\n') – Yevgeniy Shchemelev Jul 10 '17 at 11:24
  • http://www.diveintopython.net/file_handling/file_objects.html could be useful for understanding file operations – Yevgeniy Shchemelev Jul 10 '17 at 11:28
  • My input file is in sequence format with no line endings, and I am still getting the memory error. – akhil Jul 10 '17 at 13:14
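
The comments above point toward reading the file through a fixed-size buffer instead of line by line, and writing the output in pieces. Below is a minimal sketch of that idea, not a tested solution for the 12 GB file. It assumes the delimiter is a single character (so it can never be cut in half at a chunk boundary). The names break_every_n and chunk_size are made up for this illustration and do not come from the thread.

```python
import io


def break_every_n(src, dst, delimiter=u"~", n=10, chunk_size=1 << 20):
    """Copy src to dst, adding a newline after every n-th delimiter.

    src and dst are text-mode file objects. Only about chunk_size
    characters are held in memory at any time.
    Assumption: delimiter is a single character.
    """
    count = 0  # delimiters seen since the last inserted newline
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of file
        pieces = chunk.split(delimiter)
        out = []
        # Every piece except the last one was followed by a delimiter.
        for piece in pieces[:-1]:
            out.append(piece)
            out.append(delimiter)
            count += 1
            if count == n:
                out.append(u"\n")
                count = 0
        # The last piece may continue in the next chunk; write it as-is.
        out.append(pieces[-1])
        dst.write(u"".join(out))


# Possible usage with the file names from the question:
#
# with io.open("file.txt") as f, io.open("output.txt", "w") as outfile:
#     break_every_n(f, outfile)
```

Because the delimiter counter is kept across chunks, the chunk size only affects memory use and speed, not where the newlines end up.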