
I am trying to convert the encoding of a large text file (5 GB+) but got an error. From this post, I managed to convert the encoding of a text file into a readable format with this:

import os

path = 'path/to/file'
des_path = 'path/to/store/file'
for filename in os.listdir(path):
    with open('{}/{}'.format(path, filename), 'r', encoding='iso-8859-11') as f, \
         open('{}/{}'.format(des_path, filename), 'w') as t:
        string = f.read()  # reads the entire file into memory at once
        t.write(string)

The problem is that when I try to convert a text file with a large size (5 GB+), I get this error:

Traceback (most recent call last):
  File "Desktop/convertfile.py", line 12, in <module>
    string = f.read()
  File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError

which I understand means it cannot read a file this large in one go. And I found from several links that I can do it by reading the file line by line.

So, how can I adapt my code to read line by line? What I understand about reading line by line here is that I need to read a line from f and write it to t until the end of the file, right?

emp
    `tt = f.read()` (from your traceback) is not in the posted code – FlyingTeller Jul 09 '19 at 14:05
  • 1
    Are you missing some code here? Where is `line` defined before your `string = line.read()`? Also your traceback references something that isn't in the given code. – Engineero Jul 09 '19 at 14:05
  • oh wait I wrote the wrong one. I just changed a variable name lol. Edited. – emp Jul 09 '19 at 14:06
  • 1
    Maybe using a lazy method. Read this: https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python – Nache Jul 09 '19 at 14:10
  • @Nache Oh so basically run a for loop of a function and inside the loop, append each piece to the output file, right? – emp Jul 09 '19 at 14:43
  • @Jamiewp Right. It saves your ram because it stores only one piece of data in each loop iteration. – Nache Jul 09 '19 at 19:38

1 Answer


You can iterate over the lines of an open file.

for filename in os.listdir(path):
    inp, out = open_files(filename)  # helper that opens source and destination
    for line in inp:  # one line at a time, so memory use stays small
        out.write(line)
    inp.close()
    out.close()

Note that I've hidden the complexity of the different paths, encodings, and modes in a function that I suggest you actually write...
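
For reference, a minimal sketch of what such an open_files helper could look like, assuming the path, des_path, and iso-8859-11 encoding from your question (the helper's name and signature are just the ones used above, not an established API):

def open_files(filename):
    # Hypothetical helper: open the source file with the legacy encoding
    # and the destination file for writing, and return both handles.
    inp = open('{}/{}'.format(path, filename), 'r', encoding='iso-8859-11')
    out = open('{}/{}'.format(des_path, filename), 'w')
    return inp, out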

Re buffering, i.e. reading/writing larger chunks of the text: Python does its own buffering under the hood, so this shouldn't be much slower than a more complex solution.
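
If you nevertheless want to read in fixed-size chunks (the lazy-read approach linked in the comments), a sketch could look like this; the 64 KB chunk size is an arbitrary choice:

def copy_in_chunks(inp, out, chunk_size=64 * 1024):
    # Read at most chunk_size characters at a time, so only one chunk
    # is held in memory; useful if single lines can be very long.
    while True:
        chunk = inp.read(chunk_size)
        if not chunk:  # empty string signals end of file
            break
        out.write(chunk)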

gboffi