
I am trying to convert the encoding of a large text file (5 GB+) but got an error. From this post, I managed to convert the encoding of a text file into a readable format with this:

import os

path = 'path/to/file'
des_path = 'path/to/store/file'
for filename in os.listdir(path):
    with open('{}/{}'.format(path, filename), 'r', encoding='iso-8859-11') as f, \
         open('{}/{}'.format(des_path, filename), 'w') as t:
        string = f.read()  # reads the entire file into memory at once
        t.write(string)

The problem is that when I try to convert a text file with a large size (5 GB+), I get this error:

Traceback (most recent call last):
  File "Desktop/convertfile.py", line 12, in <module>
    string = f.read()
  File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError

which I understand means it cannot read a file this large in one go. And I found from several links that I can do it by reading the file line by line.

So, how can I adapt my code to read line by line? What I understand about reading line by line here is that I need to read a line from f and write it to t until the end of the file, right?

emp
    `tt = f.read()` (from your traceback) is not in the posted code – FlyingTeller Jul 09 '19 at 14:05
  • 1
    Are you missing some code here? Where is `line` defined before your `string = line.read()`? Also your traceback references something that isn't in the given code. – Engineero Jul 09 '19 at 14:05
  • oh wait I wrote the wrong one. I just changed a variable name lol. Edited. – emp Jul 09 '19 at 14:06
  • 1
    Maybe using a lazy method. Read this: https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python – Nache Jul 09 '19 at 14:10
  • @Nache Oh so basically run a for loop of a function and inside the loop, append each piece to the output file, right? – emp Jul 09 '19 at 14:43
  • @Jamiewp Right. It saves your ram because it stores only one piece of data in each loop iteration. – Nache Jul 09 '19 at 19:38

1 Answer


You can iterate over the lines of an open file.

for filename in os.listdir(path):
    inp, out = open_files(filename)  # helper that opens source and destination
    for line in inp:  # one line at a time, so memory use stays small
        out.write(line)
    inp.close()
    out.close()

Note that I've hidden the complexity of the different paths, encodings, and modes in a function that I suggest you actually write...
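
For reference, a minimal sketch of what such an open_files helper could look like, assuming the path, des_path, and iso-8859-11 encoding from your question (the helper's name and signature are just the ones used above, not an established API):

def open_files(filename):
    # Hypothetical helper: open the source file with the legacy encoding
    # and the destination file for writing, and return both handles.
    inp = open('{}/{}'.format(path, filename), 'r', encoding='iso-8859-11')
    out = open('{}/{}'.format(des_path, filename), 'w')
    return inp, out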

Re buffering, i.e. reading/writing larger chunks of the text: Python does its own buffering under the hood, so this shouldn't be much slower than a more complex solution.
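
If you nevertheless want to read in fixed-size chunks (the lazy-read approach linked in the comments), a sketch could look like this; the 64 KB chunk size is an arbitrary choice:

def copy_in_chunks(inp, out, chunk_size=64 * 1024):
    # Read at most chunk_size characters at a time, so only one chunk
    # is held in memory; useful if single lines can be very long.
    while True:
        chunk = inp.read(chunk_size)
        if not chunk:  # empty string signals end of file
            break
        out.write(chunk)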

gboffi