I am trying to lowercase the text in a text file. The file is about 7 GB, and I go through it line by line and lowercase every line. But I get a memory error, and I don't understand why, because I never load the whole 7 GB file at once.

My code:

import os
directory = '.'

for filename in os.listdir(directory):  # iterate through text files
    if filename.endswith(".txt"):
        with open(filename, "r", encoding='utf8') as r: # open file in read mode
            with open("./data/finalData.txt", "w", encoding='utf8') as w: # open the destination file where I want to write the lowercase text data
                for line in r:
                    line = line.lower()
                    w.writelines(line)
print("Lowercasing done!")

What I get is:

MemoryError                               Traceback (most recent call last)
<ipython-input-5-282ad07ebddc> in <module>
      5         with open(filename, "r", encoding='utf8') as r:
      6              with open("./data/finalData.txt", "w", encoding='utf8') as w:
----> 7                 for line in r:
      8                     line = line.lower()
      9                     w.writelines(line)

MemoryError:
Georgy
Maria
  • Can it be that you have one (or several) very long lines so that one line does not fit into memory? – Yevhen Kuzmovych Apr 29 '19 at 09:37
  • that same code works for me (tried with a small sample file) – avloss Apr 29 '19 at 09:38
  • It works just fine. You should check the content in the file. – amanb Apr 29 '19 at 09:41
  • Since it's a large text file, you could read it chunk by chunk and then process the contents. This may be helpful: https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python – amanb Apr 29 '19 at 09:44
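
The two suggestions in the comments (check for very long lines, and read in chunks instead of lines) could be sketched roughly as below. This is only an illustration under assumptions, not a confirmed fix for the file in the question: the function names lowercase_in_chunks and longest_line_length and the chunk_chars parameter are made up for this example. Iterating with "for line in r" has to hold one full line in memory, so a file with a multi-gigabyte line (or no newlines at all) can raise MemoryError even though the file is never read in one go; read(n) in text mode returns at most n characters regardless of where the newlines are.

```python
def lowercase_in_chunks(src_path, dst_path, chunk_chars=1024 * 1024):
    """Write a lowercased copy of src_path to dst_path, one chunk at a time.

    chunk_chars is a hypothetical tuning knob: the maximum number of
    characters held in memory per read.
    """
    with open(src_path, "r", encoding="utf8") as src, \
         open(dst_path, "w", encoding="utf8") as dst:
        while True:
            chunk = src.read(chunk_chars)  # at most chunk_chars characters
            if not chunk:  # an empty string signals end of file
                break
            dst.write(chunk.lower())


def longest_line_length(path, chunk_chars=1024 * 1024):
    """Return the length (without the newline) of the longest line,
    without ever holding a whole line in memory."""
    longest = 0
    current = 0  # length of the line currently being scanned
    with open(path, "r", encoding="utf8") as f:
        while True:
            chunk = f.read(chunk_chars)
            if not chunk:
                break
            parts = chunk.split("\n")
            # The first part continues the line carried over from the
            # previous chunk; every later part starts a new line.
            current += len(parts[0])
            for part in parts[1:]:
                longest = max(longest, current)
                current = len(part)
    return max(longest, current)
```

Running longest_line_length on the input first would show whether a single huge line is really the cause. One caveat with chunked lowercasing: str.lower() looks at neighbouring characters in a few cases (for example the Greek capital sigma), so a chunk boundary that falls inside a word could in rare cases give a different result than lowercasing the whole text at once.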
