
I am having some issues with the code below:

    with open(".../file.txt", encoding="utf-8", errors="ignore") as f:
        data = f.read()

I have a particular txt file of size 2.5GB and am trying to read it into my 16GB of RAM. However, after only a second or two, Python seems to be using 100% of my RAM and I get a memory error. The exact same code works as intended for other files. What can I do to investigate this?
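
One quick check (a minimal sketch using only the standard library; the `...` path is kept as the placeholder from the question) is to ask the filesystem what size it actually reports for the file:

    import os

    path = ".../file.txt"  # placeholder path from the question

    size_bytes = os.path.getsize(path)  # logical file size in bytes (st_size)
    print(f"reported size: {size_bytes:,} bytes ({size_bytes / 1024**3:.2f} GiB)")

One possible contributor to the memory use: CPython stores decoded text at 1, 2, or 4 bytes per character depending on which characters occur, so the in-memory `str` returned by `f.read()` can be noticeably larger than the file's UTF-8 byte count.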

EDIT

Actually, it appears as if Windows is lying about its file size... The file's Properties tab indicates 2.5GB, but when I loaded it into the WordPad text editor, WordPad also kept progressively reading data into memory, well beyond the size indicated by Windows. Any thoughts?
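
One way to cross-check the Properties dialog (a minimal sketch; the `...` path is again the placeholder from the question) is to stream the file in binary mode and count the bytes that actually come back, which sidesteps text decoding and anything an editor like WordPad does:

    import os

    path = ".../file.txt"  # placeholder path from the question

    total = 0
    with open(path, "rb") as f:               # binary mode: raw bytes, no decoding
        while True:
            block = f.read(16 * 1024 * 1024)  # 16 MiB per read; the size is arbitrary
            if not block:
                break
            total += len(block)

    print(f"os.path.getsize: {os.path.getsize(path):,} bytes")
    print(f"bytes streamed:  {total:,} bytes")

If the two numbers agree, the file on disk really is the size Windows reports, which would point at how the text is held in memory rather than at a wrong size on disk.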

Harry Stuart
    SO already has a few answers on how to "chunk" files when opening them, e.g. [here](https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python), [here](https://stackoverflow.com/questions/45201013/read-a-file-in-byte-chunks-using-python) – Michael Kolber Jul 25 '19 at 02:43
  • So should I expect to use 6x-8x more RAM than the file size? Is Python inefficient at reading these large files? Thanks for the links. – Harry Stuart Jul 25 '19 at 02:45
  • I wouldn't know, but that doesn't sound correct. That's a _lot_ of overhead. – Michael Kolber Jul 25 '19 at 03:00
  • I am going to "chunk" anyway as it's reliable and good practice (a sketch follows these comments). I am still curious about my issue though. – Harry Stuart Jul 25 '19 at 03:02
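
For reference, a minimal sketch of the chunking approach from the linked answers (the `...` path is the placeholder from the question, and `process` is a hypothetical stand-in for whatever per-chunk work is needed):

    def read_in_chunks(file_obj, chunk_size=1024 * 1024):
        """Yield pieces of the file lazily instead of loading it all at once."""
        while True:
            chunk = file_obj.read(chunk_size)  # in text mode, chunk_size is a character count
            if not chunk:
                break
            yield chunk

    def process(piece):
        """Hypothetical per-chunk handler; replace with the real work."""
        pass

    with open(".../file.txt", encoding="utf-8", errors="ignore") as f:  # placeholder path
        for piece in read_in_chunks(f):
            process(piece)

If the work is naturally line-oriented, iterating directly with `for line in f:` gives the same lazy behaviour without the helper.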

0 Answers