
I am having some issues with the code below:

    with open(".../file.txt", encoding="utf-8", errors="ignore") as f:
        data = f.read()

I have a particular txt file of size 2.5GB and am trying to read it into my 16GB of RAM. However, after only a second or two, Python seems to be using 100% of my RAM and I get a memory error. The exact same code works as intended for other files. What can I do to investigate this?
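
One quick check (a minimal sketch using only the standard library; the `...` path is kept as the placeholder from the question) is to ask the filesystem what size it actually reports for the file:

    import os

    path = ".../file.txt"  # placeholder path from the question

    size_bytes = os.path.getsize(path)  # logical file size in bytes (st_size)
    print(f"reported size: {size_bytes:,} bytes ({size_bytes / 1024**3:.2f} GiB)")

One possible contributor to the memory use: CPython stores decoded text at 1, 2, or 4 bytes per character depending on which characters occur, so the in-memory `str` returned by `f.read()` can be noticeably larger than the file's UTF-8 byte count.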

EDIT

Actually, it appears as if Windows is lying about its file size... The file's Properties tab indicates 2.5GB, but when I loaded it into the WordPad text editor, WordPad also kept progressively reading data into memory, well beyond the size indicated by Windows. Any thoughts?
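
One way to cross-check the Properties dialog (a minimal sketch; the `...` path is again the placeholder from the question) is to stream the file in binary mode and count the bytes that actually come back, which sidesteps text decoding and anything an editor like WordPad does:

    import os

    path = ".../file.txt"  # placeholder path from the question

    total = 0
    with open(path, "rb") as f:               # binary mode: raw bytes, no decoding
        while True:
            block = f.read(16 * 1024 * 1024)  # 16 MiB per read; the size is arbitrary
            if not block:
                break
            total += len(block)

    print(f"os.path.getsize: {os.path.getsize(path):,} bytes")
    print(f"bytes streamed:  {total:,} bytes")

If the two numbers agree, the file on disk really is the size Windows reports, which would point at how the text is held in memory rather than at a wrong size on disk.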

Harry Stuart
    SO already has a few answers on how to "chunk" files when opening them, e.g. [here](https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python), [here](https://stackoverflow.com/questions/45201013/read-a-file-in-byte-chunks-using-python) – Michael Kolber Jul 25 '19 at 02:43
  • So should I expect to use 6x-8x more RAM than the file size? Is Python inefficient at reading these large files? Thanks for the links. – Harry Stuart Jul 25 '19 at 02:45
  • I wouldn't know, but that doesn't sound correct. That's a _lot_ of overhead. – Michael Kolber Jul 25 '19 at 03:00
  • I am going to "chunk" anyway as it's reliable and good practice (a sketch follows these comments). I am still curious about my issue though. – Harry Stuart Jul 25 '19 at 03:02
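
For reference, a minimal sketch of the chunking approach from the linked answers (the `...` path is the placeholder from the question, and `process` is a hypothetical stand-in for whatever per-chunk work is needed):

    def read_in_chunks(file_obj, chunk_size=1024 * 1024):
        """Yield pieces of the file lazily instead of loading it all at once."""
        while True:
            chunk = file_obj.read(chunk_size)  # in text mode, chunk_size is a character count
            if not chunk:
                break
            yield chunk

    def process(piece):
        """Hypothetical per-chunk handler; replace with the real work."""
        pass

    with open(".../file.txt", encoding="utf-8", errors="ignore") as f:  # placeholder path
        for piece in read_in_chunks(f):
            process(piece)

If the work is naturally line-oriented, iterating directly with `for line in f:` gives the same lazy behaviour without the helper.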

0 Answers