In Python, I'm reading a large file and want to add each line (after some modifications) to an empty list. I only want to do this for the first few lines, so I did:

X = []
for line in range(3):
    i = file.readline()
    m = str(i)
    X.append(m)

However, this raises a MemoryError on the line `i = file.readline()`.

What should I do? The same thing happens even if I change the range to 1 (although I don't see how that would affect the failing line, since it's inside the loop).

How do I avoid the error? I'm iterating, and I can't convert it to a binary file because the file isn't just integers: there are decimals and non-numeric characters.

The text file is 5 GB.

Any ideas?

Goku241
  • How big is this file? How much memory is in your computer? What does a line look like? – wkl Sep 06 '17 at 21:31
  • You don't have enough memory to read the line as a single string. Figure out how you can process the file a little at a time and not hold it all in memory. – Blender Sep 06 '17 at 21:31
  • What line ending convention does your file use? It would appear you are getting the error because `readline` isn't finding the expected line ending and reading the entire file into memory as a result. – chepner Sep 06 '17 at 21:31
  • @chepner If the file is opened with `open` in a non-binary mode, universal newlines should apply so it shouldn't matter which line ending is used. In any case, there is a really long line for the first line. – Artyer Sep 06 '17 at 21:40

1 Answer

`filehandle.readline()` reads up to and including the next newline character (`\n`). If your file has gigantic lines, or no newlines at all, a single call can try to pull most of the file into memory, so you'll need to figure out a different way of chunking it.
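One way to check whether a missing newline is the culprit is to cap how much `readline` is allowed to return. The following is only a sketch: the helper name `peek_first_line` and the 1 MiB default cap are assumptions made for illustration, not something from the question.

```python
def peek_first_line(path, limit=1024 * 1024):
    """Read at most `limit` characters of the first line of a text file.

    Hypothetical helper: returns the piece that was read and a flag that
    says whether the line was cut off at the limit.
    """
    with open(path, "r") as f:
        # readline(size) stops after `size` characters even if no newline was found
        piece = f.readline(limit)
    # A piece that fills the whole limit without ending in a newline means
    # the first line is at least `limit` characters long.
    truncated = len(piece) == limit and not piece.endswith("\n")
    return piece, truncated
```

If the flag comes back `True` for a generous limit, the file probably has very long lines (or none at all), and reading it line by line will not help.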

Normally you might read the file in chunks and process those chunks one by one.

Can you figure out how you might break up the file? Could you, for example, only read 1024 bytes at a time, and work with that chunk?
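As a rough sketch of that idea, the generator below reads a fixed number of bytes at a time, so memory use stays bounded no matter how long the lines are. The names `read_in_chunks` and `first_chunks` are hypothetical, and the UTF-8 decoding is an assumption about the file's encoding.

```python
def read_in_chunks(path, chunk_size=1024):
    """Yield successive pieces of at most `chunk_size` bytes from a file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            yield chunk


def first_chunks(path, count=3, chunk_size=1024):
    """Collect only the first `count` chunks, decoded as text."""
    result = []
    for n, chunk in enumerate(read_in_chunks(path, chunk_size)):
        if n >= count:
            break
        # Assumes UTF-8; a multi-byte character split across a chunk
        # boundary is replaced rather than raising an error.
        result.append(chunk.decode("utf-8", errors="replace"))
    return result
```

Note that a chunk boundary can fall in the middle of a record, so any real processing would need to carry the unfinished tail of one chunk over to the next.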

If not, it's often easier to clean up the format of the file instead of designing a complicated reader.

Danielle M.