
I opened an 8 MB file in Python because I wanted to batch-change various types of file names. I went through and loaded the file into a string and used the string method replace() to replace everything. I then noticed that only half of the file was being replaced, as if Python wasn't fully opening the file.

Is there some kind of string size limit or maximum file size that I need to stay within in Python?

Refer to the code in Python search and replace not replacing properly.

I have changed to the suggested code. The buffer is an 8 MB HTML file that is over 150k lines. The replacement code works; it's just not replacing everything. For example, one error that is a pain is:

When I attempt to replace the string ff10 with FF-10, it gets changed to FF-010.
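(As one of the comments below points out, this usually means a replacement for a shorter pattern is being applied before the more specific one, for example a hypothetical ff1 → FF-01 rule firing first and leaving the trailing 0 behind. A minimal sketch, with made-up replacement pairs, showing how applying the longest patterns first avoids it:)

```python
# Hypothetical replacement table; the real script's pairs may differ.
replacements = {
    "ff1": "FF-01",
    "ff10": "FF-10",
}

text = "see drawing ff10 for details"

# If "ff1" is replaced before "ff10", the result is "FF-01" + "0" = "FF-010".
# Sorting the patterns longest-first lets the more specific one win.
for old in sorted(replacements, key=len, reverse=True):
    text = text.replace(old, replacements[old])

print(text)  # see drawing FF-10 for details
```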

  • You can open a file of any size, but when you read the whole file into memory, a MemoryError can occur, as a 32-bit system can only allocate 2 GB per process, or you might not have enough memory. – Niklas R Aug 20 '11 at 20:08
  • 3
  • Show the code that's giving you the problem; that way you can get a more useful answer than one that simply tells you whether your guess is right or not. :) – Rosh Oxymoron Aug 20 '11 at 20:10
  • Your code is buggy. The case x==1 will always match first, so you end up with FF-010. Use proper string replacement functions or read up on regexps and/or longest prefix match. – Arne Aug 20 '11 at 21:12
  • Are you using Windows? Are you opening the file in binary mode? If not, try to … – Gandaro Feb 26 '12 at 22:46

1 Answer


No, there is no practical limit on the size of a file Python can open; 8 MB is tiny by modern standards. You made a mistake somewhere.

People regularly load gigabytes of data into memory. Depending on your computer's RAM and whether your OS and processor are 32- or 64-bit, the practical maximum for you may be anywhere from around 1 GB up before you get a MemoryError.

As a test, I just loaded a 350 MB file into a string. It took only a few seconds. I then wrote it back out to a file, which took a little longer, and hashed both files; they are identical.
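A sketch of that kind of round-trip test, using hashlib and a placeholder file name (these are assumptions, not the exact commands used):

```python
import hashlib

# Read the whole file into memory (binary mode so the bytes round-trip exactly),
# write it back out, then compare hashes of the original and the copy.
with open("big_file.html", "rb") as f:       # placeholder file name
    data = f.read()

with open("big_file_copy.html", "wb") as f:
    f.write(data)

def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

print(sha256_of("big_file.html") == sha256_of("big_file_copy.html"))  # True
```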

Python has no problem with large strings until you hit the limits of your RAM, operating system, or processor.

You say you "went through and loaded the file into a string"; that sounds like the first place you could have made a mistake. To load a file into a string, you just do `fileobject.read()`. If you did it some other way, that could be the problem.
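For reference, a minimal sketch of that whole-file read/replace/write workflow (the file name and the replacement pair are placeholders):

```python
# Load the entire file into one string with read(), replace, and write it back.
with open("names.html", "r", encoding="utf-8") as f:   # placeholder file name
    text = f.read()            # fileobject.read() gives the whole file as a string

text = text.replace("ff10", "FF-10")

with open("names.html", "w", encoding="utf-8") as f:
    f.write(text)
```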

  • @nobody see my comment on your question – Niklas R Aug 20 '11 at 20:10
  • I did a test and added the results to my answer. – agf Aug 20 '11 at 20:15
  • @Niklas depending on your computer, you can get a `MemoryError` at sizes smaller than 2 GB, as I mentioned. – agf Aug 20 '11 at 20:16
  • @Peter Trivial edits are discouraged. I appreciate it when people correct errors, but the change you made didn't affect anyone's understanding of the question. – agf Apr 15 '12 at 22:51