I'm running into a problem while trying to load large files using Python 3.5. Calling read() with no arguments sometimes raises OSError: Invalid argument. Reading only part of the file works fine, and I've determined that it starts to fail somewhere around 2.2 GB. Here is the example code:

>>> import sys
>>> sys.version
'3.5.1 (v3.5.1:37a07cee5969, Dec  5 2015, 21:12:44) \n[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]'
>>> x = open('/Users/username/Desktop/large.txt', 'r').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.2*10**9))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

I also noticed that this does not happen in Python 2.7. Here is the same code run in Python 2.7:

>>> import sys
>>> sys.version
'2.7.10 (default, Aug 22 2015, 20:33:39) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1)]'
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.2*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read()
>>>

I am using OS X El Capitan 10.11.1.

Is this a bug, or should I use another method for reading the files?

Dimitris Fasarakis Hilliard
calico_
  • I guess the file loading capacity mainly depends on how much free memory your device has at that point in time. – ZdaR Dec 24 '16 at 17:26
  • I thought that might be a problem, but: A) I have 16 GB and only 8 are used right now. B) It works fine if I switch to Python 2.7. – calico_ Dec 24 '16 at 17:27
  • From what is shown in the first snippet of output, it looks like the `>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))` worked since no `OSError` was raised when it was executed. The varying results may also be due to the fact that two different compilers were used to build the Python interpreter. See [_Difference between LLVM, GCC 4.2 and Apple LLVM compiler 3.1_](http://stackoverflow.com/questions/12020349/difference-between-llvm-gcc-4-2-and-apple-llvm-compiler-3-1). – martineau Dec 24 '16 at 17:40
  • Yes, 2.1*10**9 works. I tried several values but noticed that it starts to fail somewhere in between 2.1 and 2.2. – calico_ Dec 24 '16 at 17:42

1 Answer

Yes, you have bumped into a bug.

The good news is that someone else has also found it and already created an issue for it in the Python bug tracker: Issue24658 - open().write() fails on 2 GB+ data (OS X). It seems to be platform dependent (OS X only) and is reproducible when using read and/or write. Apparently the problem lies in the way fread.c is implemented in the libc for OS X (see here).

The bad news is that it is still open (and currently inactive), so you'll have to wait until it is resolved. Either way, you can take a look at the discussion there if you're interested in the specifics.


As a workaround, I'm pretty sure you can side-step the issue until it is fixed by reading in chunks and joining the chunks during processing. Do the same when writing. It's unfortunate, but it might do the trick.
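
A minimal sketch of that chunked approach, under a few assumptions of mine: the helper names `read_in_chunks` and `write_in_chunks` and the 1 GiB `CHUNK_SIZE` are hypothetical, and the files are opened in binary mode so that sizes are counted in bytes. The threshold you observed (2.1*10**9 works, 2.2*10**9 fails) would be consistent with a per-call limit of 2**31 - 1 = 2147483647 bytes, but that is my guess and not something I have confirmed, so the chunk size just stays well below it.

```python
# Assumption: keeping every read()/write() call well under 2**31 - 1 bytes
# avoids the error. 1 GiB is an arbitrary choice, not a documented limit.
CHUNK_SIZE = 2**30


def read_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Return the whole file as bytes, reading at most chunk_size bytes per call."""
    chunks = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            chunks.append(chunk)
    return b''.join(chunks)


def write_in_chunks(path, data, chunk_size=CHUNK_SIZE):
    """Write bytes to path, passing at most chunk_size bytes to each write() call."""
    view = memoryview(data)  # slicing a memoryview does not copy the chunk
    with open(path, 'wb') as f:
        for start in range(0, len(view), chunk_size):
            f.write(view[start:start + chunk_size])
```

If you need a str as in your example, something like `x = read_in_chunks('/Users/username/Desktop/large.txt').decode()` should work. Decoding after the join, instead of per chunk, avoids splitting a multi-byte character across a chunk boundary.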

Dimitris Fasarakis Hilliard