
I recently came across the readinto method of the file object (in Python 2.7), which is similar to fread in C. It seems convenient and powerful in some cases. I plan to use it to read several files into one pre-allocated numpy array without copying the data.

e.g.

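import numpy as np

a = np.empty(N)           # pre-allocated destination array (float64 by default)
b = memoryview(a)         # buffer view over a; slicing it does not copy
fp1.readinto(b[0:100])    # fill elements 0..99 from the first file
fp2.readinto(b[100:200])  # fill elements 100..199 from the second file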

and

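fp1.readinto(b[0:100])    # fill elements 0..99
fp1.seek(400, 1)          # skip 400 bytes relative to the current position
fp1.readinto(b[100:200])  # fill elements 100..199 from further on in the same file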

Before I came across readinto, I used Cython and fread for this, so I'm very happy to find a pure Python solution.

However, its docstring says:

file.readinto?
Type:        method_descriptor
String form: <method 'readinto' of 'file' objects>
Namespace:   Python builtin
Docstring:   readinto() -> Undocumented.  Don't use this; it may go away.

Don't use this? What happened?

So I'm confused: should I use readinto or not? Could it cause any unwanted problems?

Is there an alternative implementation of the code above that doesn't use readinto but still avoids copying data? (Avoiding copies means np.concatenate or np.stack is not a good choice.)

Any suggestion is welcome! Thank you.

-------update-------

It seems that I can use io.FileIO from the standard library instead of the built-in function open. It looks OK, so I've posted it as an answer.

Any comments or other solutions are still welcome!

-------update-------

If you meet the same problem, you may want to have a look at the comments below by
Andrea Corbellini and Padraic Cunningham.

Syrtis Major
  • Might answer your question http://stackoverflow.com/questions/9791780/readinto-replacement – Eoin Murray Jan 13 '16 at 14:57
  • @AndreaCorbellini Just typing `file.readinto?` in IPython, this is equivalent to `help(file.readinto)` in the standard python shell. I've checked other `file` methods as you said, but only `readinto` shows this message. – Syrtis Major Jan 13 '16 at 16:15
  • Do you maybe want memmap? http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.memmap.html – Padraic Cunningham Jan 13 '16 at 16:19
  • For the reference, the "it may go away" message has been there since 2001: https://hg.python.org/cpython/rev/646e24728852 – Andrea Corbellini Jan 13 '16 at 16:21
  • @PadraicCunningham Yes, `memmap` is great and works well in many cases. But if I want to read data from several files, or from several discontinuous parts of one file, into one array, that seems impossible with `memmap`? Ah, wait, creating a `memmap` doesn't read the data into memory until I use it, right? So I can do it like this: `a[0:100]=mmap1[some_slice]; a[100:200]=mmap2[some_slice]` and also avoid the data copy. – Syrtis Major Jan 13 '16 at 16:28
  • @PadraicCunningham I think that's a good alternative solution. Thank you. – Syrtis Major Jan 13 '16 at 16:30
  • @AndreaCorbellini Thank you for the information. So it seems people finally accepted it in the `io` module :) – Syrtis Major Jan 13 '16 at 16:36
  • @SyrtisMajor, yes exactly, it does not read the whole file into memory, but I think you do need a block of free contiguous memory big enough for the file object. Python also has its own builtin mmap https://docs.python.org/2/library/mmap.html – Padraic Cunningham Jan 13 '16 at 16:38
  • @SyrtisMajor: well, actually it has always been accepted, even in the builtin `file` object. That message comes from a Python that did not have `bytearray`s as we have today. – Andrea Corbellini Jan 13 '16 at 16:39
  • @AndreaCorbellini But it disappears in Python 3.5. The object created by `open` has no `readinto` method. – Syrtis Major Jan 13 '16 at 16:44
  • @SyrtisMajor: in Python 3, `open()` opens files in text mode by default. `readinto()` is a feature of binary files. Try `open(..., 'rb')` – Andrea Corbellini Jan 13 '16 at 16:50
  • @AndreaCorbellini Yes, you're right. I do feel that in Python 3 things are more self-consistent and better organized. So it implies that `readinto` has indeed been accepted, even in the builtin file object. – Syrtis Major Jan 13 '16 at 16:56
  • @AndreaCorbellini @PadraicCunningham: I think you could post your arguments as answers. Both are good and helpful ;) And I shall withdraw mine, as it seems to be safe to use `open` directly. – Syrtis Major Jan 13 '16 at 17:20
  • @PadraicCunningham BTW, is it possible to map two or more files into one `mmap` object? Then we wouldn't need to read the data explicitly. As far as I can see, the answer is no. – Syrtis Major Jan 13 '16 at 17:24
  • @SyrtisMajor, what are you actually trying to do? – Padraic Cunningham Jan 13 '16 at 17:46
  • @PadraicCunningham There are several files. I want to read a part (slice) of each file into one array. This array is very big (several GB or more). In some cases, only a small (but not contiguous) fraction of the array would be accessed. These files are numerical simulation results. – Syrtis Major Jan 13 '16 at 18:31
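
For reference, the `memmap` idea from the comments above might look roughly like the sketch below. It assumes the files hold raw float64 values; the file names, sizes, and slice bounds are made up for the demo. Note that each assignment still copies the selected elements from the mapped file into `a`; what it avoids is loading the whole file or building an intermediate array first.

```python
import os
import tempfile

import numpy as np

# Create two small demo files (hypothetical names and contents).
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "sim1.bin")
path2 = os.path.join(tmpdir, "sim2.bin")
np.arange(0, 1000, dtype=np.float64).tofile(path1)
np.arange(1000, 2000, dtype=np.float64).tofile(path2)

# Mapping a file does not read it; data is paged in only when accessed.
mmap1 = np.memmap(path1, dtype=np.float64, mode="r")
mmap2 = np.memmap(path2, dtype=np.float64, mode="r")

a = np.empty(200, dtype=np.float64)   # pre-allocated destination
a[0:100] = mmap1[50:150]              # only this slice of file 1 is read
a[100:200] = mmap2[500:600]           # only this slice of file 2 is read

del mmap1, mmap2                      # drop the mappings when done
```

Several discontinuous parts of one file can be handled the same way, by assigning different slices of a single `memmap` to different slices of `a`.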

1 Answer


If you are unsure about file.readinto, you can use io.FileIO from the Python standard library instead of the built-in open or file.

Here's the docstring:

#io.FileIO.readinto?
Type:        method_descriptor
String form: <method 'readinto' of '_io.FileIO' objects>
Docstring:   readinto() -> Same as RawIOBase.readinto().

The documentation for io.RawIOBase.readinto can be found here.

class io.RawIOBase

...

readinto(b)

Read up to len(b) bytes into bytearray b and return the number of bytes read. If the object is in non-blocking mode and no bytes are available, None is returned.

It's available in both Python 2 and 3.
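
Here is a minimal sketch of reading two files into one pre-allocated buffer with io.FileIO.readinto. It uses a plain bytearray so it runs with the standard library alone; the helper name `fill_from_file` and the demo file names are made up for illustration. Since readinto reads "up to" len(b) bytes, the helper loops until the slice is full or the file ends.

```python
import io
import os
import tempfile


def fill_from_file(fp, view):
    """Read from fp until `view` is full or the file ends; return bytes read."""
    total = 0
    while total < len(view):
        n = fp.readinto(view[total:])  # may return fewer bytes than requested
        if not n:                      # 0 means EOF (None: non-blocking, no data)
            break
        total += n
    return total


# Two small demo files (hypothetical names and contents).
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "part1.bin")
path2 = os.path.join(tmpdir, "part2.bin")
with open(path1, "wb") as f:
    f.write(b"A" * 100)
with open(path2, "wb") as f:
    f.write(b"B" * 100)

buf = bytearray(200)   # pre-allocated destination
b = memoryview(buf)    # slicing a memoryview does not copy

with io.FileIO(path1, "r") as fp1, io.FileIO(path2, "r") as fp2:
    n1 = fill_from_file(fp1, b[0:100])
    n2 = fill_from_file(fp2, b[100:200])
```

The helper counts bytes, so it assumes a byte-oriented view. With a NumPy array as the destination, a byte-level view such as `a.view(np.uint8)` would be needed for that counting to hold.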

Syrtis Major