This question is old, but it looks like nobody has answered this sufficiently.
Simply:
- obj.getbuffer() creates a memoryview object.
- Every time you write, or if there is a memoryview present, obj.getvalue() will need to create a new, complete value.
- If you have not written and there is no memoryview present, obj.getvalue() is the fastest method of access, and requires no copies.
That being the case:
- When creating another io.BytesIO, use obj.getvalue().
- For random-access reading and writing, DEFINITELY use obj.getbuffer().
- Avoid interleaving reading and writing frequently. If you must, then DEFINITELY use obj.getbuffer(), unless your file is tiny.
- Avoid using obj.getvalue() while a buffer is lying around.
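As a rough illustration of the recommendations above, here is a minimal sketch of random access through getbuffer(), followed by getvalue() once the view is gone. The names buf, view, clone and write_blocked are made up for this example, and the BufferError behaviour is what I would expect on CPython when a write has to grow the buffer while a view is exported.

```python
import io

buf = io.BytesIO(b"hello world")

# getbuffer() returns a writable memoryview over the BytesIO contents.
# Used as a context manager, the view is released when the block exits.
with buf.getbuffer() as view:
    view[0:5] = b"HELLO"   # in-place overwrite of the same length
    first_byte = view[0]   # random-access read through the view

# The view has been released, so getvalue() is not competing with it.
patched = buf.getvalue()

# Seeding another BytesIO from the current contents.
clone = io.BytesIO(buf.getvalue())

# While a view is alive, a write that needs to grow the buffer is refused.
view = buf.getbuffer()
try:
    buf.write(b"x" * 100)
    write_blocked = False
except BufferError:
    write_blocked = True
finally:
    view.release()
```

Releasing the view explicitly (or using the with block) is the design choice that matters here: it keeps the view from lying around when getvalue() is called later.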
Here, we see that it's all fast, and all well and good, if no buffer is lying around:
# time getvalue()
>>> import io
>>> i = io.BytesIO(b'f' * 1_000_000)
>>> %timeit i.getvalue()
34.6 ns ± 0.178 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# time getbuffer()
>>> %timeit i.getbuffer()
118 ns ± 0.495 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# time getbuffer() and getvalue() together
>>> %timeit i.getbuffer(); i.getvalue()
173 ns ± 0.829 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Everything is fine, and working about as you'd expect. But let's see what happens when there's a buffer just lying around:
>>> x = i.getbuffer()
>>> %timeit i.getvalue()
33 µs ± 675 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Notice that we're no longer measuring in nanoseconds; we're measuring in microseconds, roughly a thousand times slower. If you del x, we're back to being fast. This is all because while a memoryview exists, Python has to account for the possibility that the contents of the BytesIO have been modified through it. So, to give a definite state to the user, getvalue() copies the buffer.
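To make the reason for the copy concrete, here is a small sketch that mutates the data through the view and compares it with an earlier getvalue() result. The variable names are just for illustration. The identity checks (is / is not) probe CPython implementation behaviour rather than anything the documentation promises, so treat those two results as something to try on your own interpreter.

```python
import io

i = io.BytesIO(b"f" * 1_000_000)

# No view exported: on CPython, repeated calls may hand back the very
# same bytes object (an implementation detail, not a guarantee).
shared_without_view = i.getvalue() is i.getvalue()

x = i.getbuffer()
snapshot = i.getvalue()    # taken while the view exists
x[0] = ord("g")            # modify the underlying data through the view

# bytes objects are immutable, so the earlier snapshot must not change.
snapshot_unchanged = snapshot[0] == ord("f")

# A fresh getvalue() reflects the modification.
current_first = i.getvalue()[0]

# With the view alive, each call produces its own separate object.
distinct_with_view = i.getvalue() is not i.getvalue()

x.release()                # for the BytesIO, equivalent to `del x`
shared_after_release = i.getvalue() is i.getvalue()
```

Since the view can change the bytes at any moment, getvalue() cannot safely hand out its internal storage as an immutable bytes object while the view exists, which is where the extra microseconds go.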