
According to the BytesIO docs:

getbuffer()

Return a readable and writable view over the contents of the buffer without copying them. Also, mutating the view will transparently update the contents of the buffer.

getvalue()

Return bytes containing the entire contents of the buffer.
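A small sketch of the documented difference (the sample bytes are made up for illustration). Writing through the view from getbuffer() changes the BytesIO itself. A bytes object returned earlier by getvalue() is an independent, immutable snapshot and does not change.

```python
from io import BytesIO


def view_vs_value():
    """Contrast getbuffer() (a live view) with getvalue() (a bytes snapshot)."""
    stream = BytesIO(b"abcdef")

    snapshot = stream.getvalue()      # bytes: an immutable snapshot
    with stream.getbuffer() as view:  # memoryview over the internal buffer
        view[2:4] = b"56"             # writes straight into the BytesIO

    # The earlier snapshot is unchanged; a fresh getvalue() sees the edit.
    return snapshot, stream.getvalue()


before, after = view_vs_value()
print(before)  # b'abcdef'
print(after)   # b'ab56ef'
```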

So getbuffer seems to be the more complicated of the two. But what if you don't need a writable view? Would you then simply use getvalue? What are the trade-offs?

Minimal Example

In this example, both seem to do exactly the same thing:

# Create an in-memory binary stream
from io import BytesIO
bytesio_object = BytesIO(b"Hello World!")

# Write its contents to a file
with open("output.txt", "wb") as f:
    f.write(bytesio_object.getbuffer())
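Because f.write() accepts any bytes-like object, the snippet behaves the same with either method. The sketch below checks this by writing both variants into a throwaway temporary directory. The file names are hypothetical and only used for illustration.

```python
import os
import tempfile
from io import BytesIO

bytesio_object = BytesIO(b"Hello World!")

with tempfile.TemporaryDirectory() as tmp:
    via_value = os.path.join(tmp, "via_getvalue.txt")    # hypothetical name
    via_buffer = os.path.join(tmp, "via_getbuffer.txt")  # hypothetical name

    with open(via_value, "wb") as f:
        f.write(bytesio_object.getvalue())   # passes a bytes object
    with open(via_buffer, "wb") as f:
        f.write(bytesio_object.getbuffer())  # passes a memoryview

    # Read both files back and compare their contents.
    with open(via_value, "rb") as a, open(via_buffer, "rb") as b:
        same_contents = a.read() == b.read()

print(same_contents)  # True
```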
Martin Thoma
    From the cpython code here https://github.com/python/cpython/blob/master/Modules/_io/bytesio.c, it seems that calling getvalue resizes the content buffer to strip off any extra allocated space, while getbuffer returns the buffer as it is, ready to be modified if needed. – Kris Apr 30 '21 at 13:32
    @Kris `getvalue` seems to be faster than `getbuffer`. How can that be if getvalue makes a copy every time? Plus, unlike `getbuffer`, `getvalue` always points to the same underlying object. – NicoAdrian Dec 28 '21 at 12:34

2 Answers


This question is old, but it looks like nobody has answered this sufficiently.

Simply:

  • obj.getbuffer() creates a memoryview object.
  • Every time you write, or if there is a memoryview present, obj.getvalue() will need to create a new, complete value.
  • If you have not written and there is no memoryview present, obj.getvalue() is the fastest method of access, and requires no copies.
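A quick way to see the copy-vs-no-copy behaviour is to compare object identity. This relies on a CPython implementation detail, not a language guarantee, so other interpreters may behave differently.

```python
import io

data = b"f" * 1_000_000
stream = io.BytesIO(data)

# Nothing written and no view: CPython hands back its internal bytes
# object, so both calls return the very same object (no copy).
same_object_without_view = stream.getvalue() is stream.getvalue()

# While a memoryview is alive, the contents could change through it,
# so each getvalue() call has to build a fresh bytes copy.
view = stream.getbuffer()
same_object_with_view = stream.getvalue() is stream.getvalue()
view.release()  # drop the view again

print(same_object_without_view)  # True on CPython
print(same_object_with_view)     # False
```

This also explains the comment above: without a live view, getvalue() keeps returning the same object.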

That being the case:

  • When creating another io.BytesIO, use obj.getvalue()
  • For random-access reading and writing, DEFINITELY use obj.getbuffer()
  • Avoid frequently interleaving reads and writes. If you must, then DEFINITELY use obj.getbuffer(), unless your file is tiny.
  • Avoid using obj.getvalue() while a buffer is lying around.
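One way to avoid leaving a buffer lying around is to scope it. memoryview supports the context-manager protocol, so a with block releases the view on exit. The io docs also note that while a view exists the BytesIO cannot be resized or closed. On CPython, any write() is refused with BufferError in that state. A minimal sketch:

```python
import io

stream = io.BytesIO(b"abc")

view = stream.getbuffer()
write_refused = False
try:
    stream.write(b"xyz")  # refused: the buffer is exported to a view
except BufferError:
    write_refused = True
view.release()            # explicitly end the export

# Scoping the view with "with" releases it automatically at block exit.
with stream.getbuffer() as view:
    first_byte = view[0]  # 97, i.e. ord("a")

stream.write(b"xyz")      # allowed again: no view is outstanding
print(write_refused, first_byte, stream.getvalue())  # True 97 b'xyz'
```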

Here we can see that everything is fast, all well and good, as long as no buffer is lying around:


# time getvalue()
>>> import io
>>> i = io.BytesIO(b'f' * 1_000_000)
>>> %timeit i.getvalue()
34.6 ns ± 0.178 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

# time getbuffer()
>>> %timeit i.getbuffer()
118 ns ± 0.495 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

# time getbuffer() and getvalue() together
>>> %timeit i.getbuffer(); i.getvalue()
173 ns ± 0.829 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Everything works about as you'd expect. But let's see what happens when there's a buffer just lying around:

>>> x = i.getbuffer()
>>> %timeit i.getvalue()
33 µs ± 675 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Notice that we're no longer measuring in nanoseconds but in microseconds, roughly a thousand times slower. If you del x, we're back to being fast. This is because while a memoryview exists, the BytesIO's contents can be changed through it at any moment, so getvalue() cannot simply hand out its internal bytes object (bytes are supposed to be immutable). To give the user a definite state, it copies the buffer instead.
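To reproduce the measurement outside IPython, here is a rough sketch using the standard timeit module. Absolute numbers depend on the machine and Python version. Only the gap between the three cases matters.

```python
import io
import timeit

stream = io.BytesIO(b"f" * 1_000_000)
N = 10_000

# No view outstanding: getvalue() can return the internal bytes object.
fast = timeit.timeit(stream.getvalue, number=N) / N

# A lingering view forces a full 1 MB copy on every getvalue() call.
x = stream.getbuffer()
slow = timeit.timeit(stream.getvalue, number=N) / N

del x  # on CPython this releases the view immediately
fast_again = timeit.timeit(stream.getvalue, number=N) / N

print(f"no view: {fast * 1e9:.0f} ns per call")
print(f"view alive: {slow * 1e6:.1f} us per call")
print(f"after del x: {fast_again * 1e9:.0f} ns per call")
```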

Mr. B

Using getbuffer() is better, because if you have really BIG data, copying it may take a long time. And (from PEP 20):

Explicit is better than implicit.
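But the type of a value depends on the stream: StringIO.getvalue() returns str, while BytesIO.getvalue() returns bytes. getbuffer(), which only BytesIO has, always gives a view of raw bytes (a memoryview).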
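For reference, here is a quick check of what each method actually returns. StringIO.getvalue() gives str, BytesIO.getvalue() gives bytes, and getbuffer(), available only on BytesIO, gives a memoryview. If you need a real bytes object from the view, .tobytes() makes an explicit copy.

```python
import io

text_stream = io.StringIO("hello")
byte_stream = io.BytesIO(b"hello")

print(type(text_stream.getvalue()))  # <class 'str'>
print(type(byte_stream.getvalue()))  # <class 'bytes'>

# getbuffer() exists only on BytesIO and returns a memoryview, not bytes.
with byte_stream.getbuffer() as view:
    print(type(view))                # <class 'memoryview'>
    as_bytes = view.tobytes()        # explicit copy, if real bytes are needed

print(as_bytes)                           # b'hello'
print(hasattr(text_stream, "getbuffer"))  # False
```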
Vad Sim