
Python's memoryview is said not to copy the data when sliced. Many benchmarks have been run, some of them on Stack Overflow, "proving" this behavior.
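For reference, the no-copy claim about slicing can be checked directly without a benchmark. The following is only a minimal sketch; the name `part` is made up for illustration, and it relies on the documented `memoryview.obj` attribute:

```python
arr = bytearray(range(15))
mem = memoryview(arr)

# Slicing a memoryview yields another view onto the same buffer.
part = mem[0:10]

# A write made through the underlying bytearray is visible through the
# slice, which would not be the case if slicing had copied the bytes.
arr[0] = 0xFF
print(part[0])          # 255
print(part.obj is arr)  # True
```

This only says something about slicing itself; what happens when one slice is assigned into another is a separate matter.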

While attempting to mess with them, I encountered a weird behavior that I couldn't explain:

>>> arr = bytearray(range(0,15))
>>> mem = memoryview(arr)
>>> mem[5:15] = mem[0:10]
>>> arr
bytearray(b'\x00\x01\x02\x03\x04\x00\x01\x02\x03\x04\x05\x06\x07\x08\t')

On one hand, memoryview "does not" copy the data. On the other hand, this works perfectly!

While I was happy that it "worked", I got saddened by the fact that it works. Well... because it shouldn't.

If Python used a 1-character buffer (that is, copied one byte at a time from left to right), the result should have been this:

bytearray(b'\x00\x01\x02\x03\x04\x00\x01\x02\x03\x04\x00\x01\x02\x03\x04')

Basically, once the copy had gone past the first 5 characters, the source and destination would overlap, and it would start reading back the characters that were written earlier. An example of this naive approach:

>>> arr = bytearray(range(0,15))
>>> mem = memoryview(arr)
>>> for i in range(10):
...     mem[i+5] = mem[i]
...
>>> arr
bytearray(b'\x00\x01\x02\x03\x04\x00\x01\x02\x03\x04\x00\x01\x02\x03\x04')

I tried making the memoryview much larger, but it still works, which suggests that Python copies the data in the background. That would render memoryview objects quite pointless.

Am I wrong somewhere here? Is there an explanation? How does memoryview work then?

Bharel

1 Answer


It checks for that:

    if (dptr + size < sptr || sptr + size < dptr)
        memcpy(dptr, sptr, size); /* no overlapping */
    else
        memmove(dptr, sptr, size);

memmove is specified to be safe for overlapping source and destination. How it ensures safety varies from case to case and implementation to implementation, but one technique is to work from right to left instead of left to right if left to right would overwrite not-yet-copied data.
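The right-to-left technique can be sketched in Python. This is only an illustration of the idea, not how any particular C library implements `memmove`; the function name `overlap_safe_copy` and its parameters are made up for this example:

```python
def overlap_safe_copy(buf, dst, src, size):
    """Copy `size` bytes within `buf` from offset `src` to offset `dst`."""
    if dst <= src or dst >= src + size:
        # The destination starts at or before the source, or past its end:
        # a left-to-right copy never overwrites bytes it still has to read.
        for i in range(size):
            buf[dst + i] = buf[src + i]
    else:
        # The destination starts inside the source range: copy right to
        # left so each source byte is read before it gets overwritten.
        for i in range(size - 1, -1, -1):
            buf[dst + i] = buf[src + i]


arr = bytearray(range(15))
overlap_safe_copy(arr, dst=5, src=0, size=10)
print(arr)
```

With `dst=5` and `src=0` the second branch runs, and the result matches what `mem[5:15] = mem[0:10]` produced in the question, with no temporary copy of the source bytes.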

user2357112
  • Boy, you are fast. It raises another question though: Why not always use memmove? memmove already checks for overlap, so why check twice? Isn't it less efficient to check twice? I believe memmove will resort to memcpy if the regions don't overlap. Perhaps it is because of the overhead of a new stack frame? – Bharel Aug 31 '17 at 23:13
  • @Bharel: Dunno. The check they make is even undefined behavior if `dptr` and `sptr` don't point into the same array, so it really seems like they should have left the check to `memmove`, which can perform the check without UB. Maybe they measured an actual performance difference on some implementation, or maybe they just didn't trust memmove. Could be that some compiler inlines memcpy but not memmove. – user2357112 Aug 31 '17 at 23:16
  • If they don't point to the same array it's out of the size bounds (which are checked earlier) so it'll resort to memcpy. – Bharel Aug 31 '17 at 23:19
  • Last question: How did you arrive at this answer so fast? What steps did you take? ("Give a Man a Fish, and You Feed Him for a Day. Teach a Man To Fish, and You Feed Him for a Lifetime.") – Bharel Aug 31 '17 at 23:23
  • @Bharel: The CPython source code is in the [official Github repo](https://github.com/python/cpython). Most built-in object types are implemented under the [`Objects`](https://github.com/python/cpython/tree/master/Objects) directory, and of the files in there, [`memoryobject.c`](https://github.com/python/cpython/blob/master/Objects/memoryobject.c) stands out as the one that sounds like it implements `memoryview`... – user2357112 Aug 31 '17 at 23:28
  • From there, familiarity with the C API and the associated naming and commenting conventions points to [`memory_ass_sub`](https://github.com/python/cpython/blob/master/Objects/memoryobject.c#L2415) as the function that implements memoryview slice assignment (and other subscript assignment), and from there, it's a matter of tracing the code path down to the point where the data copying occurs. – user2357112 Aug 31 '17 at 23:31
  • You are absolutely amazing. Thank you! – Bharel Aug 31 '17 at 23:39