Python undefined behavior regarding bytes and bytearrays

Question

I am facing some problems with Python3 Bytes/Bytearrays, and I am quite sure that something is not quite right. I tried contacting Microsoft, but they are taking too long to answer, and the problem is hampering my work. Currently using Microsoft Visual Studio Community 2022 (64-bit) Version 17.2.6.

If a bytearray object is passed to a function from a parent function, and the local object is changed, the object in the parent function also changes. This behavior is incorrect as complete isolation between functions is the norm - supported by the fact that there is memory isolation between variables of functions.

The images attached will illustrate the full picture, any help will be appreciated. Also, Python is a hobby language for me, so if there is incorrect knowledge of the language from my side, please pardon!

[Four screenshots, captioned in order: "Not supposed to happen"; "Not supposed to compile (because of the variable type of "key" which is bytes)"; "Supposed to happen"; "Not supposed to happen"]

Code is attached below just in case:

    k = bytearray(b"0123")
    
    def f1(key: bytearray):
        key[0] = 48 + 4
        return
    
    def f2(key: bytes):
        key[0] = 48 + 4
        return
    
    def f3(key: bytearray):
        for i in key:
            i = 0
        return
    
    def f4(key: bytearray):
        for i in range(len(key)):
            key[i] = 0
        return

    print(k)
    f4(k)
    print(k)

`bytearray` objects are passed by reference. Passing them into functions doesn't copy them. So inside the function, you're operating on the original object, not a copy of it. In short...the output you're getting is exactly what you should get. The problem here is with your understanding and expectations, not your results. — CryptoFool, Oct 09 '22 at 00:38
*"I tried contacting Microsoft"* - why? What does this have to do with Microsoft? — kaya3, Oct 09 '22 at 02:10

Ian Moote · Answer 1 · 2022-10-09T02:11:19.580

Regarding the "not supposed to happen" in the line def f2(key: bytes) -- that is a misconception. That key: bytes syntax is called an "annotation", as is the -> return_type syntax. It's there as a notational prompt for the programmer and is not enforced by the interpreter. IIRC the idea was that third-party tools could use it to autocheck the code.
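A minimal sketch of that point, reusing f2 from the question (the name buf is made up for this illustration). The annotation is stored on the function as metadata and is not checked at call time; the only error that shows up comes from bytes objects being immutable:

```python
def f2(key: bytes):
    key[0] = 48 + 4  # item assignment; only works on mutable sequences


# The annotation is just metadata attached to the function object.
print(f2.__annotations__)  # {'key': <class 'bytes'>}

# Passing a bytearray despite the `bytes` annotation runs without complaint.
buf = bytearray(b"0123")  # hypothetical name, for illustration only
f2(buf)
print(buf)  # bytearray(b'4123')

# Passing real bytes fails, but only because bytes is immutable,
# not because anything checked the annotation.
try:
    f2(b"0123")
except TypeError as exc:
    print("TypeError:", exc)
```

A static checker such as mypy would flag the item assignment on a bytes parameter, but that happens before the program runs and is entirely optional.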

"This behavior is incorrect as complete isolation between functions is the norm - supported by the fact that there is memory isolation between variables of functions."

That is not correct. Scoping applies to the names used inside a function, not to the objects you pass in as arguments. C pushes data values onto the stack for use by its functions, but Python passes references to its data objects -- the equivalent of a pointer in C.
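The difference between changing an object through a reference and merely rebinding a local name can be sketched like this. The function names mutate, rebind and same_object are invented for the example:

```python
def mutate(key):
    # `key` refers to the caller's object, so this change is visible outside.
    key[0] = 0


def rebind(key):
    # This only points the local name `key` at a brand-new object;
    # the caller's object is left untouched.
    key = bytearray(b"zzzz")


def same_object(key, original):
    # True when the parameter and the caller's variable are one object.
    return key is original


k = bytearray(b"0123")
print(same_object(k, k))  # True

rebind(k)
print(k)  # bytearray(b'0123')

mutate(k)
print(k)  # bytearray(b'\x00123')
```

So the local name is isolated, which is probably the "memory isolation" you had in mind, but the object it refers to is shared with the caller.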

The "problem" with your code is that you're not passing bytearray data to your function, you're passing a reference to the bytearray object. This can be demonstrated with some minor changes to your code:

k = bytearray(b"0123")

def f1(key: bytearray):
    key[0] = 48 + 4
    return

def f2(key: bytes):
    key[0] = 48 + 4
    return

def f3(key: bytearray):
    for i in key:
        i = 0
    return

def f4(key: bytearray):
    for i in range(len(key)):
        key[i] = 0
    return

def whatsmyobjectid(key):
    # Print the object and its identity as seen from inside a function.
    print(key, id(key))

print(k, id(k))
whatsmyobjectid(k)
f4(k)
print(k, id(k))

The output is:

bytearray(b'0123') 140407680541424
bytearray(b'0123') 140407680541424
bytearray(b'\x00\x00\x00\x00') 140407680541424

As you can see, the object ID of key inside the function is the same as k before making the call and after making the call.

If you want the function to work on the data without touching the original object, you have to pass it a copy, something like this:

f4(bytearray(k))

This can be illustrated by changing the bottom of your code to:

print(k, id(k))
whatsmyobjectid(k)
f4(bytearray(k))
print(k, id(k))
f4(k)
print(k, id(k))

The output is:

bytearray(b'0123') 140046806483824
bytearray(b'0123') 140046806483824
bytearray(b'0123') 140046806483824
bytearray(b'\x00\x00\x00\x00') 140046806483824

As you can see, if you pass the variable using f4(bytearray(k)) then the original bytearray object k is left unaltered, but when you pass just k it's changed by the function.
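For comparison, here is a short sketch of a few equivalent ways to hand the function a copy, assuming the same f4 as above. All three create a new bytearray, so k itself is never modified:

```python
def f4(key: bytearray):
    for i in range(len(key)):
        key[i] = 0


k = bytearray(b"0123")

f4(bytearray(k))  # new bytearray built from the contents of k
f4(k.copy())      # explicit shallow copy
f4(k[:])          # a full slice also creates a new bytearray

print(k)                # bytearray(b'0123')
print(k.copy() is k)    # False: the copy is a different object
```

Since a bytearray only holds integers, a shallow copy is all that is needed here.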

I should probably address your f3() function:

def f3(key: bytearray):
    for i in key:
        i = 0
    return

Let's analyze that for a moment:

So you're taking in the bytearray object key. for i in key puts the decimal value of each byte in that array into i, yes, but then you're just rebinding i to 0 before the next value from key is fetched into i -- you're not doing anything with it, and key itself is never touched. If you were to change that i = 0 to print(i) you'd see that the output is:

48
49
50
51

which you already know are the decimal ASCII codes for the digits zero through three.

I think what you were intending to do in f3() was to change each byte in the bytearray to zero. If you were to change that function to:

def f3(key: bytearray):
    for i in range(len(key)):
        key[i] = 0
    return

you'll see that it does, indeed, have the same effect as f4().
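To put the two behaviours side by side, here is a small sketch. The function names are made up for the illustration, and the slice-assignment variant is just one alternative way to zero the buffer in place, not something from your code:

```python
def zero_with_loop_variable(key: bytearray):
    for b in key:
        b = 0  # rebinds the local name `b` only; key is not modified


def zero_with_index(key: bytearray):
    for i in range(len(key)):
        key[i] = 0  # item assignment writes into the caller's object


def zero_with_slice(key: bytearray):
    # bytes(n) is n zero bytes; slice assignment mutates key in place.
    key[:] = bytes(len(key))


a = bytearray(b"0123")
zero_with_loop_variable(a)
print(a)  # bytearray(b'0123')

b = bytearray(b"0123")
zero_with_index(b)
print(b)  # bytearray(b'\x00\x00\x00\x00')

c = bytearray(b"0123")
zero_with_slice(c)
print(c)  # bytearray(b'\x00\x00\x00\x00')
```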

Just to further demonstrate the scoping of variables:

x = 1
print(x)
def xis17():
    x = 17
    print(x)
xis17()
print(x)

Output is:

1
17
1

You can see that you can reuse x inside of the function without affecting the parent's x.

Scoping sometimes does get hinkily inconsistent to some degree in Python. For example, this is legal:

x = 1
print(x)
def xis17():
    print(x)
xis17()
print(x)

and will print the value of the parent's x, but the following will throw an UnboundLocalError exception -- on the print(), not the increment:

x = 1
print(x)
def xis17():
    print(x)
    x+=1
xis17()
print(x)
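The reason is that an assignment to x anywhere in a function body makes x a local name for the whole function, so the print() tries to read a local that has no value yet. A minimal sketch of the failure and one way around it (the function names broken and fixed are invented for the example):

```python
x = 1


def broken():
    print(x)  # raises UnboundLocalError: x is local because of the line below
    x += 1


def fixed():
    global x  # tell Python that x means the module-level variable
    print(x)
    x += 1


try:
    broken()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

fixed()   # prints 1
print(x)  # 2
```

Declaring the name global is only one option; passing the value in and returning the result avoids shared state altogether.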

In regards to comparisons to C, C arrays are also passed by reference (technically by pointer) and modifying it in the called function will also change the “original”, same with Java and many other languages. Arrays are rarely passed by value. — Max, Oct 09 '22 at 02:10
Thanks Ian for the detailed answer! Really appreciate it! I just had one small follow up question: "If you only want the data passed to the function and not a reference to the object, you'd have to do something like this: f4(bytesarray(k)) ". I could use .copy() as well, right, to get the same effect? — Neev Penkar, Oct 09 '22 at 16:24
Yes, you can pass a shallow copy of the object to your function. In fact, that would make your code more readable than the example I gave. — Ian Moote, Oct 09 '22 at 18:05
