
I have been using Python for about a year now and am fairly familiar with it. However, I am quite new to threads and am a little confused about what data threads share.

I have been reading through stuff online which all seem to agree that threads share the same memory space. However, in trying to demonstrate this to myself, it seems I have an incorrect understanding of how this sharing works.

I wrote a short script that adds one to a local variable three times. I create two threads that run the same function. I would have thought that, because of the shared memory, the x variable in one thread would also be increased while it sleeps, due to the other thread increasing its own x, and vice versa. So after the second loop of thread one, where x = 2 while thread two sleeps, I would have thought thread two would come out of its sleep with x = 2 and not x = 1. However, as the order of the printed statements suggests, the variables are not shared between the threads.

My question is: if you have multiple threads running at once using the same function, will the variables in each thread be kept separate for the whole run of the program (provided no globals are defined)? And what exactly does this shared memory mean?

Any guidance on the issue (or general threading advice) would be greatly appreciated.


import threading
from time import sleep


def increase(x):
    for i in range(3):
        # current_thread().name replaces the deprecated currentThread().getName()
        print(f"[{threading.current_thread().name}] X is {x}")
        x += 1
        print(f"[{threading.current_thread().name}] X is now {x} after increase")
        sleep(0.5)
        print(f"[{threading.current_thread().name}] X is now {x} after sleep")
    return x


def main():
    x = 0
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

And the result is:


[Thread One] X is 0
[Thread One] X is now 1 after increase
[Thread Two] X is 0
[Thread Two] X is now 1 after increase
[Thread Two] X is now 1 after sleep[Thread One] X is now 1 after sleep
[Thread One] X is 1
[Thread One] X is now 2 after increase

[Thread Two] X is 1
[Thread Two] X is now 2 after increase
[Thread One] X is now 2 after sleep[Thread Two] X is now 2 after sleep
[Thread Two] X is 2
[Thread Two] X is now 3 after increase

[Thread One] X is 2
[Thread One] X is now 3 after increase
[Thread One] X is now 3 after sleep[Thread Two] X is now 3 after sleep
SonOfGib

3 Answers


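In your case, x is passed to the function as an argument, and because an int is immutable, x += 1 binds the function's local x to a new int instead of changing the x in main. If you want both threads to increase the same counter, you have to hold it in a mutable object, for example an attribute on a class instance.

For example: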

import threading
from time import sleep


class foo:
    x = 0


def increase(foo):
    # Here foo is the single instance created in main(), shared by both threads.
    for i in range(3):
        print(f"[{threading.current_thread().name}] X is {foo.x}")
        foo.x += 1
        print(f"[{threading.current_thread().name}] X is now {foo.x} after increase")
        sleep(0.5)
        print(f"[{threading.current_thread().name}] X is now {foo.x} after sleep")
    return foo.x


def main():
    x = foo()
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

Note: Python threads are kind of specific. You can have a look at this video https://www.youtube.com/watch?v=Obt-vMVdM8s

------------ Edit -------------

To be more precise: in your case, x is an int, which is immutable, so x += 1 inside the function rebinds the function's local x to a new object and leaves the caller's x untouched. A string or a float behaves the same way.

You will get the same behaviour without threads:

def increase(x):
    for i in range(3):
        print(x)
        x += 1
    return x

x = 0

increase(x)
assert x == 0

x += 1

increase(x)
assert x == 1
Florian Vuillemot
  • Thank you for the response, I will definitely check out that video when I get a chance. I should have made myself clearer: it's not that I want to share information between threads. It's the opposite. I want to make sure their variables are independent of each other even when using the same function. I kept the workings of the function simple so as not to add confusion. I just wanted to focus on the threads, and the increase function was there to try to show how they were working, if that makes sense. – SonOfGib May 25 '22 at 10:43
  • I knew that the X in each function was a copy from the main, I just didn't know if each thread shared the same copy or if each had its own. – SonOfGib May 25 '22 at 10:46

You're correct in saying that memory is shared, but the intricacies run deeper. What you're getting confused by is immutable vs mutable types. You can find out more here. I've removed the for loop since it gets confusing:

import threading
from time import sleep


def increase(x):
    # id(x) identifies the object that x currently refers to (its address in CPython).
    print(f"[{threading.current_thread().name}] address of x: {hex(id(x))} ")

    print(f"[{threading.current_thread().name}] X is {x}")
    x += 1
    print(f"[{threading.current_thread().name}] address of x after increment: {hex(id(x))} ")
    print(f"[{threading.current_thread().name}] X is now {x} after increase")
    sleep(0.5)
    print(f"[{threading.current_thread().name}] X is now {x} after sleep")
    print(f"[{threading.current_thread().name}] address of x after sleep: {hex(id(x))} ")
    return x


def main():
    x = 0
    first = threading.Thread(name="Thread One", target=increase, args=(x,))
    second = threading.Thread(name="Thread Two", target=increase, args=(x,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

What I've done here is print the address of x in each thread. The output:

[Thread One] address of x: 0x7ffbbebb7c20 
[Thread One] X is 0
[Thread One] address of x after increment: 0x7ffbbebb7c40 
[Thread One] X is now 1 after increase
[Thread Two] address of x: 0x7ffbbebb7c20 
[Thread Two] X is 0
[Thread Two] address of x after increment: 0x7ffbbebb7c40 
[Thread Two] X is now 1 after increase
[Thread Two] X is now 1 after sleep[Thread One] X is now 1 after sleep
[Thread One] address of x after sleep: 0x7ffbbebb7c40 

[Thread Two] address of x after sleep: 0x7ffbbebb7c40 

You will notice that on the first print line, where I'm only reading x, the address is 0x7ffbbebb7c20. After the increment, threads 1 and 2 both report a different address: 0x7ffbbebb7c40. They both get the same new address because CPython caches small integers to keep the memory footprint lower, so every 1 is the same object. You can find more about that here. For our purposes: the function receives a reference to the same object, and since x += 1 cannot change an immutable object (int, str, tuple, etc.), it binds that thread's local x to a different object instead. If you pass a mutable type like a dict and update it in place, both threads see the change:



import threading
from time import sleep


def increase(test_var):
    name = threading.current_thread().name
    print(f"[{name}] Address of test_var: {hex(id(test_var))}")
    print(f"[{name}] Address of test_var['key']: {hex(id(test_var['key']))}")
    print(f"[{name}] test_var['key'] is {test_var['key']}")
    test_var['key'] += 1  # updates the shared dict in place
    print(f"[{name}] test_var['key'] is now {test_var['key']} after increase")
    print(f"[{name}] Address of test_var after increment: {hex(id(test_var))}")
    print(f"[{name}] Address of test_var['key'] after increment: {hex(id(test_var['key']))}")
    sleep(0.5)
    print(f"[{name}] test_var['key'] is now {test_var['key']} after sleep")
    print(f"[{name}] Address of test_var after sleep: {hex(id(test_var))}")
    print(f"[{name}] Address of test_var['key'] after sleep: {hex(id(test_var['key']))}")
    return test_var


def main():
    test_var = {'key': 0}
    first = threading.Thread(name="Thread One", target=increase, args=(test_var,))
    second = threading.Thread(name="Thread Two", target=increase, args=(test_var,))

    first.start()
    second.start()

    first.join()
    second.join()


if __name__ == "__main__":
    main()

The output is what you expected:

[Thread One] Address of test_var: 0x22216509a98
[Thread One] Address of test_var['key']: 0x7ffbaf7d7c20
[Thread One] test_var['key'] is 0
[Thread One] test_var['key'] is now 1 after increase
[Thread One] Address of test_var after increment: 0x22216509a98
[Thread One] Address of test_var['key'] after increment: 0x7ffbaf7d7c40
[Thread Two] Address of test_var: 0x22216509a98
[Thread Two] Address of test_var['key']: 0x7ffbaf7d7c40
[Thread Two] test_var['key'] is 1
[Thread Two] test_var['key'] is now 2 after increase
[Thread Two] Address of test_var after increment: 0x22216509a98
[Thread Two] Address of test_var['key'] after increment: 0x7ffbaf7d7c60
[Thread Two] test_var['key'] is now 2 after sleep
[Thread Two] Address of test_var after sleep: 0x22216509a98
[Thread Two] Address of test_var['key'] after sleep: 0x7ffbaf7d7c60
[Thread One] test_var['key'] is now 2 after sleep
[Thread One] Address of test_var after sleep: 0x22216509a98
[Thread One] Address of test_var['key'] after sleep: 0x7ffbaf7d7c60

Notice how the address of test_var (0x22216509a98) doesn't change between the threads. Both threads hold a reference to the same dict, and test_var['key'] += 1 updates that dict in place, so the change is visible to both.
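As a minimal sketch of the difference between rebinding a name and changing an object in place, here is the same idea without threads. The helper names `rebind` and `mutate` are made up for illustration and are not part of the code above:

```python
def rebind(items):
    # Builds a new list and points the local name at it.
    # The caller's list is not touched.
    items = items + [1]


def mutate(items):
    # Changes the list object that the caller also refers to.
    items.append(1)


shared = []

rebind(shared)
after_rebind = list(shared)   # snapshot: still empty

mutate(shared)
after_mutate = list(shared)   # snapshot: now contains 1

print(after_rebind, after_mutate)
```

Both functions receive a reference to the same list. Only the one that modifies the object, rather than assigning to the local name, has an effect the caller can see.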

testfile
  • Thanks for that thorough explanation. Just to make it very clear, even though immutable objects in different threads share the same memory space, within each thread they can have their own value independent of the other threads? – SonOfGib May 25 '22 at 12:55
  • correct. Since the python interpreter copies the value when you try to change it. Only for immutable types. If you use mutables then the same object is updated across threads – testfile May 25 '22 at 13:06
  • @SonOfGib, Immutable objects cannot "have their own value independent of other threads." An immutable object, by definition, cannot change. Every thread that has a reference to the same immutable object must see the object in same state as every other thread sees. It is important to understand the difference between objects and variables. The `x` in your `increase(x)` function is not an object. It is a _variable_ that _refers_ to an `int` object. And the `test_var` in testfile's version is a variable that refers to a `dict`. When you wrote, `x+=1`, that changes `x` to refer to a new `int`,... – Solomon Slow May 25 '22 at 19:13
  • ...but when testfile wrote `test_var['key']+=1`, that doesn't change `test_var` at all. `test_var` still refers to the same `dict`, only now the `dict` itself has been changed. It's not because of which object was immutable and which object was not. It was because of what's on the left hand side of the assignment operator. When you write `x=...` you are telling Python to change `x`. But the left hand side of `test_var['key']=...` isn't `test_var`, it's `test_var['key']` That's a different location: It's a location _within_ the `dict` to which `test_var` refers. – Solomon Slow May 25 '22 at 19:19

The answer that you accepted does not directly answer this question:

"if you have multiple threads running at once using the same function, will the variables in each thread be kept separate?"

"Local" doesn't just mean local to this function, it means local to this function call.

The values of a function's arguments and local variables are stored in an activation record. Every time a function is called, a new activation record is created, and when the function returns, that activation record is destroyed.

This means that the x argument in your increase(x) function is a different variable in each call to the function. If a function calls itself recursively, then the args and locals are different variables in each recursive call, and if the function is called in multiple threads, then the args and locals are different variables in each of the threads.
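The recursive case can be sketched without any threads. The function `countdown` and the list `seen` are hypothetical names used only for this illustration:

```python
def countdown(n, seen):
    local = n * 10               # a fresh `local` belonging to this call only
    if n > 0:
        countdown(n - 1, seen)   # the inner call gets its own n and local
    # After the inner call has returned, this call's n and local are unchanged.
    seen.append((n, local))


seen = []
countdown(2, seen)
print(seen)  # [(0, 0), (1, 10), (2, 20)]
```

Each call records its own pair, which shows that the inner calls never overwrote the outer call's variables. The list `seen`, by contrast, is a single object that every call refers to.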

"I have been reading through stuff online which all seem to agree that threads share the same memory space."

Absolutely true, but an argument or a local is not a definite location in memory. A global is a definite location in memory. So, if you have some global g, every thread will agree that g has the same value. And, a Python object, so long as it exists, occupies a definite location, so every thread that has a reference to the same object will see it in the same state. But, a local variable occupies a different memory location in each activation of the function that declares it.

Locals and arguments are not shared. They aren't shared between recursive calls to the function, and they aren't shared by calls from different threads.
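A small sketch of both halves of that statement, assuming CPython, where a single `list.append` call is safe to make from several threads. The module-level list `results` is an addition for illustration and is not in the original script:

```python
import threading

results = []  # one list object, reachable from every thread


def increase(x):
    # x is a new local variable in every call, so each thread counts on its own.
    for _ in range(3):
        x += 1
    # Appending changes the shared list, so every thread sees the result.
    results.append(x)


threads = [threading.Thread(target=increase, args=(0,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [3, 3]
```

Each thread reaches 3 independently because its x lives in its own call, while both results end up in the same list because that object is shared.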

Solomon Slow