
My module has two functions in it: do_something(), and change_behavior().

The function do_something() does Thing A by default. After change_behavior() has been called, do_something() does Thing B instead.

I want this transition to be thread-specific. That is, any new thread will have Thing A happen when it calls do_something(), but if that thread calls change_behavior(), then Thing B will happen instead when it continues to call do_something().

Each thread should be independent, so that one thread calling change_behavior() does not affect the behavior of do_something() for other threads.


My instinctive solution to this is to have behavior tied to the thread's ID (obtained via threading.get_ident()). The function do_something() checks a module-level registry for the thread's ID and adjusts its behavior accordingly. Meanwhile, the function change_behavior() simply adds the current thread's ID to that registry. This works at any given time because there are never two concurrent threads with the same ID.

The problem comes in when the current set of threads joins, and time passes, and the parent thread makes a whole bunch more threads. One of the new threads has the same ID as one of the previous threads, because thread IDs are reused sometimes. That thread calls do_something(), and because it's already in the registry, it does Thing B instead of Thing A.

To fix this, I need to remove the thread ID from the registry somehow, between when the first thread with that ID ends and when the second thread with that ID starts. Some hypothetical ideas I've come up with:

  • Periodically check whether each thread ID is still active. This is problematic because it wastes CPU resources and can miss a thread that is destroyed and then recreated between ticks
  • Attach a method hook to be called whenever the thread joins. I'm not sure how to do this, besides the next idea
  • As part of change_behavior(), hijack/replace the current thread's ._quit() method with one that first removes the thread's ID from the registry. This seems like bad practice, and potentially breaking.

Another aspect of my use case is that, if possible, I'd like new threads to inherit the current behavior of their parent threads, so that the user doesn't have to manually set the flag for every thread they create. This has more to do with how I store the state of the thread than with detecting when the thread finishes, which makes it marginally less relevant to this particular question.

I'm looking for guidance on whether any of these particular solutions are ideal, standard, or idiomatic, and whether there's an intended thing to do in this use case.


Using threading.local() was suggested in the comments by @TarunLalwani. I've investigated it, and it is useful, but it doesn't account for the other use case I'd like to take care of - when a parent thread creates new subthreads, I want them to inherit the state of the parent thread. I was thinking of accomplishing this by replacing Thread.__init__(), but using local() would be incompatible with this use case in general, since I wouldn't be able to pass variables from parent to child threads.
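For reference, a minimal sketch of the threading.local() suggestion, assuming a single boolean flag is enough. The names `_state` and `worker` are hypothetical. Each thread sees only its own attributes on the local object, which is also why a child thread does not see what its parent set.

```python
import threading

# Hypothetical per-thread storage: each thread sees its own attributes.
_state = threading.local()

def change_behavior():
    _state.changed = True

def do_something():
    # A thread that never set the attribute falls back to the default.
    return "Thing B" if getattr(_state, "changed", False) else "Thing A"

results = {}

def worker(name, switch):
    if switch:
        change_behavior()
    results[name] = do_something()

# Run the threads one after another; the second one starts with a
# fresh local namespace regardless of what the first one stored.
for name, switch in (("switched", True), ("untouched", False)):
    t = threading.Thread(target=worker, args=(name, switch))
    t.start()
    t.join()

print(results)
```

This covers the per-thread part of the problem without any registry to clean up, but it does nothing for inheritance from parent to child.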


I've also been experimenting, more successfully, with simply saving my attributes to the threads themselves:

current_thread = threading.current_thread()
setattr(current_thread, my_reference, new_value)

The problem with this is that, for a reason which completely mystifies me, any other variable in the module's namespace whose value is currently current_thread.my_reference also gets set to new_value. I have no idea why, and I've been unable to replicate the problem in an MVE (though it happens consistently in my IDE, even after restarting it). As my other currently-active question implies, the objects I'm setting here are references to output streams: every reference to an instance of the intermediary IO streaming I described in that answer is getting replaced by the file descriptor with which this method is being called. That may have something to do with it, but I can't imagine why the type of object would affect how references work in this case.

Green Cloak Guy
  • According to the link below, threading.get_ident() does not return a real ID. You can get an OS ID using ctypes. http://blog.devork.be/2010/09/finding-linux-thread-id-from-within.html – postoronnim Jul 09 '19 at 20:22
  • @postoronnim That's helpful to know, thanks, but I don't think it solves the problem. OS thread IDs can be repeated as well, so the problem still stands. I think I'm using `threading.get_ident()` for its intended purpose - "as a magic cookie to be used e.g. to index a dictionary of thread-specific data." The OS does make an effort to not duplicate thread IDs too quickly, so would using a combination of that and the "check periodically" strategy be ideal here? – Green Cloak Guy Jul 09 '19 at 20:30
  • It sounds like the "check periodically" one might get you into even more trouble. I was leaning towards finding a way of making a unique thread identifier. Maybe by combining threading.get_ident() and stack pointers into an integer? https://stackoverflow.com/questions/34164854/threads-memory-layout – postoronnim Jul 09 '19 at 20:43
  • Can you post the threading approach you are using? Using classes or functions or fork? It may help suggest an approach specific to your design – Tarun Lalwani Jul 15 '19 at 14:23
  • @TarunLalwani My code doesn't use multithreading itself, but it's assuming that the code that calls it will, and it wants to behave uniquely for each thread. I'm expecting this to be via the `threading` module (`multiprocessing` is much easier to deal with because of non-shared memory, and `fork`/`subprocess` would be outside the existing program's sphere of influence anyway) – Green Cloak Guy Jul 15 '19 at 14:28
  • Then why not use `threading.local()` and store the flag inside that to toggle between A and B? – Tarun Lalwani Jul 15 '19 at 14:40
  • Please provide a minimal git repo so we can understand the context better; as of now it's not 100% clear to me. Looking at sample code, I might be able to work out something for you – Tarun Lalwani Jul 16 '19 at 02:38
  • @TarunLalwani I've edited my post to account for why `threading.local()` doesn't really work for my use case (and, for that matter, added information about my use case) as well as other things I've tried so far and the extremely weird problems that those attempted solutions are having. Hopefully my reference to my other question provides enough contextual information. – Green Cloak Guy Jul 16 '19 at 04:19
  • Do you have the possibility to pass arguments from parent to child process, or to execute extra startup code in the child process that could query the state of the parent process? – Roland Weber Jul 16 '19 at 19:47
  • If you want to go with the table approach, can't you just let newly started threads clear their table entry? That would get rid of old data from a previous use of the same ID. – Roland Weber Jul 16 '19 at 19:54
  • @RolandWeber *Process* transition is no issue, I'd assume, because memory is copied. *Thread* transition is what I need to support, and that is handled by the user completely independently of my module. I can mix and match methods in the thread's `__init__` method, if I replace it, but can't do anything there that I would be able to do otherwise. Good thought with the table approach and clearing it upon starting, though - I hadn't thought of that. I'll try it, see if it works. – Green Cloak Guy Jul 16 '19 at 19:55
  • Argh... Unix terminology. Sorry, I meant thread when I wrote process. – Roland Weber Jul 20 '19 at 12:29

1 Answer


My answer is a very simple answer to your question, hence I wonder if I missed something. Basically, I think you should avoid storing the current state of external objects in your module.

You need to store the state (whether change_behavior was called, and maybe some other data) somewhere. You have two main options: store the state in the module or store the state in the thread itself. Aside from the issues you had in storing the state in the module, one expects a module to be (mainly) stateless, hence I think you should stick to the latter and store data in the thread.

Version 1

If you store the state in an attribute of the thread object, there is a small risk of collision between the name of the attribute you create and the names of existing attributes, but if the documentation is clear and you choose a good name, that should not be an issue.

A simple proof of concept, without setattr or hasattr (I didn't check the source code of CPython but maybe the weird behavior comes from setattr):

module1.py

import threading
import random
import time

_lock = threading.Lock()

def do_something():
    with _lock:
        t = threading.current_thread()
        try:
            if t._my_module_s:
                print(f"DoB ({t})")
            else:
                print(f"DoA ({t})")
        except AttributeError:
            t._my_module_s = 0
            print(f"DoA ({t})")

    time.sleep(random.random()*2)

def change_behavior():
    with _lock:
        t = threading.current_thread()
        print(f"Change behavior of: {t}")
        t._my_module_s = 1

test1.py

import random
import threading
from module1 import *

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        n = random.randint(1, 10)
        for i in range(n):
            do_something()
        change_behavior()
        for i in range(10-n):
            do_something()

thread_1 = MyThread()
thread_2 = MyThread()
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()

Output 1

DoA (<MyThread(Thread-1, started 140155115792128)>)
DoA (<MyThread(Thread-2, started 140155107399424)>)
DoA (<MyThread(Thread-1, started 140155115792128)>)
DoA (<MyThread(Thread-1, started 140155115792128)>)
Change behavior of: <MyThread(Thread-1, started 140155115792128)>
DoB (<MyThread(Thread-1, started 140155115792128)>)
DoB (<MyThread(Thread-1, started 140155115792128)>)
DoA (<MyThread(Thread-2, started 140155107399424)>)
DoB (<MyThread(Thread-1, started 140155115792128)>)
DoA (<MyThread(Thread-2, started 140155107399424)>)
DoB (<MyThread(Thread-1, started 140155115792128)>)
DoA (<MyThread(Thread-2, started 140155107399424)>)
DoA (<MyThread(Thread-2, started 140155107399424)>)
DoB (<MyThread(Thread-1, started 140155115792128)>)
DoA (<MyThread(Thread-2, started 140155107399424)>)
Change behavior of: <MyThread(Thread-2, started 140155107399424)>
DoB (<MyThread(Thread-2, started 140155107399424)>)
DoB (<MyThread(Thread-1, started 140155115792128)>)
DoB (<MyThread(Thread-1, started 140155115792128)>)
DoB (<MyThread(Thread-2, started 140155107399424)>)
DoB (<MyThread(Thread-2, started 140155107399424)>)
DoB (<MyThread(Thread-2, started 140155107399424)>)

Version 2

If you are sure that the end user will use your module inside threads, you can provide them with a convenient way to do that. The idea is to handle the threads yourself. Just wrap the user function in a thread, and store the state in this thread as above. The difference is that you own the Thread subclass, so you avoid the risk of name collision. Plus, the code becomes, in my opinion, cleaner:

module2.py

import threading
import random
import time

_lock = threading.Lock()

def do_something():
    with _lock:
        t = threading.current_thread()
        t.do_something() # t must be a _UserFunctionWrapper
    time.sleep(random.random()*2)

def change_behavior():
    with _lock:
        t = threading.current_thread()
        t.change_behavior() # t must be a _UserFunctionWrapper

def wrap_in_thread(f):
    return _UserFunctionWrapper(f)

class _UserFunctionWrapper(threading.Thread):
    def __init__(self, user_function):
        threading.Thread.__init__(self)
        self._user_function = user_function
        self._s = 0

    def change_behavior(self):
        print(f"Change behavior of: {self}")
        self._s = 1

    def do_something(self):
        if self._s:
            print(f"DoB ({self})")
        else:
            print(f"DoA ({self})")

    def run(self):
        self._user_function()

test2.py

import random
from module2 import *

def user_function():
    n = random.randint(1, 10)
    for i in range(n):
        do_something() # won't work if the function is not wrapped
    change_behavior()
    for i in range(10-n):
        do_something()

thread_1 = wrap_in_thread(user_function)
thread_2 = wrap_in_thread(user_function)
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()

Output 2

DoA (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
Change behavior of: <_UserFunctionWrapper(Thread-1, started 140193896072960)>
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
DoA (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
Change behavior of: <_UserFunctionWrapper(Thread-2, started 140193887680256)>
DoB (<_UserFunctionWrapper(Thread-2, started 140193887680256)>)
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)
DoB (<_UserFunctionWrapper(Thread-1, started 140193896072960)>)

The drawback is that you have to use a thread even if you don't need it.
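Owning the Thread subclass might also help with the inheritance requirement from the question, because Thread.__init__ runs in the thread that creates the object, not in the new thread. The following is only a sketch under that assumption: `InheritingThread` is a hypothetical name, the attribute `_my_module_s` is borrowed from Version 1, and it only works for threads the user creates through this class.

```python
import threading

class InheritingThread(threading.Thread):
    """Hypothetical Thread subclass that copies the creator's flag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # __init__ runs in the creating (parent) thread, so
        # current_thread() here is the parent, not the new thread.
        parent = threading.current_thread()
        self._my_module_s = getattr(parent, "_my_module_s", 0)

def change_behavior():
    threading.current_thread()._my_module_s = 1

def do_something():
    t = threading.current_thread()
    return "Thing B" if getattr(t, "_my_module_s", 0) else "Thing A"

results = {}

def child():
    # Never calls change_behavior(); the flag comes from the parent.
    results["child"] = do_something()

def parent():
    change_behavior()
    t = InheritingThread(target=child)
    t.start()
    t.join()
    results["parent"] = do_something()

p = InheritingThread(target=parent)
p.start()
p.join()
# The main thread never changed its behavior.
results["main"] = do_something()
print(results)
```

Because the flag lives on the thread object, it disappears with that object, so a later thread that reuses the same ID starts from the default again.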

jferard