1

I have a function call in Python that looks like this:

from threading import Thread
while queue:
   Thread(target=queue.extend, args=(longfunction(a, b))).start()

So I think the code above runs several queue.extend calls in parallel, i.e. it doesn't wait until the previous queue.extend has returned before starting the next one. But I'm not sure about the arguments.

My question is, does Python wait until longfunction(a, b) has finished evaluating and returned before it moves on to start a new thread, or is the whole thread started at once and then the next thread is started before longfunction has returned?

I'm a bit new to threads so please explain everything.

user2108462
  • As a side note, `(longfunction(a,b))` is not a tuple, it's a single value with meaningless parentheses around it. It's the commas that make a tuple, not the parentheses, so you need `(longfunction(a,b),)`. Or, if you find it more readable, just use a list instead: `[longfunction(a,b)]`. – abarnert Nov 21 '13 at 23:30
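
The point about the trailing comma can be checked directly. This is a small sketch with a made-up list named value standing in for whatever longfunction returns:

```python
value = [1, 2, 3]          # hypothetical stand-in for longfunction(a, b)

not_a_tuple = (value)      # parentheses alone do nothing
one_tuple = (value,)       # the comma makes a 1-element tuple
as_list = [value]          # a list works for Thread's args as well

print(type(not_a_tuple).__name__)  # list
print(type(one_tuple).__name__)    # tuple
```

Because Thread unpacks args when it calls the target, passing the bare list would call the target with three separate arguments instead of one list.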

2 Answers

3

The Thread constructor is just a normal function call; all of its arguments, including the args tuple, have to be evaluated before it can be called.

So this runs longfunction(a, b) in the main thread, and only runs queue.extend in the background thread.
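
A minimal sketch of that evaluation order. It assumes a hypothetical longfunction that just sleeps briefly, and a helper extend_queue (not in the original) that records which thread it runs on:

```python
import threading
import time

calls = []   # (step, thread name) in the order the steps happen
queue = []

caller_name = threading.current_thread().name

def longfunction(a, b):
    # Hypothetical slow function; records the thread it runs on.
    calls.append(("longfunction", threading.current_thread().name))
    time.sleep(0.1)
    return [a + b]

def extend_queue(items):
    # Wrapper around queue.extend so the thread name can be recorded.
    calls.append(("extend", threading.current_thread().name))
    queue.extend(items)

# longfunction(1, 2) is evaluated here, before Thread(...) is even called.
worker = threading.Thread(target=extend_queue,
                          args=(longfunction(1, 2),),
                          name="worker")
worker.start()
worker.join()

print(calls)
```

The first recorded step is longfunction on the calling thread; only the extend step runs on the thread named "worker".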

The quickest way to fix this is to create a thread function with def or lambda:

Thread(target=lambda: queue.extend(longfunction(a, b))).start()

Or, alternatively:

Thread(target=(lambda x, y: queue.extend(longfunction(x, y))), args=(a, b)).start()

The difference is that the first one is a closure that captures a and b from the enclosing scope, so if they have been rebound by the time the lambda's body actually runs, you will get the new values. The second isn't*, so it gets the values a and b had at the time the args tuple was created. In most cases, this won't matter. When it does matter, you have to think through which one you want.
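
Here is a small sketch of that difference. The longfunction stub and the demo wrapper are invented for illustration; the threads are started late and joined one at a time so the result is deterministic:

```python
from threading import Thread

def longfunction(a, b):
    # Hypothetical stand-in: just report which values it received.
    return [(a, b)]

def demo():
    results = []
    a, b = 1, 2

    # Closure version: a and b are looked up when the lambda body runs.
    closure_thread = Thread(
        target=lambda: results.extend(longfunction(a, b)))

    # Argument version: the current values go into the args tuple now.
    args_thread = Thread(
        target=lambda x, y: results.extend(longfunction(x, y)),
        args=(a, b))

    a, b = 10, 20   # rebind before either thread has started

    closure_thread.start()
    closure_thread.join()
    args_thread.start()
    args_thread.join()
    return results

results = demo()
print(results)  # [(10, 20), (1, 2)]
```

In a loop that rebinds a and b on every iteration, the argument version is usually the one you want.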


* Technically, closures and functions are the same thing in Python; the second lambda is just a closure with no cells for a and b in it, instead of one with two.

abarnert
  • Does the lambda need to take `a` and `b` as arguments, or will the `lambda` create a closure that captures them? – jpmc26 Nov 21 '13 at 23:31
  • @jpmc26: The lambda creates a closure that captures them. If you want to make sure it captures the current values, rather than the cells in the local scope, then you use arguments (or the default-value parameter hack). – abarnert Nov 21 '13 at 23:32
  • (I never know what to call the things a closure captures in Python; they're not really the variables, and not just names, but the underlying closure cell objects are an implementation detail that you don't want to think about unless you're writing a Python compiler or bytecode interpreter…) – abarnert Nov 21 '13 at 23:34
  • Object references, maybe? (Sometimes known as pointers?) Unless I'm mistaken, Python does pass by value, but passes reference values much like Java. (Although, there's no distinguishing between "primitives" and "objects" like in Java; everything is an object.) – jpmc26 Nov 21 '13 at 23:35
  • @jpmc26: That's really misleading terminology. Python does what Java calls "pass by value", which is the same thing Ruby calls "pass by reference"—what it's actually passing is a reference to a value, so you can't really call either term more accurate than the other. Attempting to resolve any confusion by using one of those terms just makes the problem worse… – abarnert Nov 21 '13 at 23:38
  • Then maybe you just say it passes the object's memory address. – jpmc26 Nov 21 '13 at 23:39
  • @jpmc26: Also, "object references" seems like exactly the wrong name here. The whole point of a closure cell is that it's _not_ a reference to a value (object), it's a reference to a cell within a frame (which is sort of like a "reference to a variable" in C++ terms, except that Python doesn't have variables in that sense.) – abarnert Nov 21 '13 at 23:39
  • @jpmc26: No, it very definitely does _not_ pass the object's memory address. For one thing, most other interpreters besides CPython don't deal with addresses at any level. For another, passing the memory address is what CPython does in a _non_-closure case. And it's not really the passing that's the issue here; the bytecode to read a value from a closure cell is completely different from the bytecode to read a value from a parameter or other local variable, which is the only reason it works in the first place. – abarnert Nov 21 '13 at 23:42
  • I believe I've found a thorough discussion here: http://stackoverflow.com/a/2295368/1394393. Thanks for the enlightenment. – jpmc26 Nov 21 '13 at 23:44
  • @jpmc26: Thanks for the link. Unfortunately, all of the answers are carefully worded to not need a name for the thing a closure captures. :) – abarnert Nov 21 '13 at 23:45
  • I call them "captured variables". Or I might say that the lambda "closes over `a` and `b`". The important point is that it is the variable that is captured, not its value. Regarding an earlier comment, I don't find that "passing references by value" is confusing terminology: the defining characteristic of "pass by reference" should be that you can write `a = 2; func(a);` and `a != 2` afterwards. If you can't do that, it's not pass by reference, regardless of whether `a` itself is a reference to `2` (as in Python) or a memory location containing the value `2` (as `int a(2);` in C++) :-) – Steve Jessop Nov 22 '13 at 00:04
  • @SteveJessop: Which earlier comment used "passing references by value"? I don't think that's really accurate either, but at least it avoids the chance of anyone confusing it with the ambiguous terms "pass by reference" or "pass by value"; it forces them to think things through or realize that they don't understand. – abarnert Nov 22 '13 at 00:10
  • Sorry, you're right, jpmc26 actually said "passing reference values", not "passing references by value". I guess the reason I didn't find it confusing was that by the time I got to the end of this thread of comments, I had mentally re-written it ;-) Tbh I think the whole argument over terminology is straightforward when discussing a language that has both, and more or less useless when discussing a language that only has one. There's about a 30-second window learning a new language in which the phrase, "variables are references, function calls pass these references by value" is informative. – Steve Jessop Nov 22 '13 at 00:12
  • ... but the amount of time that goes into clarifying this terminology is a lot more than 30 seconds times the number of languages I will ever see in my lifetime. So although I use it, I don't feel like it's actually important. – Steve Jessop Nov 22 '13 at 00:16
  • Would it be better to define the function or to use the lambda one? In order to make the program as fast as possible. – user2108462 Nov 22 '13 at 00:58
  • @user2108462: It makes no difference. Functions are functions, whether you define them with `def` or `lambda`. See [here](http://repl.it/McJ), for example: the two versions have the exact same bytecode, and therefore will obviously run exactly as fast as each other. Besides, do you really think the time it takes to call `longfunction` will be anywhere near the time `longfunction` takes to do its work? If not, you're optimizing the wrong place. If you have some code that takes 10 seconds to run, finding a part that takes 23 microseconds and cutting it in half doesn't help at all. – abarnert Nov 22 '13 at 01:22
  • @SteveJessop: [Here's my take](http://stupidpythonideas.blogspot.com/2013/11/does-python-pass-by-value-or-by.html) on the naming issue. (For people who refuse to accept "read CTM and EOPL and then get back to me if you're still confused" as an answer.) – abarnert Nov 22 '13 at 02:33
  • Fair enough, but to me it's all predicated on you saying that Python cannot pass by value because by Python's definition only objects are values. Then in order to reference variables, Python introduces cells as an implementation detail. I don't have any investment in preserving the initial definition that only objects in Python "are" values, so to me it's not harmful to say that whatever variables actually are in Python, it passes their values to functions. Hence pass-by-value of whatever variables (and expression results) are. Which is references. – Steve Jessop Nov 22 '13 at 09:29
  • But certainly when I'm calling it pass-by-value of references, then I have to acknowledge that the "reference" I'm talking about is something that in Python is not itself an object and probably isn't what Python considers a value. But I think that's OK, because I'd only call it that in order to compare different languages with each other, so I'm always talking from slightly outside the object/value model of any particular language. – Steve Jessop Nov 22 '13 at 09:35
  • @SteveJessop: The closure thing is a minor detail, not something everything else is predicated on. You can't call it "pass by value" because it's not copying values (whether you count only objects, or objects and closure cells). You can't call it "pass by reference" because it's not referencing variables (whether you count only names, or names and closure cells). Using a third term—whether it's your "pass-by-value of references" or Liskov's "pass by object" that predates most of the languages that do things this way—doesn't help explain unless people already know the term, and they don't. – abarnert Nov 22 '13 at 19:42
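
On the def-versus-lambda question raised in the comments above, one way to check is to compare the compiled bytecode directly. This is a sketch that relies on CPython's `__code__.co_code` attribute (an implementation detail), with two made-up functions:

```python
def add_def(x, y):
    return x + y

add_lambda = lambda x, y: x + y

# Compare the raw bytecode of the two function bodies (CPython-specific).
same_bytecode = add_def.__code__.co_code == add_lambda.__code__.co_code
print(same_bytecode)
```

On CPython this should print True, which matches the comment that the two forms run equally fast.
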
0

Yes, it waits for longfunction to return before starting the thread. This is because when you add () to the end of a function name, it is no longer a function reference, it is a function call. Therefore, args=(longfunction(a,b)) is evaluated within the main thread to the value that longfunction returned, and that value is then passed to the thread target, in this case queue.extend. (To pass it as a one-element tuple you need a trailing comma: args=(longfunction(a,b),).)

IT Ninja
  • Without the `()` it's not a function declaration (or definition) either. Only a `def` or a `lambda` creates a function. – abarnert Nov 21 '13 at 23:29