16

I was working on a project for class where my code wasn't producing the same results as the reference code.

I compared my code with the reference code line by line, they appeared almost exactly the same. Everything seemed to be logically equivalent. Eventually I began replacing lines and testing until I found the line that mattered.

Turned out it was something like this (EDIT: exact code is lower down):

# my version:
max_q = max([x for x in self.getQValues(state)])

# reference version which worked:
max_q = max(x for x in self.getQValues(state))

Now, this baffled me. I tried some experiments with the Python (2.7) interpreter, running tests using max on list comprehensions with and without the square brackets. Results seemed to be exactly the same.

Even by debugging via PyCharm I could find no reason why my version didn't produce the exact same result as the reference version. Up to this point I thought I had a pretty good handle on how list comprehensions worked (and how the max() function worked), but now I'm not so sure, because this is such a weird discrepancy.

What's going on here? Why does my code produce different results than the reference code (in 2.7)? How does passing in a comprehension without brackets differ from passing in a comprehension with brackets?

EDIT 2: the exact code was this:

# works
max_q = max(self.getQValue(nextState, action) for action in legal_actions)

# doesn't work (i.e., provides different results)
max_q = max([self.getQValue(nextState, action) for action in legal_actions])

I don't think this should be marked as duplicate -- yes, the other question regards the difference between comprehension objects and list objects, but not why max() would provide different results when given a 'some list built by X comprehension', rather than 'X comprehension' alone.

bob
  • 1,879
  • 2
  • 15
  • 27
  • Possible duplicate of [Use \`for\` in \`print()\` will give a generator on Python 3.x?](http://stackoverflow.com/questions/33051704/use-for-in-print-will-give-a-generator-on-python-3-x) – Remi Guan Oct 25 '15 at 04:23
  • 1
    I don't think it's a dupe -- the question is why they're yielding a different max value, as far as I'm reading the question. OP -- you should try logging the `x` iterations and the final `max_q` ... you can log `x` iterations by adding a method `def debug(x): print x; return x` (in pythonic indentation) and changing the comparison lines to `max([debug(x) for x in self.getQValues(state)])` and `max(debug(x) for x in self.getQValues(state))` respectively. – DreadPirateShawn Oct 25 '15 at 04:26
  • @DreadPirateShawn I think that OP's question is also because without the `[]`, `x for x in self.getQValues(state)` is in `()`. So it will create a generator here. And `max()` can be used on generator. BTW, please @ me then I can receive a notify :) – Remi Guan Oct 25 '15 at 04:32
  • @KevinGuan -- fair enough. :-) – DreadPirateShawn Oct 25 '15 at 04:34
  • Does `self.getQValues` yield? Trying to construct a corner case like `list([((yield x), (yield -x)) for x in range(10)])`, which varies if you remove the braces – Eric Oct 25 '15 at 06:06
  • It may not be a duplicate, but given the info you provide that's the only sane judgement. However, I'd rather call this off-topic because it lacks a minimal example. – Ulrich Eckhardt Oct 25 '15 at 06:21

3 Answers3

18

Are you leaking a local variable which is affecting later code?

# works
action = 'something important'
max_q = max(self.getQValue(nextState, action) for action in legal_actions)
assert action == 'something important'

# doesn't work (i.e., provides different results)
max_q = max([self.getQValue(nextState, action) for action in legal_actions])
assert action == 'something important'  # fails!

Generator and dictionary comprehensions create a new scope, but before py3, list comprehensions do not, for backwards compatibility

Easy way to test - change your code to:

max_q = max([self.getQValue(nextState, action) for action in legal_actions])
max_q = max(self.getQValue(nextState, action) for action in legal_actions)

Assuming self.getQValue is pure, then the only lasting side effect of the first line will be to mess with local variables. If this breaks it, then that's the cause of your problem.

Eric
  • 95,302
  • 53
  • 242
  • 374
  • Wow, you were absolutely right! I'm incredibly impressed with your intuition there. This is just a little class project, but if this were actually important this little bug could have stumped me for hours. I've had similar problems where scopes accidentally reuse old variables, but none were an enigma as strange to me as this one. I've got some big interviews coming up, if they ask me about bugs that stumped me, I know what I'll talk about! – bob Oct 25 '15 at 06:24
  • 4
    @andy Note: this has to do with the fact that list-comprehensions in python2.x are syntactic sugar for a loop while generator expressions create a new scope (and thus don't leak variables). In python3 the two examples you have provided *do* act in the same way. – Bakuriu Oct 25 '15 at 06:31
  • @andy: I've been bitten by [a different side of this before](http://stackoverflow.com/q/12259251/102441). Sometimes, introducing the variable in an outer scope causes the `[...]` version to NameError! – Eric Oct 25 '15 at 20:51
7

Use of the [] around a list comprehension actually generates a list into your variable, or in this case into your max function. Without the brackets you are creating a generator object that will be fed into the max function.

results1 = (x for x in range(10))
results2 = [x for x in range(10)]
result3 = max(x for x in range(10))
result4 = max([x for x in range(10)])
print(type(results1)) # <class 'generator'>
print(type(results2)) # <class 'list'>
print(result3) # 9
print(result4) # 9

As far as I know, they should work essentially the same within the max function.

Jared Mackey
  • 3,998
  • 4
  • 31
  • 50
  • Worth noting http://legacy.python.org/dev/peps/pep-0289/ which confirms that `max` simply works better with the generator form. That being said, in both cases, the `self.getQValues(state)` is evaluated before the `max` occurs -- see "Early Binding versus Late Binding" in pep doc, and ipython tests confirm that the act of creating the generator itself calls the `self.getQValues(state)` method. So yeah, the generator and array form are both iterating through the same QValues array, and max should yield the same result in both. – DreadPirateShawn Oct 25 '15 at 04:23
  • That is an interesting addition to the post. Thanks – Jared Mackey Oct 25 '15 at 04:24
  • appreciate the info. I'm still baffled about what was giving different results. Maybe I'll go back and get the exact code from context, perhaps there's some other information that's important (though I know for a fact removing the brackets was the only change needed to make it work) – bob Oct 25 '15 at 04:24
  • With the edits to the OP, the pep-0289 insight changes. Since `self.getQValue(nextState, action)` is not in the outermost for-expression, it's being evaluated lazily in the generator example, and building the array first in the array example. The debug advice in a different comment still applies -- eg, what raw values are being calculated in the array vs generator? are both yielding the same number of elements? It's unlikely that the `max` method is the problem, so the best info at this point would be knowing what comprises the list vs the generator iterations. – DreadPirateShawn Oct 25 '15 at 04:37
2

I don't know why you got different values in your project, but I can give you live example, when it happens. Generator is more effective then list, so we will have different memory usage. I'm using Python 3.

Here's function that returns current memory usage by Python:

import os
import psutil


def memory_usage():
    """Get process virtual memory (vms) usage in MB."""
    process = psutil.Process(os.getpid())
    memory = process.memory_info()[1] / (1024.0 * 1024.0)
    return memory

Try this code:

# Generator version:
max_q = max(memory_usage() for i in range(100000))
print(max_q)  # 7.03125

I tested code several times and I'm getting somethig over 7 on my machine.

Replace generator version with list version:

# List version:
max_q = max([memory_usage() for i in range(100000)])
print(max_q)  # 11.44921875

I'm getting something over 11 on my machine.

As you see code is almost same, but you will get different output.

May be in your project getQValue() gives you different values based on some already calculated. But that existing values can be removed by garbage collector faster if you use generator.

Mikhail Gerasimov
  • 36,989
  • 16
  • 116
  • 159