
Python 3.8 (or CPython 3.8?) added the warning

SyntaxWarning: "is" with a literal. Did you mean "=="?

for the code 0 is 0.

I understand the warning, and I know the difference between is and ==.

However, I also know that CPython caches the object for small integers and shares it in other cases as well. (Out of curiosity, I just checked the code (header) again. Small ints are cached in tstate->interp->small_ints. 0 and 1 are even more special and are stored globally in _PyLong_Zero and _PyLong_One. All new creations of ints are via PyLong_FromLong and that one first checks if it is a small integer and cached.)

Given this background, if you know you have an int object, you could say that the check x is 0 should be safe, right? Also, you could derive that 0 is 0 should always be True, right? Or is this an implementation detail of CPython and other interpreters do not follow this? Which interpreter does not follow this?

Despite this more generic question (which I'm just curious about), consider this more specific (example) code:

def sum1a(*args):
    y = 0
    for x in args:
        if y is 0:
            y = x
        else:
            y = y + x
    return y

Vs:

def sum1b(*args):
    y = 0
    for x in args:
        if y == 0:
            y = x
        else:
            y = y + x
    return y

Vs:

def sum1c(*args):
    y = None
    for x in args:
        if y is None:
            y = x
        else:
            y = y + x
    if y is None:
        return 0
    return y

Vs:

def sum2(*args):
    y = 0
    for x in args:
        y = y + x
    return y

The reason I would sometimes prefer sum1* over sum2 is that depending on the library, sum1* can really be more efficient. E.g. if the argument is a Numpy/TensorFlow/PyTorch array, you really would save a (potentially costly) operation here.

The reason I would prefer sum1a over sum1b is that sum1b would break on certain inputs. E.g. if the input is a Numpy array, this would not work.
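The failure mode described here can be reproduced without installing NumPy. The sketch below uses a hypothetical stand-in class, `ArrayLike` (not part of any library), that mimics two array behaviours: element-wise `==` and an ambiguous truth value. It is only an illustration under those assumptions, not NumPy's actual implementation.

```python
class ArrayLike:
    """Hypothetical stand-in for an array type with element-wise comparison."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Element-wise comparison returns another array, not a single bool.
        return ArrayLike(v == other for v in self.values)

    def __bool__(self):
        # Arrays with several elements refuse to be used as a condition.
        raise ValueError("truth value of a multi-element array is ambiguous")

    def __add__(self, other):
        return ArrayLike(a + b for a, b in zip(self.values, other.values))


def sum1b(*args):
    y = 0
    for x in args:
        if y == 0:  # on the second pass y is an ArrayLike, so this raises
            y = x
        else:
            y = y + x
    return y


def sum1c(*args):
    y = None
    for x in args:
        if y is None:  # identity check against None never calls __eq__
            y = x
        else:
            y = y + x
    if y is None:
        return 0
    return y
```

With these definitions, `sum1b(ArrayLike([1, 2]), ArrayLike([3, 4]))` raises `ValueError` on the second iteration, while `sum1c` with the same arguments returns an `ArrayLike` holding the element-wise sum.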

Of course, you could use sum1c instead of sum1a. However, sum1a is shorter. So this is nicer?

If the answer to the original question is that this should always work, and if you agree that sum1a is the best option then, how would you get rid of the warning? Is there a simple workaround? In general, I can see that the warning can be useful. So I would not want to disable it completely. I just want to disable it for this specific statement.

Maybe I could wrap it up in a function:

def is_(a, b):
    return a is b

And then just use if is_(y, 0): .... Does this work? Is that a good idea?

Albert
  • Have you considered using either an `or` clause (e.g. `if y is None or y == 0:`) or just checking for falsiness (e.g. `if not y:`)? I suspect the reason the warning is there is that cached integers are treated as implementation-specific behavior that shouldn't be relied upon. – Green Cloak Guy Jun 03 '20 at 15:01
  • `y == 0` does not work in general (e.g. for Numpy/TensorFlow arrays). Neither does `not y`. – Albert Jun 03 '20 at 15:02
  • Caching of small ints is *not guaranteed* - it's a compile-time option that can be disabled. However, I would expect `0 is 0` (that code literally, not zeros from arbitrary sources) to always be true, since both zeros were part of the same compiled code object and therefore would always be combined into a single constant. – jasonharper Jun 03 '20 at 15:04
  • @jasonharper But then the code of `sum1a` is also always correct, right? – Albert Jun 03 '20 at 15:05
  • hhm, imho `if is_(y, 0)` is less readable than `if y is 0` so why would you do that? – garglblarg Jun 03 '20 at 15:07
  • @garglblarg To avoid the warning. How do you avoid the warning otherwise? – Albert Jun 03 '20 at 15:08
  • No, `sum1a` is not safe in the absence of caching - you could end up with a zero in `y` that's not the same as the constant `0` in the function. This would happen if one of the `x`s from the input was the negative of the sum so far. – jasonharper Jun 03 '20 at 15:11
  • @jasonharper Constant folding is also implementation-defined. True, any reasonable Python implementation would combine this into a single constant, but I can write an implementation that creates a new `0` for each occurrence found and it wouldn't be wrong, just silly. – Dimitris Fasarakis Hilliard Jun 03 '20 at 15:11
  • @Albert You can [suppress warnings](https://docs.python.org/3/library/warnings.html#describing-warning-filters) with the [`-W`](https://docs.python.org/3/using/cmdline.html#cmdoption-w) command line option. For your example: `python -Wignore::SyntaxWarning main.py`. An alternative is to configure the [`warnings` module](https://docs.python.org/3/library/warnings.html). Regarding `is_` there is already [`operator.is_`](https://docs.python.org/3/library/operator.html#operator.is_) defined. – a_guest Jun 03 '20 at 15:17
  • @a_guest But as I said, I don't want to globally disable the warning, just for this statement. – Albert Jun 03 '20 at 15:23
  • @Albert Regarding the question, as others have pointed out, this is an implementation detail of CPython (and perhaps other implementations). Even CPython could change this without a deprecation schedule or notifying its users (though this is unlikely to happen). So relying on `is 0` is really a micro-optimization in terms of code length and the question is if it's really worth it? `sum1c` looks fine and it'll always work. Using `is 0` you should ideally watch the issue tracker very carefully for every release that you upgrade because it could change that implementation detail. – a_guest Jun 03 '20 at 15:24
  • @Albert The warnings filter lets you also specify a regex to match the warning's message, the module name and even the line number. So you can have as fine-grained control as you like. Keeping the line number in sync might be tedious but you can always put this in an isolated module and just specify the module name. In the end, there is a price to pay for relying on implementation details. – a_guest Jun 03 '20 at 15:27
  • @Albert By the way, speaking about optimizing costly operations, your `sum*` functions should use `y += x` instead of `y = y + x` to prevent creation of an additional object (perhaps with `x.copy()` on the first iteration; though I think Numpy actually uses some ref. count hacking to optimize this under certain circumstances). – a_guest Jun 03 '20 at 15:34
  • Related: [How to create the int 1 at two different memory locations?](https://stackoverflow.com/q/21456318/674039) – wim Jun 03 '20 at 15:38
  • The post I linked above shows that the CPython small integer cache can also be defeated in some cases. Don't rely on implementation details, ever! – wim Jun 03 '20 at 15:45
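The warnings-filter approach mentioned in the comments can be sketched as follows. The warning is emitted when the source is compiled, so the filter has to be active before `compile` (or the import of the module) runs. The helper name `compile_without_is_literal_warning`, the `SOURCE` snippet and the `"<sum1a>"` filename are hypothetical names chosen for this illustration; `warnings.catch_warnings` and `warnings.filterwarnings` are standard-library calls. The filter only hides the message; it does not make `is 0` any more reliable.

```python
import warnings

# Hypothetical source text that would normally trigger the SyntaxWarning.
SOURCE = "def is_zero_object(y):\n    return y is 0\n"


def compile_without_is_literal_warning(source, filename="<sum1a>"):
    """Compile `source`, ignoring only SyntaxWarnings whose text starts with '"is" with'."""
    with warnings.catch_warnings():
        # The message regex is matched against the start of the warning text,
        # so other SyntaxWarnings are still reported.
        warnings.filterwarnings(
            "ignore",
            message=r'"is" with',
            category=SyntaxWarning,
        )
        return compile(source, filename, "exec")


namespace = {}
exec(compile_without_is_literal_warning(SOURCE), namespace)
```

Because the filter lives inside the `with` block, the global warning configuration is restored afterwards, which keeps the suppression limited to this one piece of source.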

1 Answer


No, it isn't. Case in point: the Rust implementation of Python returns False:

>>>>> 0 is 0
False

and this is not wrong, though I expect this to change in future versions (it has!).

is compares object identity, which is what id reports; id's only stipulation is that the value returned is unique and constant for a given object during its lifetime. Whether the source code representation of a number (0 here) maps to a distinct object or not is up to the implementation to define.
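The relationship between `is`, `id` and `==` can be checked with a short sketch. The variable names are made up for illustration, and the zeros are bound to names first so that the `is` expression itself contains no literal. Whether the two names end up referring to the same object is deliberately left unasserted, since that is exactly the implementation-defined part.

```python
zero_literal = 0        # comes from a constant in the compiled code
zero_parsed = int("0")  # created at runtime from a string

# Equality is guaranteed by the language: both are the integer zero.
equal = zero_literal == zero_parsed

# Identity is implementation-defined: CPython's small-int cache makes this
# likely to be True there, but another interpreter may create two objects.
same_object = zero_literal is zero_parsed

# What is guaranteed for two live objects: `is` agrees with comparing ids.
consistent = same_object == (id(zero_literal) == id(zero_parsed))

print(equal, same_object, consistent)
```

Only `equal` and `consistent` are guaranteed to be `True`; code that needs to detect "no value yet" is safer comparing against `None` or a dedicated sentinel object.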

Dimitris Fasarakis Hilliard