88

I came across some code with a line similar to

x[x<2]=0

Playing around with variations, I am still stuck on what this syntax does.

Examples:

>>> x = [1,2,3,4,5]
>>> x[x<2]
1
>>> x[x<3]
1
>>> x[x>2]
2
>>> x[x<2]=0
>>> x
[0, 2, 3, 4, 5]
Martijn Pieters
aberger
  • 7
    it never makes sense to do this with a list. – abcd Apr 13 '16 at 16:57
  • 12
    This only makes sense with NumPy arrays or similar objects, which behave completely differently from the behavior in your experiments or the list-based behavior explained in either answer. – user2357112 Apr 13 '16 at 17:01
  • 11
    Note, this does not work in Python 3. Types are only able to be compared when the comparison makes sense. In Python 3 this example throws `TypeError: unorderable types: list() < int()`. – Morgan Thrapp Apr 13 '16 at 18:01
  • 2
    Too little information. Should've mentioned that the array is a numpy array. – lmaooooo Apr 13 '16 at 18:08
  • 3
    I'm shocked this got so many upvotes (though it is indeed a good question for SO format). – PascalVKooten Apr 13 '16 at 18:51
  • 1
    Are you sure it was a list? This would also be a valid way to access `True` or `False` keys in a dictionary. – SuperBiasedMan Apr 14 '16 at 11:56
  • 2
    @SuperBiasedMan You would access a dictionary by comparing the _same_ dictionary with a number? – MSeifert Apr 14 '16 at 19:38

5 Answers

122

This only makes sense with NumPy arrays. The behavior with lists is useless, and specific to Python 2 (not Python 3). You may want to double-check if the original object was indeed a NumPy array (see further below) and not a list.

But in your code here, x is a simple list.

Since

x < 2

is False, i.e. 0 when used as an index, therefore

x[x<2] is x[0]

and x[0] gets changed.

Conversely, x > 2 is True, so x[x>2] is x[True], i.e. x[1]

So, x[1] gets changed.

Why does this happen?

The rules for comparison are:

  1. When you order two strings or two numeric types, the ordering is done in the expected way (lexicographic ordering for strings, numeric ordering for integers).

  2. When you order a numeric and a non-numeric type, the numeric type comes first.

  3. When you order two incompatible types where neither is numeric, they are ordered by the alphabetical order of their type names.

So, we have the following order:

numeric < list < string < tuple

See the accepted answer to "How does Python compare string and int?"
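The ordering rule above cannot be run directly on Python 3, but it can be imitated. The following is a minimal sketch, not CPython 2's actual implementation; `py2_type_rank` is a hypothetical helper that reproduces only the cross-type ordering (numbers first, then type names alphabetically):

```python
def py2_type_rank(obj):
    """Hypothetical helper imitating CPython 2's cross-type ordering."""
    # Numbers sort before everything else, so give them the smallest key.
    if isinstance(obj, (int, float)):
        return ""
    # Other types are ordered alphabetically by type name: list < str < tuple.
    return type(obj).__name__


mixed = [(), "a", [], 1]
ordered = sorted(mixed, key=py2_type_rank)
print(ordered)  # [1, [], 'a', ()]

# A list ranks after a number, which is why [1, 2, 3, 4, 5] < 2 was False.
list_is_smaller = py2_type_rank([1, 2, 3, 4, 5]) < py2_type_rank(2)
print(list_is_smaller)  # False
```

The helper ignores the values themselves; it only shows why a list never compares as "less than" a number under those rules.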

If x is a NumPy array, then the syntax makes more sense because of boolean array indexing. In that case, x < 2 isn't a boolean at all; it's an array of booleans representing whether each element of x was less than 2. x[x < 2] = 0 then selects the elements of x that were less than 2 and sets those cells to 0. See Indexing.

>>> x = np.array([1., -1., -2., 3])
>>> x < 0
array([False,  True,  True, False], dtype=bool)
>>> x[x < 0] += 20   # All elements < 0 get increased by 20
>>> x
array([  1.,  19.,  18.,   3.]) # Only elements < 0 are affected
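For comparison, here is a small sketch of the exact statement from the question, assuming x is a NumPy array holding the same values as the list (the variable name `mask` is only for illustration):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# The comparison is applied elementwise and yields a boolean array.
mask = x < 2
print(mask.tolist())  # [True, False, False, False, False]

# Assigning through the mask changes only the elements where mask is True.
x[mask] = 0
print(x.tolist())  # [0, 2, 3, 4, 5]
```

The result happens to match the list experiment in the question, but only because exactly one element is below 2; with the array, every matching element would be set to 0.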
Community
trans1st0r
  • 11
    Given that the OP specifically says "I came across some code like this...", I think your answer describing numpy boolean indexing is very useful - might be worth pointing out that if the OP scrolls up the code they looked at, they'll almost certainly see an `import` for numpy. – J Richard Snape Apr 13 '16 at 18:03
  • 2
    Still an overly clever way to do it, surely? (As compared with, say, `[0 if i < 2 else i for i in x]`.) Or is this encouraged style in Numpy? – Tim Pederick Apr 14 '16 at 19:58
  • 7
    @TimPederick: Using list comprehensions with NumPy is a pretty bad idea. It's dozens to hundreds of times slower, it doesn't work with arbitrary-dimensional arrays, it's easier to get the element types screwed up, and it creates a list instead of an array. Boolean array indexing is completely normal and expected in NumPy. – user2357112 Apr 14 '16 at 20:29
  • @TimPederick In addition to the performance hit it's also likely that whoever wrote the code intended to keep using a numpy array. `x[x<2]` will return a numpy array, whereas `[0 if i<2 else i for i in x]` returns a list. This is because `x[x<2]` is an indexing operation (referred to in numpy/scipy/pandas as a slicing operation due to the ability to mask data), whereas the list comprehension is a new object definition. See [NumPy indexing](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) – Michael Delgado Apr 15 '16 at 00:26
45
>>> x = [1,2,3,4,5]
>>> x<2
False
>>> x[False]
1
>>> x[True]
2

The bool is simply converted to an integer. The index is either 0 or 1.

Karoly Horvath
14

The original code in your question works only in Python 2. If x is a list in Python 2, the comparison x < y is False if y is an integer. This is because it does not make sense to compare a list with an integer. However, in Python 2, if the operands are not comparable, CPython bases the comparison on the alphabetical ordering of the names of the types; additionally, all numbers come first in mixed-type comparisons. This is not even spelled out in the documentation of CPython 2, and different Python 2 implementations could give different results. That is, [1, 2, 3, 4, 5] < 2 evaluates to False because 2 is a number and thus "smaller" than a list in CPython. This mixed comparison was eventually deemed too obscure a feature and was removed in Python 3.0.


Now, the result of < is a bool; and bool is a subclass of int:

>>> isinstance(False, int)
True
>>> isinstance(True, int)
True
>>> False == 0
True
>>> True == 1
True
>>> False + 5
5
>>> True + 5
6

So basically you're taking the element 0 or 1 depending on whether the comparison is true or false.
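A short sketch of that last step, runnable on Python 3. Since the list comparison itself no longer works there, the result it had in CPython 2 is hard-coded in a variable (the name `index` is only for illustration):

```python
x = [1, 2, 3, 4, 5]

# In CPython 2, `x < 2` evaluated to False for a list; here it is hard-coded.
index = False

first = x[index]   # same as x[0]
x[index] = 0       # same as x[0] = 0
print(first, x)    # 1 [0, 2, 3, 4, 5]

second = x[True]   # same as x[1]
print(second)      # 2
```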


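If you try the code above in Python 3, you will get TypeError: unorderable types: list() < int() (newer Python 3 releases word the message as '<' not supported between instances of 'list' and 'int') due to a change in Python 3.0: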

Ordering Comparisons

Python 3.0 has simplified the rules for ordering comparisons:

The ordering comparison operators (<, <=, >=, >) raise a TypeError exception when the operands don’t have a meaningful natural ordering. Thus, expressions like 1 < '', 0 > None or len <= len are no longer valid, and e.g. None < None raises TypeError instead of returning False. A corollary is that sorting a heterogeneous list no longer makes sense – all the elements must be comparable to each other. Note that this does not apply to the == and != operators: objects of different incomparable types always compare unequal to each other.
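A minimal sketch of what this means for the list from the question on Python 3 (the flag name `ordering_failed` is only for illustration):

```python
x = [1, 2, 3, 4, 5]

try:
    x < 2  # ordering a list against an int is rejected on Python 3
except TypeError:
    ordering_failed = True
else:
    ordering_failed = False

print(ordering_failed)  # True on Python 3

# Equality comparisons between unrelated types are still allowed.
print(x == 2)  # False
```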


There are many datatypes that overload the comparison operators to do something different (pandas DataFrames, NumPy arrays). If the code that you were using did something else, it was because x was not a list, but an instance of some other class with operator < overridden to return a value that is not a bool; and this value was then handled specially by x[] (aka __getitem__/__setitem__).

Community
9

This has one more use: code golf. Code golf is the art of writing programs that solve some problem in as few source code bytes as possible.

return(a,b)[c<d]

is roughly equivalent to

if c < d:
    return b
else:
    return a

except that both a and b are evaluated in the first version, but not in the second version.

c<d evaluates to True or False.
(a, b) is a tuple.
Indexing on a tuple works like indexing on a list: (3,5)[1] == 5.
True is equal to 1 and False is equal to 0.

  1. (a,b)[c<d]
  2. (a,b)[True]
  3. (a,b)[1]
  4. b

or for False:

  1. (a,b)[c<d]
  2. (a,b)[False]
  3. (a,b)[0]
  4. a
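The difference in evaluation can be made visible with a small sketch. Here a and b are turned into hypothetical functions that record when they are called, and the two picker functions are illustrative names, not part of the original snippet:

```python
calls = []


def a():
    calls.append("a")
    return "A"


def b():
    calls.append("b")
    return "B"


def pick_by_index(c, d):
    # Builds the whole tuple first, so both a() and b() run.
    return (a(), b())[c < d]


def pick_by_branch(c, d):
    # Only the selected branch runs.
    if c < d:
        return b()
    return a()


result_index = pick_by_index(1, 2)
calls_index = list(calls)

calls.clear()
result_branch = pick_by_branch(1, 2)
calls_branch = list(calls)

print(result_index, calls_index)    # B ['a', 'b']
print(result_branch, calls_branch)  # B ['b']
```

Both versions return the same value, but the tuple version pays for (and triggers any side effects of) the option it discards.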

There's a good list on the Stack Exchange network of many nasty things you can do to Python in order to save a few bytes: https://codegolf.stackexchange.com/questions/54/tips-for-golfing-in-python

In normal code, though, this should never be used. In your case it would mean that x acts both as something that can be compared to an integer and as a container that supports indexing, which is a very unusual combination. It's probably NumPy code, as others have pointed out.

Community
Filip Haglund
6

In general it could mean anything. It was already explained what it means if x is a list or numpy.ndarray, but in general it depends only on how the comparison operators (<, >, ...) and the get/set-item syntax ([...]) are implemented.

x.__getitem__(x.__lt__(2))      # this is what x[x < 2] means!
x.__setitem__(x.__lt__(2), 0)   # this is what x[x < 2] = 0 means!

Because:

  • x < value is equivalent to x.__lt__(value)
  • x[value] is (roughly) equivalent to x.__getitem__(value)
  • x[value] = othervalue is (also roughly) equivalent to x.__setitem__(value, othervalue).
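These equivalences can be observed with a tiny tracing class. This is only a sketch; `Traced` is a hypothetical class whose methods record how they were called and return placeholder strings:

```python
class Traced:
    """Hypothetical class that logs which special methods get called."""

    def __init__(self):
        self.log = []

    def __lt__(self, other):
        self.log.append(("__lt__", other))
        return "mask"  # placeholder; any object may be returned here

    def __getitem__(self, key):
        self.log.append(("__getitem__", key))
        return "picked"

    def __setitem__(self, key, value):
        self.log.append(("__setitem__", key, value))


x = Traced()
picked = x[x < 2]  # __lt__ runs first, then __getitem__ receives its result
x[x < 2] = 0       # __lt__ runs first, then __setitem__ receives its result

for entry in x.log:
    print(entry)
# ('__lt__', 2)
# ('__getitem__', 'mask')
# ('__lt__', 2)
# ('__setitem__', 'mask', 0)
```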

This can be customized to do anything you want. Just as an example (it loosely mimics NumPy's boolean indexing):

class Test:
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # You could do anything in here. For example create a new list indicating if that 
        # element is less than the other value
        res = [item < other for item in self.value]
        return self.__class__(res)

    def __repr__(self):
        return '{0} ({1})'.format(self.__class__.__name__, self.value)

    def __getitem__(self, item):
        # If you index with an instance of this class use "boolean-indexing"
        if isinstance(item, Test):
            res = self.__class__([i for i, index in zip(self.value, item) if index])
            return res
        # Something else was given just try to use it on the value
        return self.value[item]

    def __setitem__(self, item, value):
        if isinstance(item, Test):
            self.value = [i if not index else value for i, index in zip(self.value, item)]
        else:
            self.value[item] = value

So now let's see what happens if you use it:

>>> a = Test([1,2,3])
>>> a
Test ([1, 2, 3])
>>> a < 2  # calls __lt__
Test ([True, False, False])
>>> a[Test([True, False, False])] # calls __getitem__
Test ([1])
>>> a[a < 2] # or short form
Test ([1])

>>> a[a < 2] = 0  # calls __setitem__
>>> a
Test ([0, 2, 3])

Notice this is just one possibility. You are free to implement almost anything you want.

MSeifert
  • I would say using **anything** really is way too general for logically explainable behavior like the accepted answer. – PascalVKooten Apr 14 '16 at 20:52
  • @PascalvKooten Do you disagree with the "anything" or with the generalized answer? I think it's an important point to make because most _logical behaviour_ in python is just by convention. – MSeifert Apr 14 '16 at 21:25