3

Should the use of in or not in be avoided when dealing with lists/tuples of floats? Is its implementation something like the code below or is it something more sophisticated?

check = False
for item in list_to_search_the_value_in:
    if value_to_search_for == item:
        check = True
ismail
Ma0
  • You might want to have a look [here](http://stackoverflow.com/questions/2217001/override-in-operator-in-python). The `in` operator should be preferred as it can make use of any special containment test the container offers (e.g. `set.__contains__()` is a lot faster than `list.__contains__()`). The problem is with the `float` part, because comparing floats from different sources for equality is usually a numerical no-go. – dhke Jun 03 '16 at 11:36
  • Yes. Comparing floats for equality is best avoided, for the [usual reasons](http://stackoverflow.com/questions/588004/is-floating-point-math-broken?rq=1). – Håken Lid Jun 03 '16 at 11:38
  • A possible alternative is to sort the list of floats, and use binary search to find the closest match, subtract and check if the difference is less than a given limit. – Håken Lid Jun 03 '16 at 11:45
  • @dhke: So if I understand you correctly, you are saying that in general it should be used because its implementation varies depending on the container type (faster for sets), but when the container contains floats it should be avoided. Right? – Ma0 Jun 03 '16 at 11:46
  • @Ev.Kounis Yes, but dependent on your use case. If you know the numbers inside the sequence are from the same source, there is no harm in comparing floats for equality. But if the numbers are from different sources, i.e. one from a table and the other the result of user input, numerical errors will come back to bite you. – dhke Jun 03 '16 at 11:50
  • What you are doing is 100% equivalent to `in` **except** that it is going to be considerably slower. So if you are asking "Should I avoid `in` and use my home-made solution instead, which has no actual advantage in performance or robustness?", then the answer is: just use `in`. It would be different if you used `if abs(value_to_search_for - item) <= epsilon` or something along those lines. – Bakuriu Jun 03 '16 at 11:58
  • @Bakuriu: I wasn't asking that. I was just wondering if it is indeed as simple as that or if it somehow handles floats differently (like the proposed solutions utilizing tolerances and any()). – Ma0 Jun 03 '16 at 12:02

5 Answers

3

in and not in should be your preferred way of membership testing. Both operators can make use (via __contains__()) of any optimized membership test that the container offers.
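As a rough illustration of how `in` and `not in` dispatch to `__contains__()`, here is a minimal sketch. The class name `ApproxFloats` and its `tol` parameter are hypothetical, made up for this example, and are not part of any library:

```python
class ApproxFloats:
    """Hypothetical container: membership means 'within tol of a stored value'."""

    def __init__(self, values, tol=1e-9):
        self.values = list(values)
        self.tol = tol  # assumed absolute tolerance, chosen arbitrarily here

    def __contains__(self, item):
        # Python calls this for both `item in obj` and `item not in obj`.
        return any(abs(item - v) <= self.tol for v in self.values)


floats = ApproxFloats([0.4 - 0.1])
print(0.3 in floats)        # True: handled by ApproxFloats.__contains__
print(0.3 in [0.4 - 0.1])   # False: list.__contains__ uses exact equality
print(0.5 not in floats)    # True
```

Whether wrapping the data like this is worthwhile depends on the use case; a plain helper function may be simpler.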

Your problem is with the float part, because in makes an equality comparison with == (optimized to check for identity, first).
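The identity shortcut is easiest to observe with NaN, which never compares equal to itself. A small sketch (the variable name `nan` is just for illustration):

```python
nan = float("nan")

print(nan == nan)             # False: NaN is not equal to anything, itself included
print(nan in [nan])           # True: same object, so the identity check succeeds
print(float("nan") in [nan])  # False: a different NaN object falls through to ==
```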

In general, comparing floating-point numbers for exact equality does not yield the desired results. Hence, for lists of floats, you want something like

def is_in_float(item, sequence, eps=None):
    # Default to machine epsilon; only meaningful for values of magnitude around 1.
    if eps is None:
        eps = 2**-52
    return any(abs(item - seq_item) < eps for seq_item in sequence)

For large sequences, you can instead sort the values and use binary search to find the closest matching float, then compare that one value against the tolerance.
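One possible sketch of the sort-and-bisect idea, using the standard `bisect` module. The helper name `is_close_to_any` and the default `eps` are assumptions made for this example:

```python
from bisect import bisect_left


def is_close_to_any(item, sorted_values, eps=1e-9):
    """Return True if some value in sorted_values lies within eps of item.

    sorted_values must already be sorted in ascending order.
    """
    pos = bisect_left(sorted_values, item)
    # The nearest value is either just before or at the insertion point.
    neighbours = sorted_values[max(pos - 1, 0):pos + 1]
    return any(abs(item - v) <= eps for v in neighbours)


values = sorted([0.7, 0.4 - 0.1, 0.1 + 0.2])
print(is_close_to_any(0.3, values))   # True
print(is_close_to_any(0.35, values))  # False
```

Each lookup then costs O(log n) instead of O(n), which only pays off if the list is sorted once and searched many times.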

dhke
2

The documentation says that in checks for equality on sequence types. So no, this should not be used for sequences of floats.

1

The in operator uses regular equality checks behind the scenes, so it has the same limitations as __eq__() when it comes to floats. Use with caution if at all.

>>> 0.3 == 0.4 - 0.1
False

>>> 0.3 in [0.4 - 0.1]
False
Håken Lid
1

Since the in operator uses an equality check, it will frequently fail on floats, because floating-point math is "broken" (well, it's not, but you get the point).

You may easily achieve similar functionality by using any:

epsilon = 1e-9

check = any(abs(f - value_to_search_for) < epsilon for f in seq)
# or
check = False
if any(abs(f - value_to_search_for) < epsilon for f in seq):
    check = True
Community
Łukasz Rogalski
1

Python's list type has its __contains__ method implemented in C:

static int
list_contains(PyListObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

A literal translation to Python might be:

def list_contains(a, el):
    cmp = False
    for i in range(len(a)):
        if cmp:
            break
        # PyObject_RichCompareBool short-circuits on identity before trying ==
        cmp = el is a[i] or el == a[i]
    return cmp

Your example is a more idiomatic translation.

In any case, as the other answers have noted, it uses equality to test the list items against the element you're checking for membership. With float values, that can be perilous: numbers we'd expect to be equal may turn out not to be, because of floating-point rounding.

A more float-safe way of implementing the check yourself might be:

any(abs(x - el) < epsilon for x in a)

where epsilon is some small value. How small it needs to be will depend on the size of the numbers you're dealing with and how precise you care to be. If you can estimate the amount of numeric error that might differentiate el from an equivalent value in the list, you can set epsilon to one order of magnitude larger and be confident that you won't get a false negative (and will probably only get false positives in cases that are impossible to get right).
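If choosing a fixed epsilon is awkward because the magnitudes vary, one alternative (not what this answer uses) is `math.isclose`, available since Python 3.5, which applies a relative tolerance. A minimal sketch; the helper name `contains_close` is hypothetical:

```python
import math


def contains_close(values, el, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical helper: True if any value is close to el per math.isclose."""
    return any(math.isclose(x, el, rel_tol=rel_tol, abs_tol=abs_tol) for x in values)


print(contains_close([0.4 - 0.1], 0.3))            # True
print(contains_close([1e12 + 0.001], 1e12))        # True: tolerance scales with magnitude
print(contains_close([1e-12], 0.0))                # False: relative tolerance cannot match zero
print(contains_close([1e-12], 0.0, abs_tol=1e-9))  # True: abs_tol handles values near zero
```

The defaults shown are those of `math.isclose` itself; suitable tolerances still depend on how much error your computation accumulates.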

Blckknght