python set unexpected behavior

Question

I am trying to fix a test that works fine on AWS but fails on GCP.

For some reason, GCP changes the order of the query results, so I decided to compare sets instead of lists.

Each list contains only two items. As you can see, the items are the same, but when I compare the lists as sets, Python says they are not equal.

received_devices
Out[49]: [(1L, u'1', None, u'test_device_1'), (2L, u'2', None, u'test_device_2')]
expected_devices
Out[50]: [(2, '2', None, 'test_device_2'), (1, '1', None, 'test_device_1')]
received_devices[0] == expected_devices[1]
Out[51]: True
received_devices[1] == expected_devices[0]
Out[52]: True
set(received_devices) == set(expected_devices)
Out[53]: False
{(1L, u'1', None, u'test_device_1'), (2L, u'2', None, u'test_device_2')} == {(2, '2', None, 'test_device_2'), (1, '1', None, 'test_device_1')}
Out[57]: True
[expected_devices[0], expected_devices[1]] == [received_devices[1], received_devices[0]]
Out[60]: True

Why is that happening?

@MorZamir Looks like there is some issue with Python versions. https://stackoverflow.com/a/47248457/4626254 — Underoos, Dec 24 '19 at 09:24
@SukumarRdjf it's weird, since comparing item by item says they are equal. — Zusman, Dec 24 '19 at 09:27
@MorZamir it is working fine for me: `>>> set(received_devices) == set(expected_devices)` returns `True` — Dishin H Goyani, Dec 24 '19 at 09:29

score 1 · Answer 1 · answered Dec 24 '19 at 09:46

Your main confusion seems to be about how set comparisons behave. Sets are conceptually unordered, so when two sets are compared, Python ignores differences that relate only to the order in which the data is stored (which might differ, depending on exactly how the sets were created).

print({1, 2, 3, 4} == {3, 4, 2, 1}) # prints True because the sets contain the same values

This is very different from lists, which compare element by element in order (lexicographically), so the order of their contents matters.

print([1, 2, 3, 4] == [3, 4, 2, 1]) # prints False because order matters to lists
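Applied to data shaped like yours (written here with Python 3 literals, so there are no u prefixes or L suffixes), an order-insensitive comparison might look like the minimal sketch below. One caveat worth knowing for a test: a set also throws away duplicate rows, so collections.Counter, which ignores order but counts how often each row occurs, can be a stricter alternative. The variable names are just for illustration.

```python
from collections import Counter

# Illustrative data shaped like the rows in the question.
received = [(1, "1", None, "test_device_1"), (2, "2", None, "test_device_2")]
expected = [(2, "2", None, "test_device_2"), (1, "1", None, "test_device_1")]

print(received == expected)                    # False: lists compare in order
print(set(received) == set(expected))          # True: sets ignore order
print(Counter(received) == Counter(expected))  # True: Counter ignores order too

# With an extra copy of the first row, the set comparison still passes
# because sets drop duplicates, while the Counter comparison catches it.
with_duplicate = received + received[:1]
print(set(with_duplicate) == set(expected))          # True
print(Counter(with_duplicate) == Counter(expected))  # False
```

Both approaches require the rows to be hashable, just like the set version in your test.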

There's a secondary issue in your code that isn't actually causing the problem here, although it looks like it could, and it may cause other problems later on: you're mixing Unicode strings and byte strings. In Python 2, which you appear to be using, this is tolerated, and a Unicode string that contains only ASCII characters compares equal to a byte string with the same characters in it. Importantly for using them in sets (or as dictionary keys), ASCII-only Unicode strings also hash to the same values as the equivalent byte strings.
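To see why matching hashes matter: a set looks an element up by its hash first and only calls == on candidates whose hashes match. The minimal sketch below uses a hypothetical Row class (not something from your code or from any database library) whose __eq__ treats it as equal to a plain tuple while its __hash__ deliberately disagrees, so element-wise == succeeds but the set comparison does not.

```python
class Row:
    """Hypothetical row type that breaks the rule 'equal objects must hash equal'."""

    def __init__(self, *values):
        self.values = tuple(values)

    def __eq__(self, other):
        # Compares equal to another Row or to a plain tuple with the same values.
        if isinstance(other, Row):
            return self.values == other.values
        return self.values == other

    def __hash__(self):
        # Deliberately inconsistent with __eq__ against plain tuples.
        return hash(("Row",) + self.values)

    def __repr__(self):
        return "Row%r" % (self.values,)


row = Row(1, "1", None, "test_device_1")
plain = (1, "1", None, "test_device_1")

print(row == plain)      # True: __eq__ says they match
print({row} == {plain})  # False: the hashes differ, so the set never calls __eq__
```

This is exactly the situation that ASCII-only Unicode and byte strings avoid in Python 2, because they hash to the same values.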

However, as soon as your data includes non-ASCII strings, any code that relies on this Python 2 leniency is very likely to break. And in Python 3 a Unicode string (str) never compares equal to a byte string (bytes), so '1' == b'1' is False. You should probably be trying to switch to Python 3 anyway, since Python 2 reaches its end of life at the end of this year! So I'd strongly recommend changing your code to ensure you're always comparing Unicode strings with other Unicode strings, even if you need to write them as u'1' or decode them from a byte string in a known encoding.
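A minimal Python 3 sketch of that advice, assuming the database driver may hand back some fields as bytes and that you know the encoding they were stored in. The helper name normalize_row is hypothetical, not part of any library.

```python
def normalize_row(row, encoding="utf-8"):
    """Return a tuple in which every bytes field is decoded to str.

    Hypothetical helper: assumes `encoding` matches how the data was stored.
    """
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in row
    )


# Illustrative data: the received rows contain bytes, the expected rows str.
received = [(1, b"1", None, b"test_device_1"), (2, b"2", None, b"test_device_2")]
expected = [(2, "2", None, "test_device_2"), (1, "1", None, "test_device_1")]

print(set(received) == set(expected))                         # False: bytes != str
print({normalize_row(r) for r in received} == set(expected))  # True after decoding
```

Decoding at the point where data enters your program keeps the rest of the code working purely with str.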

Another bit of noise in your question is the L suffix, which indicates that some of the numbers are of type long instead of int. Unlike the Unicode vs. byte string issue above, this is only a visual distraction, not a real problem: in Python 2, 1L == 1 and both hash to the same value, so Python operators and other code, including sets, treat the two types interchangeably. In Python 3, int and long have been unified into a single arbitrary-precision int type, and the L notation is gone.
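A short Python 3 sketch of both points: there is only one integer type, and numeric values that compare equal are required to hash equal, which is the same rule that made 1L and 1 interchangeable inside Python 2 sets. The float comparison is only an illustration of that general rule.

```python
small = 1
big = 2 ** 100  # would have been a long (printed with an L) in Python 2

print(type(small) is type(big))  # True: Python 3 has a single int type
print(repr(big))                 # printed without a trailing L

# Numeric values that compare equal must hash equal,
# so mixing numeric types inside a set is harmless.
print(hash(1) == hash(1.0))      # True
print({1, 2} == {2.0, 1.0})      # True
```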
