1

I tried to remove duplicates from a list in Python 3 by converting it to a set with set(). However, I also wanted the result to be in a particular order. After the conversion, I noticed that the resulting set was not in the order I expected.

data = [3, 6, 3, 4, 4, 3]
my_set = set(data)
print(my_set)

The resulting set is: {3, 4, 6}

I expected set() to iterate over the given list from index 0 to n, keeping the first instance of every integer it encounters. However, the resulting set seems to be ordered differently.

I was unable to find anything about this in the Python documentation or here on Stack Overflow. Is it known how set() orders the elements of the given data structure when converting it to a set?

juanpa.arrivillaga
MrTony
  • 2
    `set` objects are inherently unordered. How they keep things is an implementation detail, but they are hash-sets if you are curious. NOTE: in the future, please tag **all** python related questions with the generic [python] tag, and only use a version-specific tag at your discretion. Python *is* Python 3, by the way... – juanpa.arrivillaga Jul 29 '19 at 22:07
  • 2
    [from the docs](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset): *A set object is an unordered collection of distinct hashable objects*. AND: *Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.* – Tomerikoo Jul 29 '19 at 22:12
  • If you are using Python 3.7+, this is simply `list(dict.fromkeys(data))`, because dictionaries preserve insertion order – Tomerikoo Jul 29 '19 at 22:25
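
To illustrate the documentation quote above, here is a minimal sketch of what "unordered" means in practice. It reuses the `data` list from the question; the variable `indexing_error` is only there for illustration. Whatever order `print(my_set)` happens to show is an implementation detail, so the sketch checks only equality and indexing.

```python
data = [3, 6, 3, 4, 4, 3]
my_set = set(data)

# Equality between sets ignores order entirely.
print(my_set == {6, 4, 3})  # True

# Sets record no positions, so indexing is not supported.
indexing_error = None
try:
    my_set[0]
except TypeError as exc:
    indexing_error = exc
    print("indexing failed:", exc)
```

So the order printed for a set should not be relied on, even if it looks sorted for small integers.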

2 Answers

2

The concept of order simply does not exist for sets in Python, which is why you cannot expect the elements to be shown in any particular order. Here is an example of creating a list without duplicates that keeps the order of first occurrence from the original list.

>>> data = [3, 6, 3, 4, 4, 3]
>>> without_duplicates = list(dict.fromkeys(data))
>>> without_duplicates
[3, 6, 4]
ers36
  • 1
    The question is not expecting the result to be sorted – donkopotamus Jul 29 '19 at 22:18
  • Sorry, I edited the answer. – ers36 Jul 29 '19 at 22:28
  • 4
    Note: This only works on CPython/PyPy 3.6 as an implementation detail. It becomes a language guarantee as of 3.7. If you're using earlier versions of Python, you'd use `collections.OrderedDict.fromkeys` instead of `dict.fromkeys` to get the same result. – ShadowRanger Jul 29 '19 at 22:29
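
For versions before 3.7, the `collections.OrderedDict.fromkeys` variant mentioned in the comment above might look like the following sketch. It reuses the `data` list from the question and is not part of the original answer.

```python
from collections import OrderedDict

data = [3, 6, 3, 4, 4, 3]

# OrderedDict remembers insertion order on every Python version,
# so each key stays at the position of its first occurrence.
without_duplicates = list(OrderedDict.fromkeys(data))
print(without_duplicates)  # [3, 6, 4]
```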
1

set objects are not ordered by key or by insertion order in Python. You can, however, get what you want by building the result explicitly:

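data = [3, 6, 3, 4, 4, 3]

res = []      # unique items, in order of first occurrence
seen = set()  # items already encountered, for fast membership tests
for x in data:
    if x not in seen:
        seen.add(x)
        res.append(x)
print(res)  # [3, 6, 4]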
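
If this is needed in more than one place, the same loop can be wrapped in a small generator. This is only a sketch: the name `unique_in_order` is hypothetical, and it assumes the items are hashable, just as the `seen` set in the loop does.

```python
def unique_in_order(iterable):
    """Yield each item the first time it appears, preserving input order."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


data = [3, 6, 3, 4, 4, 3]
print(list(unique_in_order(data)))              # [3, 6, 4]
print("".join(unique_in_order("mississippi")))  # misp
```

Because it is a generator, it works on any iterable and produces results lazily instead of building the whole list up front.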
6502