Is it good practice to use the `set` function to list all the unique elements, or are there better approaches with lower time and space complexity?
1 Answer
Simply calling `set` is the best way to find the unique elements if:
- the items are hashable, and
- you don't require the original ordering preserved
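For instance, using a made-up input list `seq` (note the result's order is arbitrary):

```python
seq = [3, 1, 3, 2, 1]

# set() walks the sequence once, hashing each item: O(n)
unique = set(seq)

print(unique)  # {1, 2, 3} -- duplicates gone, original order lost
```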
It's already O(n), and you can't improve on that asymptotically. If you require the ordering preserved:
from collections import OrderedDict
list(OrderedDict.fromkeys(seq))
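With the same made-up `seq` as an example, this keeps the first occurrence of each item in its original position:

```python
from collections import OrderedDict

seq = [3, 1, 3, 2, 1]

# fromkeys() inserts each item as a dict key once; later duplicates
# are ignored, so insertion order == first-occurrence order
unique = list(OrderedDict.fromkeys(seq))

print(unique)  # [3, 1, 2]
```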
That's still O(n) asymptotically, but it will generally be slower due to more overhead (a Python-level loop vs a C loop). If you have to deal with unhashable elements, you may need an O(n^2) approach using a list:
unique = []
for item in seq:
    if item not in unique:
        unique.append(item)
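As a small example of why this fallback is needed, suppose `seq` contains lists, which are unhashable and so can't go into a `set` or dict:

```python
seq = [[1, 2], [3], [1, 2]]  # lists can't be hashed: set(seq) raises TypeError

# Quadratic scan: `in` compares with == against every item kept so far,
# so no hashing is required
unique = []
for item in seq:
    if item not in unique:
        unique.append(item)

print(unique)  # [[1, 2], [3]]
```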

wim
- It is worth pointing out that, since Python 3.6, built-in dictionaries are ordered: https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6. It is debatable whether you want to take advantage of this "implementation detail." – jpp Jan 24 '18 at 01:10
- If we are going to bring up `collections` for `OrderedDict`, you might as well also recommend `collections.deque`, since it gives significant speedups. – codykochmann Jan 24 '18 at 01:14
- @codykochmann Could you describe how you think `deque` can help here? I don't follow. – wim Jan 24 '18 at 01:26
- @wim See https://stackoverflow.com/a/47493474/9209546 – jpp Jan 24 '18 at 02:09
- Still don't follow. Can you demonstrate how to get a significant speedup in de-duplication? – wim Jan 24 '18 at 02:29
- From my experience, if you're appending to a list, a deque will give you a drop-in replacement for lists that will speed up the whole process. Once you push lists to scale, they really slow down for appending and popping, which hints that lists recreate their entire structure every time you append something. A co-worker of mine had an app with a primary list that built up to several gigs, and appends (even of small objects) at that point were taking several minutes at a time. – codykochmann Jan 24 '18 at 03:40
- The second we had him switch to deques, the insane slowdown disappeared. – codykochmann Jan 24 '18 at 03:42
- Hmm, I'm not convinced. Lists definitely do not recreate the entire structure on every append; in fact, `list.append` is O(1) average case and O(1) amortized worst case. There must have been something else broken in your co-worker's app, or some other detail that you're missing here. – wim Jan 24 '18 at 04:03
- FWIW, I just benchmarked using 1000, 10_000, and 100_000 items. Could not find any difference in performance between lists and deques (list was slightly faster, but not by anything significant). – wim Jan 24 '18 at 04:21
- Thanks for the answer. – ganesh Jan 24 '18 at 10:35