1

A list of unique elements can be attained by converting a list to a dictionary or a set, then back to a list.

>>> original_list = [1,2,3,4,5,6,7,8,9,2,4,6,8]
>>> original_list
[1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 4, 6, 8]
>>> 
>>> unique_via_set = list(set(original_list))
>>> unique_via_dict = list(dict.fromkeys(original_list))
>>> 
>>> unique_via_set
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> unique_via_dict
[1, 2, 3, 4, 5, 6, 7, 8, 9]

^Python 3.7.3

Main Question:

What is the difference between these two strategies?

Sub Questions:

  • Is one more efficient than the other?
  • Is there a use case that is more optimized for one or the other?
  • Is there a practical difference in the run time between these two strategies?

Note: I'm not looking for the best or fastest way to get unique elements from a list, I am looking to compare the two methods above

Edit: Again, I am not looking for HOW to remove duplicates, I am merely looking for the practical differences in the methods above, if any

Eddie
  • 378
  • 1
  • 3
  • 16
  • Possibly related: https://softwareengineering.stackexchange.com/questions/112662/how-costly-are-the-python-dict-and-set-in-built-types, but fundamentally there is no difference. Dictionaries use membership testing for adding elements to it, the same as for sets so theoretically it should be constant time or O(1). – rayryeng Jun 10 '19 at 18:38
  • 2
    Well building a dictionary using the list elements as keys works, as there cannot be duplicate keys (although there is the overhead of creating a dict first), however why not just use `sets` as they are precisely intended for this purpose, i.e for collections of unique items? – yatu Jun 10 '19 at 18:38
  • You have asked so many questions. Moreover, it is more opinion based. You can check the timings to see which one is faster. I would mark it duplicate of https://stackoverflow.com/questions/7961363/removing-duplicates-in-python-lists – Sheldore Jun 10 '19 at 18:39
  • Possible duplicate of [Removing duplicates in Python lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-python-lists) – DSC Jun 10 '19 at 18:40
  • 2
    This doesn't look like a dupe to me, at least not of the linked questions - this is around performance of the (known) methods. – Peter DeGlopper Jun 10 '19 at 18:42
  • Thanks for the suggestions. I also don't believe it is a duplicate, I'm not looking for optimal strategies to remove duplicates, I am only looking to understand the functional differences between the two strategies in the question. @rayryeng Thanks for the related link. – Eddie Jun 10 '19 at 18:43
  • 3
    I think it's worth pointing out that your intention is more clear when using sets for anyone that may read your code. – busybear Jun 10 '19 at 18:53

1 Answers1

2

While the implementations of the hash tables for dictionaries and sets may differ slightly, you will not see a functional difference between the keys of the dictionary and the set. The only major difference will be that the dictionary allocates a bunch of values referencing None, which the set does not need. That might make the dictionary implementation less efficient, since it is unnecessary overhead.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264