
Why would the implementers choose to make sys.path a list as opposed to an ordered set?

Making sys.path a list allows it to contain duplicate entries, and every duplicate slows down the search for modules.

Here is a silly, artificial example:

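```python
import os
import sys

# Normally, a failing import like `import hello` returns almost instantly.

# Prepend the current directory 50,000 times, creating 50,000 duplicate entries.
for _ in range(50000):
    sys.path.insert(0, os.path.abspath("."))

# Assuming no module named "hello" exists, the import now takes a while to
# fail, because every entry on sys.path is tried before ImportError is raised.
import hello
```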

To summarise from the comments and answers given:

It seems from the responses below that a list is a simple structure which handles 99% of everyone's needs. It does not come with a safety feature against duplicates, but it does come with a primitive prioritisation: the index of the element in the list. You can easily give an entry the highest priority by prepending it, or the lowest priority by appending it.
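A minimal sketch of "the index is the priority", assuming a hypothetical `resolve` helper and an `available` dictionary that stands in for the filesystem (directory to module names). It is a simulation on a plain list, so the real sys.path is not touched, and the real import machinery is more involved; but for ordinary modules the real search also stops at the first entry that provides the name.

```python
def resolve(name, search_path, available):
    """Return the first directory in search_path that provides `name`.

    `available` is a hypothetical mapping of directory -> set of module
    names; it stands in for the real filesystem lookup.
    """
    for directory in search_path:
        if name in available.get(directory, set()):
            return directory          # first match wins: lower index = higher priority
    return None                       # nothing found, like a failed import


available = {
    "/site-packages": {"hello"},
    "/project": {"hello"},
}

search_path = ["/site-packages"]
search_path.insert(0, "/project")     # prepend: highest priority
search_path.append("/fallback")       # append: lowest priority

print(search_path)                                # ['/project', '/site-packages', '/fallback']
print(resolve("hello", search_path, available))   # /project
```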

A richer prioritisation scheme (e.g. "insert before this element") would rarely be used, since such an interface would be too much effort for a simple task. As the accepted answer states, there is no practical need for anything more advanced to cover these extra use cases, and historically people are used to the list.
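For what it is worth, the "insert before this element" case can already be expressed with ordinary list methods. `insert_before` below is a hypothetical helper (not part of `sys` or the standard library), sketched on a plain list:

```python
def insert_before(search_path, anchor, new_entry):
    """Insert new_entry immediately before anchor in search_path.

    Hypothetical helper. Raises ValueError if anchor is not present,
    which is simply list.index's behaviour.
    """
    search_path.insert(search_path.index(anchor), new_entry)


paths = ["/project", "/site-packages"]
insert_before(paths, "/site-packages", "/vendored")
print(paths)  # ['/project', '/vendored', '/site-packages']
```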

Har
  • but the order matters... (`set`s are unordered). – hiro protagonist Apr 25 '17 at 15:44
  • Not sure I follow ... why would having it as a list create duplicates?? – grail Apr 25 '17 at 15:45
  • Sure, it's possible to have duplicates in the list, but that would be your fault for not checking prior to adding to it. Plus, it's a pretty minor problem. Consider *nix hasn't found the need to remove duplicates from `PATH` for fifty years, either. Keeping it simple keeps it fast. – pbuck Apr 25 '17 at 15:52
  • @hiroprotagonist what about if the set was ordered? – Har Apr 25 '17 at 16:13
  • @pbuck I guess insertion would be slow, but reading would be just as fast (maybe slightly less memory-efficient). Most programs read the path variable more than they write to it (I'm assuming). – Har Apr 25 '17 at 16:14
  • When was the last time a slowdown in your program was due to having duplicates in `sys.path`? – jwodder Apr 25 '17 at 16:20
  • The lesson is : When you go out of your way to break Python internals, Python gets broken. :) [We're all consenting adults here.](https://mail.python.org/pipermail/tutor/2003-October/025932.html) – Eric Duminil Apr 25 '17 at 16:58
  • @Har : python sets are unordered! the fact that they seem ordered from python 3.6 on is just an implementation detail and nothing you should rely on. oh, you mean if there were such a thing as an ordered set (that you can easily create)? – hiro protagonist Apr 25 '17 at 17:22
  • oh, ordered dictionary keys and sets may become guaranteed... https://twitter.com/raymondh/status/850102884972675072 – hiro protagonist Apr 25 '17 at 18:40
  • @hiroprotagonist yes I do mean if there were such a thing – Har Apr 26 '17 at 09:28
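pbuck's suggestion of checking before adding can be sketched as a small wrapper. `add_to_path` is a hypothetical helper, not a standard-library function, and it assumes entries should be compared after `os.path.abspath` normalisation (so `''` and the current directory count as the same entry). Running a shortened version of the question's loop through it adds the directory at most once:

```python
import os
import sys


def add_to_path(entry, prepend=True):
    """Add entry to sys.path unless an equivalent entry is already present.

    Hypothetical helper: entries are compared after os.path.abspath
    normalisation. Returns True if sys.path was changed.
    """
    entry = os.path.abspath(entry)
    if any(os.path.abspath(p) == entry for p in sys.path):
        return False                     # already on the path: do nothing
    if prepend:
        sys.path.insert(0, entry)        # highest priority
    else:
        sys.path.append(entry)           # lowest priority
    return True


before = len(sys.path)
for _ in range(1000):
    add_to_path(".")                     # the question's loop, now idempotent
print(len(sys.path) - before)            # 1 if "." was not on sys.path yet, else 0
```

The check is a linear scan, which is fine for a list of a few dozen entries and keeps sys.path itself a plain list.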

2 Answers


sys.path specifies a search path. Search paths are typically ordered, with the order of the items determining the search order. If sys.path were a set, there would be no explicit ordering, which would make sys.path less useful. It's also worth considering that optimization is a tricky issue. A reasonable optimization to address any performance concerns would be to simply keep a record of the sys.path entries that have already been searched. Trying to be tricky with ordered sets probably isn't worth the effort.
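The "keep a record of already searched entries" idea might look like the sketch below. It is a toy simulation: `resolve_skipping_duplicates` is a hypothetical function, and the hypothetical `available` mapping (directory to module names) stands in for the filesystem. The loop still walks the whole list, but a repeated entry costs only a set lookup instead of another directory probe.

```python
def resolve_skipping_duplicates(name, search_path, available):
    """Linear search that probes each distinct directory at most once.

    Returns (directory_or_None, number_of_probes). `available` is a
    hypothetical directory -> module-names mapping.
    """
    seen = set()
    probes = 0
    for directory in search_path:
        if directory in seen:
            continue                  # already searched this entry: skip it cheaply
        seen.add(directory)
        probes += 1                   # stands in for an actual directory lookup
        if name in available.get(directory, set()):
            return directory, probes
    return None, probes


# The question's pathological case: one directory repeated many times.
search_path = ["/project"] * 50000 + ["/site-packages"]
available = {"/project": set(), "/site-packages": {"json"}}

print(resolve_skipping_duplicates("hello", search_path, available))  # (None, 2)
print(resolve_skipping_duplicates("json", search_path, available))   # ('/site-packages', 2)
```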

John Percival Hackworth
  • That is a good point, I have edited my post to reflect on the order – Har Apr 25 '17 at 16:15
  • @JohnPercivalHackworth Why is the order important? I would be worried if I imported a module and two of the same name were found in the folders listed in `sys.path`. I feel like needing the order to stay is a symptom of bad organization in the project. Do you have counter-examples or explanations by chance? – Guimoute Sep 01 '21 at 13:52
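One case where order matters is deliberate shadowing: two directories on sys.path provide a module with the same name, and the one listed first wins (for example, a project directory placed ahead of an installed copy). The sketch below demonstrates this with two temporary directories; `write_module` is a hypothetical helper, the module name `shadow_demo_mod` is assumed not to exist anywhere else on the path, and sys.path is restored afterwards.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, value):
    """Create <directory>/<name>.py that defines VALUE."""
    with open(os.path.join(directory, name + ".py"), "w") as f:
        f.write(f"VALUE = {value!r}\n")


name = "shadow_demo_mod"  # hypothetical name, assumed unused elsewhere on sys.path
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    write_module(first, name, "from first")
    write_module(second, name, "from second")

    original = list(sys.path)
    try:
        sys.path[:0] = [first, second]   # `first` is searched before `second`
        importlib.invalidate_caches()    # let finders notice the new files
        module = importlib.import_module(name)
        print(module.VALUE)              # from first
    finally:
        sys.path[:] = original           # restore the original search path
        sys.modules.pop(name, None)      # forget the demo module
```

Swapping `first` and `second` in the slice assignment would make the other file win, which a set could not express.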
  • Ordered set is
  • There's no practical need for the added complexity
    • A list is a very simple structure, while an ordered set is basically a hash table plus a list plus the logic that keeps the two in sync
    • You don't need to perform on sys.path the operation a set is designed for (checking whether an exact path is in it), let alone perform it very quickly
    • On the contrary, sys.path's typical use cases are exactly those of a list: trying entries in sequence, and prepending or appending one
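To make the "hash table + list + weaving logic" point concrete, here is a hypothetical `OrderedPathSet` built on `collections.OrderedDict`, which tracks order with a doubly linked list alongside its hash table. It illustrates the extra machinery only; it is not a proposal for how sys.path should work.

```python
from collections import OrderedDict


class OrderedPathSet:
    """Minimal ordered set of path entries (hypothetical illustration)."""

    def __init__(self, items=()):
        self._items = OrderedDict.fromkeys(items)  # keys are entries; values unused

    def __contains__(self, item):
        return item in self._items                 # hash-table lookup

    def __iter__(self):
        return iter(self._items)                   # iteration in stored order

    def __len__(self):
        return len(self._items)

    def append(self, item):
        self._items.setdefault(item, None)         # ignored if already present

    def prepend(self, item):
        self._items[item] = None
        self._items.move_to_end(item, last=False)  # add or move to the front


paths = OrderedPathSet(["/site-packages"])
paths.append("/fallback")
paths.prepend("/project")
paths.prepend("/project")        # already present: moved to the front, not duplicated
paths.append("/site-packages")   # already present: position unchanged
print(list(paths))               # ['/project', '/site-packages', '/fallback']
```

Even in this small form, every operation must keep two structures consistent, and `prepend` on an existing entry silently changes its priority. A list gives `insert(0, ...)`, `append` and in-order iteration with none of that bookkeeping.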

To summarize, there's both a historical reason and a lack of any practical need.

ivan_pozdeev
  • After thinking about it as well, one addition to this is that a list is the rawest form of the data, i.e. it reflects exactly what the user has done, so the developer is able to make their own decisions later on about what to do with it. – Har Sep 04 '17 at 09:49
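If the list is treated as raw data, deduplication becomes an explicit step the developer can choose to take later. A minimal sketch, using a hypothetical `deduplicated` helper that relies on dict keys preserving insertion order (guaranteed since Python 3.7), so the first occurrence of each entry, and therefore its priority, is kept:

```python
import sys


def deduplicated(search_path):
    """Return a copy of search_path with later duplicates removed."""
    return list(dict.fromkeys(search_path))  # first occurrence of each entry wins


print(deduplicated(["/project", "/site-packages", "/project", "/fallback"]))
# ['/project', '/site-packages', '/fallback']

# Applying it to the live list is an explicit, caller-chosen step:
sys.path[:] = deduplicated(sys.path)
```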