8

Will Path('.').glob('*.ext') produce consistent ordering of results (assuming the files being globbed don't change)?

It seems the glob ordering is based on the file system order (at least, for the old glob package). Will pathlib's glob order be changed by adding files to the directory (which will not be included in the glob)? Will this order be changed by the file system even if nothing is added to the specific directory (e.g., when other large file changes are made elsewhere on the system)? Over the course of several days? Or will the ordering remain consistent in all these cases?

Just to clarify, I can't simply convert to a list and sort, as there are too many file paths to fit into memory at once. I'm hoping to get the same order each time because I will be doing some ML training and want to set aside every nth file as validation data. This training will take several days, which is why I want to know whether the order remains stable on the file system over long periods.

golmschenk
  • It seems unlikely that it is consistent, as the output in `pathlib`'s documentation appears unordered (third example, https://docs.python.org/3/library/pathlib.html#basic-use) – Minion Jim Apr 02 '20 at 19:10
  • I'm not sure whether it can be relied upon. See this article about inconsistencies with `glob` across operating systems: https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/ – Paddy Harrison Apr 02 '20 at 19:12
  • During a run you could store the filename of every n-th file that was set aside, and then use that information during later runs. – a_guest Apr 02 '20 at 19:39

2 Answers

6

Checking the source code for the pathlib module, by chance, the latest commit points us directly to the relevant place:

Use os.scandir() as context manager in Path.glob().

So under the hood Path.glob uses os.scandir to get the directory entries. The docs of this function report that the results are unordered:

Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. *The entries are yielded in arbitrary order*, and the special entries '.' and '..' are not included.

(emphasis mine)
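
Since the order can't be relied on, one option is to make the validation split independent of the order altogether. The sketch below is not something pathlib provides; it is a swapped-in technique (hashing the file name) that picks roughly one in every n files rather than exactly every nth. The names `is_validation` and `split_paths` are made up for illustration, and it assumes file names are unique within the globbed directory.

```python
import hashlib
from pathlib import Path


def is_validation(path: Path, n: int = 10) -> bool:
    """Return True for roughly one in every n files, based only on the file name."""
    # sha256 is stable across runs and machines, unlike the built-in hash() for str.
    digest = hashlib.sha256(path.name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n == 0


def split_paths(directory: Path, pattern: str = "*.ext", n: int = 10):
    """Lazily yield (path, is_validation) pairs without building a list."""
    for path in directory.glob(pattern):
        yield path, is_validation(path, n)
```

Because the decision depends only on the name, a file lands in the same split no matter when or in what order the glob returns it, and nothing has to be held in memory.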

a_guest
  • Just to note, an arbitrary order could be a consistent or inconsistent arbitrary order. That said, since the documentation does not clarify, it's probably a bad idea to rely on it being consistent. Especially since even if the current implementation is consistent, it may be an implementation detail, and the implementation may change. Since the time I originally asked this question, I stopped relying on the returned ordering of `glob` and instead store the list of paths in a separate data structure (e.g., SQLite) for my own use cases. – golmschenk Aug 31 '23 at 01:36
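
The comment above mentions keeping the paths in a separate store such as SQLite. A rough sketch of that idea, using only the standard library, might look like the following. The table name `files` and the functions `build_index` and `iter_indexed` are hypothetical, and this is not the commenter's actual code.

```python
import sqlite3
from pathlib import Path


def build_index(directory: Path, db_path: str, pattern: str = "*.ext") -> None:
    """Record every matching path once, streaming rows from the glob."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY)")
            conn.executemany(
                "INSERT OR IGNORE INTO files (path) VALUES (?)",
                ((str(p),) for p in directory.glob(pattern)),
            )
    finally:
        conn.close()


def iter_indexed(db_path: str):
    """Yield (position, path) pairs in sorted order, read back from the index."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT path FROM files ORDER BY path")
        for position, (path,) in enumerate(cursor):
            yield position, Path(path)
    finally:
        conn.close()
```

With the index built once before training, `position % n == 0` gives the same every-nth-file validation set on each run, regardless of what order the file system returns entries in.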
2

From experience, glob's arbitrary order (that is, the file system order) does not change over time unless you change the files manually.

Can