
I don't quite understand which function I should use to iterate gradually over the files in a directory (non-recursively) without pre-caching them all in one huge list. I checked the documentation of os.scandir(), but it doesn't explicitly say whether it loads the entire list into memory at once.

I've also checked the documentation on glob.iglob(), but it was later revealed that it does store all the items in memory...

Will
Mr.Kleiner
  • Does this answer your question? [Python Iterators: What does iglob()'s Iterator provide over glob()'s list?](https://stackoverflow.com/questions/4287162/python-iterators-what-does-iglobs-iterator-provide-over-globs-list) – buran Feb 03 '23 at 14:11
  • The issue is that stuff like glob and iglob *do* actually store everything in a list – Mr.Kleiner Feb 03 '23 at 14:34
  • No, `iglob` and `os.scandir` return iterators! – buran Feb 03 '23 at 14:36
  • The `iglob` documentation says `Return an iterator which yields the same values as glob() without actually storing them all simultaneously.` Where do you see that `iglob` stores everything in a list? Perhaps some code might help us understand your issue better. – JonSG Feb 03 '23 at 14:38
  • @buran It still [stores everything in memory](https://github.com/python/cpython/blob/04e06e20ee61f3c0d1d7a827b2feb4ed41bb198d/Lib/glob.py#L177). – Will Feb 03 '23 at 14:41
  • That list covers a single directory; the advantage of `iglob` shows up when multiple directories are being traversed. – Will Feb 03 '23 at 14:44
  • @WillDereham, you are joking, right? It's explicitly written in the docs for `iglob` - _without actually storing them all simultaneously_ – buran Feb 03 '23 at 14:46
  • Read also https://wiki.python.org/moin/Generators – buran Feb 03 '23 at 15:02
  • @buran If you actually read the [source code](https://github.com/python/cpython/blob/04e06e20ee61f3c0d1d7a827b2feb4ed41bb198d/Lib/glob.py), on [line 177](https://github.com/python/cpython/blob/04e06e20ee61f3c0d1d7a827b2feb4ed41bb198d/Lib/glob.py#L177) you will see that the *generator* of filenames within a *single* directory is converted to a list; the advantage of `iglob` is with recursive (`**`) patterns. However, the question is specifically about listing files *non-recursively*. – Will Feb 03 '23 at 15:13
  • It looks to me like a weakness in that CPython implementation. I think it should `yield from it` rather than `return list(it)`, but that is just a quick glance at the code path of `iglob()`. – JonSG Feb 03 '23 at 15:20
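The lazy behaviour JonSG describes can be approximated for a single directory by filtering os.scandir() entries as they arrive. A minimal sketch, assuming a hypothetical helper `iter_matching` (it is not part of the glob module): it matches names against a shell-style pattern with fnmatch and yields each match immediately instead of collecting the directory's names first. Like glob's default, it skips dot-files unless the pattern itself starts with a dot.

```python
import fnmatch
import os
from typing import Iterator


def iter_matching(directory: str, pattern: str) -> Iterator[str]:
    """Lazily yield paths of entries in `directory` whose names match `pattern`.

    Hypothetical helper, non-recursive: only the directory's own entries
    are considered, and each match is yielded as scandir produces it.
    """
    # Mirror glob's default: '*' does not match names starting with '.',
    # unless the pattern itself starts with '.'.
    hide_dotfiles = not pattern.startswith(".")
    # The with block closes the scandir iterator when the loop finishes
    # or when this generator is closed.
    with os.scandir(directory) as it:
        for entry in it:
            if hide_dotfiles and entry.name.startswith("."):
                continue
            # fnmatch.fnmatch() applies os.path.normcase to both arguments,
            # so matching is case-insensitive on Windows only.
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.path  # entry.path == os.path.join(directory, name)
```

Usage is the same as with `glob.iglob("dir/*.txt")` for a non-recursive pattern, e.g. `for path in iter_matching("dir", "*.txt"): ...`.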

1 Answer


According to PEP 471 (os.scandir() function), os.scandir() does what you want:

It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

It uses the system calls FindFirstFile / FindNextFile on Windows and readdir on POSIX systems to avoid loading the entire list of files into memory.
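As a rough illustration of the answer (not code taken from the PEP), here is a minimal sketch of a lazy, non-recursive file listing built on os.scandir(). The helper name `iter_files` is hypothetical; it yields the names of regular files only and skips subdirectories.

```python
import os
from typing import Iterator


def iter_files(path: str = ".") -> Iterator[str]:
    """Yield names of regular files directly inside `path`, one at a time.

    os.scandir() returns an iterator of os.DirEntry objects, so entries are
    fetched from the OS as the loop advances rather than collected up front.
    """
    # The with block closes the scandir iterator when the loop finishes
    # or when this generator is closed.
    with os.scandir(path) as it:
        for entry in it:
            # is_file() follows symlinks by default and can often use the
            # file type reported by the OS instead of an extra stat() call.
            if entry.is_file():
                yield entry.name
```

A caller can `break` out of `for name in iter_files("some_dir"):` at any point; the directory handle is released once the generator is closed or garbage-collected.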

Will