Given the piece of code:

from glob import glob, iglob

for fn in glob('/*'):
    print(fn)

print('')

for fn in iglob('/*'):
    print(fn)

Reading the documentation for glob, I see that glob() returns a plain list of files and iglob() returns an iterator. However, I can iterate over both, and each of them yields the same files.

I've read the documentation on iterators, but it hasn't really shed any more light on the subject!

So what benefit does iglob() returning an Iterator provide me over the list from glob()? Do I gain extra functionality over my old friend the lowly list?

ghickman

2 Answers

The difference is mentioned in the documentation itself:

Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

Basically, a list holds all of its items in memory at once. An iterator need not, so it can use less memory.
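To make the difference concrete, here is a minimal sketch. The scratch directory and the file names `a.txt`, `b.txt`, `c.txt` are made up for illustration; only `glob()` and `iglob()` come from the question.

```python
import os
import shutil
import tempfile
from glob import glob, iglob

# Hypothetical scratch directory with three files, so the example is self-contained.
tmp = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.txt"):
    with open(os.path.join(tmp, name), "w"):
        pass

pattern = os.path.join(tmp, "*.txt")

as_list = glob(pattern)   # a list: every match is collected before glob() returns
as_iter = iglob(pattern)  # an iterator: matches are handed out one at a time

print(type(as_list).__name__)  # list
print(len(as_list))            # 3 (a list knows its length)

first = next(as_iter)          # pull a single match
remaining = list(as_iter)      # consume whatever is left
print(len(remaining))          # 2
print(list(as_iter))           # [] (the iterator is exhausted after one pass)

shutil.rmtree(tmp)             # clean up the scratch directory
```

So the iterator gives you less functionality than the list, not more: no `len()`, no indexing, and only one pass. What you get in exchange is that the results do not all have to be held at the same time.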

amit
  • Just to add: this is called 'lazy evaluation'. We don't do the work until we need it. – ceth Nov 26 '10 at 17:00
  • Note: for a single directory the memory use is the same (due to the current implementation via `os.listdir()`). The advantage appears when there are multiple directories with many files. – jfs Nov 26 '10 at 18:13
  • As @J.F.Sebastian said, iglob's speed/memory advantage over glob is hampered by os.listdir() (see [this](http://code.activestate.com/lists/python-list/184205/)): this means that both will be slow on directories with lots of files. If you have that problem, check out [formic](http://pypi.python.org/pypi/formic). Example [here](http://stackoverflow.com/a/10597254/633403). – Luca Invernizzi Aug 14 '12 at 19:09
  • @LucaInvernizzi: I didn't mention speed at all. `glob` supports `**` too. It seems `formic` uses `os.walk`, which uses `os.listdir()`. From the link you provided, it is unclear whether the bottleneck is the filesystem or Python. You could try [readdir()](http://stackoverflow.com/a/5091076/4279) or even [getdents()](http://stackoverflow.com/a/7032186/4279) to read millions of files at a single level. – jfs Aug 14 '12 at 19:53

Adding to amit's answer: iglob() is useful in the particular case where you delete a directory while looping over the results. glob() stores all the file and folder paths in a list up front, so after the directory is deleted, accessing the paths inside it later in the loop throws an exception. Because iglob() does not store the paths in advance, it can avoid this concurrent modification problem.

Ridhuvarshan
  • I didn't get it. Can you please elaborate a bit? Thanks! – Coddy Mar 18 '21 at 01:20
  • @Coddy, assume you want to delete all directories and files that start with 'w' inside the folder 'test'. glob() stores the paths of all files and directories inside 'test' up front. Say there is a folder called 'willow' inside 'test', and it contains the files 'file1', 'file2' and 'wfile3'. When you use glob(), delete 'willow', and then try to delete 'wfile3', it will throw an exception. With iglob(), the paths of files and directories are not stored in advance, so you wouldn't even get to 'wfile3'. – Ridhuvarshan Mar 19 '21 at 02:10