Given the following file structure,

├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.md5
├── v1
│   ├── content
│   │   ├── foo.xml
│   │   └── level1
│   │       └── level2
│   │           └── bar.txt
│   ├── inventory.json
│   └── inventory.json.md5
└── v2
    ├── content
    │   └── duck.txt
    ├── inventory.json
    └── inventory.json.md5

Is it possible that Python's os.walk function returns folders in a different order on Mac and Linux? Both machines are running Python 3.5.

Mac:

In [15]: for root,folders,files in os.walk('foo/bar'): 
    ...:     print(folders,files) 
    ...:
['v1', 'v2'] ['inventory.json', '0=ocfl_object_1.0', 'inventory.json.md5']
['content'] ['inventory.json', 'inventory.json.md5']
['level1'] ['foo.xml']
['level2'] []
[] ['bar.txt']
['content'] ['inventory.json', 'inventory.json.md5']
[] ['duck.txt']

On Linux:

In [54]: for root,folders,files in os.walk('foo/bar'): 
    ...:     print(folders,files) 
    ...:
['v2', 'v1'] ['inventory.json.md5', 'inventory.json', '0=ocfl_object_1.0']
['content'] ['inventory.json.md5', 'inventory.json']
[] ['duck.txt']
['content'] ['inventory.json.md5', 'inventory.json']
['level1'] ['foo.xml']
['level2'] []
[] ['bar.txt']

On Mac, the folder v1 appears to be visited first, while on Linux it's v2. Any insight into why this might be the case?

ghukill
  • Possible duplicate of [Non-alphanumeric list order from os.listdir()](https://stackoverflow.com/questions/4813061/non-alphanumeric-list-order-from-os-listdir) – Endyd Feb 19 '19 at 19:34
  • The order depends on what the OS returns, which is arbitrary, even using the same OS on different computers. If you want them in a certain order, it's up to you to do the sorting. – martineau Feb 19 '19 at 19:37

1 Answer

See the documentation on os.walk, relevant part:

Changed in version 3.5: This function now calls os.scandir() instead of os.listdir(), making it faster by reducing the number of calls to os.stat().

And then in os.scandir():

Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

Whether os.walk() uses listdir() or scandir() underneath, the entries come back in arbitrary order either way.

In short: do not rely on any particular order.
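
For a single directory, one way to get a stable order is to sort the names yourself. The following is a minimal sketch rather than a definitive recipe; `sorted_entries` is a hypothetical helper name, and the temporary directory exists only to make the example self-contained.

```python
import os
import tempfile


def sorted_entries(path):
    """Return the entry names in `path`, sorted, whatever order the OS yields."""
    # os.scandir() yields entries in arbitrary order; sorted() imposes one.
    return sorted(entry.name for entry in os.scandir(path))


# Build a small throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as base:
    for name in ("v2", "v1"):
        os.mkdir(os.path.join(base, name))
    open(os.path.join(base, "inventory.json"), "w").close()
    names = sorted_entries(base)

print(names)
```

The result is the same on every platform because the order now comes from `sorted()`, not from the filesystem.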


Having said that, you should be able to manipulate the dirnames in the loop based on this part:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
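
The slice-assignment form mentioned in that passage can prune and order in one step. This is a sketch under assumptions: `walk_pruned` and the `skip` set are hypothetical names, and the v1/v2/v3 directories are created in a temporary location purely for illustration.

```python
import os
import tempfile


def walk_pruned(top, skip):
    """Yield each visited directory, skipping names in `skip`, in sorted order."""
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        # Slice assignment mutates the list object os.walk() holds,
        # so both the pruning and the ordering take effect.
        dirs[:] = sorted(d for d in dirs if d not in skip)
        yield root


with tempfile.TemporaryDirectory() as base:
    for rel in ("v1/content", "v2/content", "v3/content"):
        os.makedirs(os.path.join(base, *rel.split("/")))
    seen = [
        os.path.relpath(root, base).replace(os.sep, "/")
        for root in walk_pruned(base, {"v2"})
    ]

print(seen)
```

Here v2 and everything beneath it are never visited, and the remaining directories are walked in sorted order.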

So calling folders.sort() inside the loop makes os.walk() visit the subdirectories in sorted order; I just tried it and it works. The key phrase in the quote above is "in-place": the list must be sorted in place (folders.sort(), not folders = sorted(folders)) for os.walk() to pick up the order:

import os

for root, folders, files in os.walk('foo/bar'):
    folders.sort()  # sort in place so os.walk() visits subfolders in this order
    print(folders, files)
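
To check the behaviour without depending on an existing foo/bar directory, here is a self-contained sketch that rebuilds part of the tree from the question in a temporary directory. `sorted_walk` and `build_demo_tree` are hypothetical helper names; sorting `files` as well is an addition for consistent output and has no effect on the traversal.

```python
import os
import tempfile


def sorted_walk(top):
    """Like os.walk(top), but directories and files come out in sorted order."""
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        dirs.sort()   # in place, so os.walk() recurses in this order
        files.sort()  # only changes what is yielded; files are not recursed into
        yield root, dirs, files


def build_demo_tree(base):
    """Create a subset of the layout from the question under `base`."""
    for rel in (
        "v2/content/duck.txt",
        "v1/content/foo.xml",
        "v1/content/level1/level2/bar.txt",
        "inventory.json",
    ):
        path = os.path.join(base, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


with tempfile.TemporaryDirectory() as base:
    build_demo_tree(base)
    visited = [
        os.path.relpath(root, base).replace(os.sep, "/")
        for root, _, _ in sorted_walk(base)
    ]

print(visited)
```

With the sort in place, v1 and its subtree are visited before v2 on both Mac and Linux.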
r.ook
  • The additional info with `topdown` is really useful. I've used `os.walk` many times and never have read that, so `topdown` used to be relatively meaningless to me. I must have been lucky when modifying the `dirs` variable. – Daniel F Feb 19 '19 at 19:42
  • Actually, I just learned this while answering this question as well. I always knew it was in arbitrary order but didn't know you could sort the list in place to impose the order. Guess OP got their wish :) – r.ook Feb 19 '19 at 19:50
  • Thanks @Idlehands, this not only answers my question, but some helpful commentary on sorting in place. – ghukill Feb 19 '19 at 20:26