
I'm trying to loop through my files and put every two consecutive files into a pair, since any two files that follow one another are related.

I have the files sorted in my directory, and I used the following to loop through the directory and read the pairs of files:

for root, dirs, files in os.walk(TRAIN_DIR):
    for file1, file2 in itertools.izip_longest(files[::2], files[1::2]):
        ...

However, file1 and file2 arrive in a different order: they are not the two files that come immediately after each other in the directory. Does os.walk return files unsorted? What should I do to walk through the files in sorted order?

Thanks.

EDIT 1

This is how I ran files.sort():

for root, dirs, files in os.walk(TRAIN_DIR):
    files.sort()
    for file1, file2 in itertools.izip_longest(files[::2], files[1::2]):
        ...
Simplicity
  • The operating system doesn't guarantee to return directory entries sorted alphanumerically. Besides, what do you mean by "beginning with numbers"? If you have 1_xx, 2_yy and 10_zz, 10_zz will be sorted ahead of 2_yy if you just use a lexicographical sort. – Jean-François Fabre Feb 15 '18 at 07:49
  • See also https://stackoverflow.com/questions/6670029/can-i-force-python3s-os-walk-to-visit-directories-in-alphabetical-order-how – cdarke Feb 15 '18 at 07:52

1 Answer


The order in which files are returned depends on the underlying file system. If you need to iterate over the file names in sorted order, it'd be best to do it yourself: sort files first, and then iterate.

for root, dirs, files in os.walk(TRAIN_DIR):
    files.sort(key=...)   # your key function
    ...
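
A fuller sketch of the same idea, assuming Python 3, where `itertools.izip_longest` is named `zip_longest`. The helper name `sorted_pairs` is hypothetical and not part of the question's code. Sorting `dirs` in place is optional; it makes `os.walk` descend into subdirectories in sorted order as well.

```python
import itertools
import os


def sorted_pairs(top):
    """Yield (root, file1, file2) for consecutive files, in sorted order."""
    for root, dirs, files in os.walk(top):
        dirs.sort()   # in place, so os.walk also visits subdirectories in sorted order
        files.sort()  # plain lexicographic sort of the file names
        # Pair items 0 and 1, 2 and 3, ...; an odd file out is paired with None.
        for file1, file2 in itertools.zip_longest(files[::2], files[1::2]):
            yield root, file1, file2
```

Usage would look like `for root, file1, file2 in sorted_pairs(TRAIN_DIR): ...`. If the directory holds an odd number of files, `file2` is `None` for the last pair, so that case needs a check before opening it.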
cs95
  • Thanks for the answer. My files start with something like this: 0m3r2r12-....... What should "key" be in this case? – Simplicity Feb 15 '18 at 08:05
  • @Simplicity That's a weird format. How should it be sorted according to you? – cs95 Feb 15 '18 at 08:09
  • In the directory, I have them sorted out naturally by the system. Like for instance, 0a would come before 0m. Makes sense? – Simplicity Feb 15 '18 at 08:12
  • @Simplicity So, lexicographically? Do you need to account for double digit numbers? If yes, then you can look into the `natsort` module. Otherwise, just `files.sort()`, for lexicographical sorting, should be more than enough. – cs95 Feb 15 '18 at 08:13
  • Sure, I will try that out. For the "sort", should I place it after "izip_longest" in "for file1, file2 in itertools.izip_longest(files[::2], files[1::2]):"? – Simplicity Feb 15 '18 at 08:14
  • @Simplicity `files.sort()` works in-place. So, just call `files.sort()`, and then run your inner loop as usual in the next line. – cs95 Feb 15 '18 at 08:15
  • Please see my "edit" showing how I ran "files.sort()". Not sure if it is correct that way, as my files were still not sorted. – Simplicity Feb 15 '18 at 08:20
  • @Simplicity Did you check that `files` was being sorted or not? If not, you may want to open another question with your input file names, asking how you can sort them as desired. Note that this only sorts the `files` list in your program, not the files as seen in your file system... – cs95 Feb 15 '18 at 08:23
  • cᴏʟᴅsᴘᴇᴇᴅ Yes, sure, I opened another question, here: https://stackoverflow.com/questions/48802835/reading-files-as-sorted-in-my-file-system – Simplicity Feb 15 '18 at 08:28
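
For the double-digit case raised in the comments, here is a minimal sketch of a numeric-aware sort key using only the standard library, as an alternative to the `natsort` module. `natural_key` is a hypothetical helper name, and the sketch assumes that runs of digits should compare as integers and that letter case should be ignored.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    # re.split with a capturing group alternates text chunks (even indexes)
    # and digit chunks (odd indexes), so the chunk types always line up.
    chunks = re.split(r"(\d+)", name)
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


names = ["10_zz", "1_xx", "2_yy"]
print(sorted(names))                   # ['10_zz', '1_xx', '2_yy']
print(sorted(names, key=natural_key))  # ['1_xx', '2_yy', '10_zz']
```

Inside the `os.walk` loop this would be used as `files.sort(key=natural_key)`. Whether it matches the order shown by a given file manager is not guaranteed, since file managers apply their own collation rules.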