
I have files named file1, file2, file3, ... Each of them can also have additional companion files named file1_1, file2_1, file2_2, ...

Now, I want to iterate over all files in their corresponding order, so file1, file1_1, file2, file2_1, ...

import glob
import os

for i in range(number_of_files):
    # os.path.join avoids the '\f' in '\file' being parsed as a form-feed escape
    find_matching_files = glob.glob(os.path.join(file_directory, 'file' + str(i + 1) + '*'))

The problem is that the pattern file1* also matches file10, file11, ..., so file1 and file10 end up listed together. If I remove the "*", the additional files are excluded. Is there a smart way to do this?

Maxim

2 Answers


You can use a regex to extract all the numeric parts of each file name and convert them to a tuple of integers. Python compares sequences lexicographically, so file1_1 (1, 1) is less than file1_2 (1, 2), which is less than file10_1 (10, 1), which in turn is less than file10_1_1 (10, 1, 1).

import re
files = ["file1", "file2", "file1_1", "file1_2_2", "file10_2", "file10", "file2_3_1", "file2_1", "file1_2", "file2_3", "file1_2_1", "file2_2"]
files_sorted = sorted(files, key=lambda value: tuple(map(int, re.findall(r"\d+", value))))
print(files_sorted)

Output:

['file1', 'file1_1', 'file1_2', 'file1_2_1', 'file1_2_2', 'file2', 'file2_1', 'file2_2', 'file2_3', 'file2_3_1', 'file10', 'file10_2']
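
To apply the same key to files on disk, something like the following might work. This is only a sketch: `natural_key` and `list_files_naturally` are hypothetical helper names, and it assumes every file name starts with the prefix `file`. The key looks only at the base name without its extension, so digits in the directory path or in an extension such as `.mp3` do not affect the order.

```python
import glob
import os
import re


def natural_key(path):
    """Return the numbers in the file name as a tuple of ints.

    Example: '/some/dir/file10_2.txt' -> (10, 2)
    """
    # Drop the directory and the extension so only the name's digits count.
    stem = os.path.splitext(os.path.basename(path))[0]
    return tuple(int(n) for n in re.findall(r"\d+", stem))


def list_files_naturally(directory, prefix="file"):
    """Hypothetical helper: glob every '<prefix>*' entry and sort it numerically."""
    # glob.escape keeps characters like '[' in the directory name literal.
    pattern = os.path.join(glob.escape(directory), prefix + "*")
    return sorted(glob.glob(pattern), key=natural_key)
```

A single call such as `list_files_naturally(file_directory)` would then replace the loop over `number_of_files`.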

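You can try combining the exclusion marker ! with a character range [] in the glob pattern. [!0-9] matches any single character that is not a digit, so the pattern for file1 no longer matches file10: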

import glob
import os

for i in range(number_of_files):
    # os.path.join avoids the '\f' in '\file' being parsed as a form-feed escape
    find_matching_files = glob.glob(os.path.join(file_directory, 'file' + str(i + 1) + '[!0-9]' + '*'))

But the order of the matched files in find_matching_files is not guaranteed on any iteration, so you may need to sort the list yourself after each match.
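
A minimal sketch of that per-index match with sorting, assuming the same `file<N>` naming scheme; `files_for_index` is a hypothetical helper name. One caveat: `[!0-9]` has to consume one character, so a bare name like `file1` with nothing after the number would not match that pattern on its own. The sketch therefore also globs the exact name.

```python
import glob
import os


def files_for_index(directory, index, prefix="file"):
    """Hypothetical helper: return '<prefix><index>' and its companions, sorted."""
    base = os.path.join(glob.escape(directory), prefix + str(index))
    # base + '[!0-9]*' needs a non-digit right after the number, so it skips
    # file10 when index is 1. The bare name is picked up by the first glob.
    matches = glob.glob(base) + glob.glob(base + "[!0-9]*")
    # Plain string sort: the main file comes first, companions follow in
    # lexicographic (not numeric) order.
    return sorted(matches)
```

Calling `files_for_index(file_directory, i + 1)` inside the loop keeps each group together and in a stable order. If the companions themselves need numeric ordering (file1_2 before file1_10), a numeric sort key would be needed instead of the plain `sorted`.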

Cuong Vu
  • That's exactly what I was looking for. Maybe I worded my question poorly. The files file1, file2 needed to be sorted, but file1_1 and file1_2 didn't need to be in order; they just needed to come after file1 and before file2. – Maxim Sep 16 '21 at 15:58