I'm using os.walk with a generator to populate a set of filenames for later manipulation using the following:

import os

file_path = '/home/user/Developer/10/'
list_of_files = {}
cnt = 0
for (dirpath, dirnames, filenames) in os.walk(file_path):
    for filename in filenames:
        if filename.endswith('.xml'):
            list_of_files[cnt] = os.sep.join( [dirpath, filename] )
            cnt += 1

This leaves list_of_files ordered as:

{0: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183126.585.xml',
 1: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183216.572.xml',
 2: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183123.015.xml',
 3: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183058.016.xml',
 4: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183130.151.xml',
 5: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183140.873.xml',
 6: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183223.729.xml',
 7: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183054.451.xml',
 8: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183148.014.xml',
 9: '/home/user/Developer/10/2/test/channe 1_UTC_DEtoSE_183202.296.xml'}

I know that Python does not sort filenames when populating lists, but I was under the impression that sets were self-sorting. If not, how can I sort this set alphanumerically by filename? If I use sorted(), it returns a list object containing the set element numbers, which is pretty useless.

martineau
osprey
  • You're not using a set here, you're using a dict. Dicts are iterated in insertion order in recent Python versions, in arbitrary order based on element hash values in previous versions. If you were using a set, you'd get an arbitrary order with any Python version. The internal details of these datatypes are optimized for fastest possible operation; producing output in sorted order would severely slow them down. – jasonharper Apr 23 '21 at 18:58
  • That is not a set but a dictionary, which is either unordered or ordered by insertion, depending on the Python version (sets are unordered as well). You have to convert it into another object type (list, np.array, dataframe, etc.) that is meant to be ordered, or just change the code to collect the file names into something other than a dictionary. See this for additional info: https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value – Betelgeux Apr 23 '21 at 19:00
  • Thanks to both for pointing out my misunderstanding re dict vs set. – osprey Apr 23 '21 at 19:10
  • also sets are NOT self sorting - a set has no specific order. – Tony Suffolk 66 Apr 23 '21 at 19:24
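
The distinction raised in the comments can be checked directly. The following is a minimal sketch with made-up file names (not taken from the question's directory): empty braces create a dict, set() creates a set, and sorted() returns a new list for either one.

```python
# Illustrative names only; these are not the files from the question.
names = ["b.xml", "a.xml", "c.xml"]

as_dict = {}                      # empty braces create a dict, not a set
for i, name in enumerate(names):
    as_dict[i] = name

as_set = set(names)               # set() creates an unordered set

print(type(as_dict).__name__)     # dict
print(type(as_set).__name__)      # set
print(list(as_dict.values()))     # insertion order on Python 3.7+
print(sorted(as_set))             # sorted() always returns a new list
```

Neither container sorts itself; any ordering by name has to be requested explicitly with sorted().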

3 Answers

1

I'm not sure whether the snippet below will help; basically, you sort by the dictionary's values (not its keys).

for v in sorted(list_of_files.values()):
    print(v)
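
If the numbered dictionary itself is still wanted, one option is to rebuild it from the sorted values so that the keys follow the sorted order. This is only a sketch: the small dict below is a stand-in for the real list_of_files, with shortened, hypothetical paths.

```python
# Stand-in for list_of_files from the question (hypothetical, shortened paths).
list_of_files = {
    0: "/tmp/test/ch_183126.585.xml",
    1: "/tmp/test/ch_183216.572.xml",
    2: "/tmp/test/ch_183054.451.xml",
}

# sorted() returns the paths as a list in ascending string order;
# enumerate() then assigns fresh keys 0, 1, 2, ... in that order.
sorted_files = dict(enumerate(sorted(list_of_files.values())))

for index, path in sorted_files.items():
    print(index, path)
```

This relies on dicts preserving insertion order, which is guaranteed from Python 3.7 onward.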
hspark
0

As the comments pointed out, list_of_files is a dictionary, not a set. Initializing list_of_files as an empty list and using append(), as in:

import os

file_path = '/home/user/Developer/10/'
list_of_files = []
for (dirpath, dirnames, filenames) in os.walk(file_path):
    for filename in filenames:
        if filename.endswith('.xml'):
            list_of_files.append( os.sep.join( [dirpath, filename] ) )

does the trick.
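
Note that os.walk does not promise any particular order for the names it yields, so the list still has to be sorted explicitly. A possible sketch, where find_xml_files is a hypothetical helper name introduced here for illustration:

```python
import os

def find_xml_files(root):
    """Return the paths of all .xml files under root, sorted as strings."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith('.xml'):
                found.append(os.path.join(dirpath, filename))
    # The walk order depends on the file system, so sort before returning.
    return sorted(found)

list_of_files = find_xml_files('/home/user/Developer/10/')
```

This sorts by the full path string. To order by file name alone, sorted(found, key=os.path.basename) could be used instead.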

Thanks for the helpful, timely comments!

osprey
0

You can use the pathlib module to collect all the XML files with a single dict comprehension.

from pathlib import Path

file_path = Path('/home/user/Developer/10/')
list_of_files = {
    index: str(xml_file.absolute())
    for index, xml_file in enumerate(file_path.glob('**/*.xml'))
}

list_of_files = dict(sorted(list_of_files.items(), key=lambda item: item[1]))  # sort dict by its values (the paths)
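
If the integer keys are not needed, a plain sorted list may be simpler. The sketch below assumes the goal is to order by file name rather than by full path; sorted_xml_paths is a hypothetical helper name, not part of pathlib.

```python
from pathlib import Path

def sorted_xml_paths(root):
    """Return all .xml files below root as strings, sorted by file name."""
    # rglob('*.xml') searches root and all of its subdirectories.
    paths = Path(root).rglob('*.xml')
    # Path.name is the final path component, e.g. 'a.xml'.
    return [str(p) for p in sorted(paths, key=lambda p: p.name)]
```

It would be called as sorted_xml_paths('/home/user/Developer/10/'). Files with the same name in different directories keep the order in which rglob produced them, since sorted() is stable.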
Nk03