
To screen files, I have the following code in two different directories:

import os, re

g=open('results_1.txt', 'w') #Other has 'results_2.txt'

for filename in os.listdir('.'):
    if filename.startswith("f"):
        with open(filename, 'r') as f:
            content =[line.rstrip() for line in f]

        A = filter(lambda x: 'KeyWord_1 :' in x, content)
        B = filter(lambda x: 'KeyWord_2 :' in x, content)

        print >> g,filename,

        for item in A:
            print >> g,item,
        for item in B:
            print >> g,item,

g.close()

Both directories use the same naming convention for the files to be parsed by my script. The files look like this: file_1000.txt, file_100.txt, file_101.txt, ..., file_1.txt, ..., file_9.txt.

The only thing I change in the script is the name of the results file. But in one directory the files are processed in order from _1 to _1000, so the results file has the right order, while in the other directory they are not. Why?

I am sorry, this is related to my work and I cannot give any specifics.

P.S. I tried the sorted function and it did not work as I wanted.

Trevor Hickey
algoProg
  • This is because `os.listdir` is unsorted. This answer might help: http://stackoverflow.com/questions/6773584/how-is-pythons-glob-glob-ordered/6773636#6773636 Also see this: http://stackoverflow.com/questions/4813061/nonalphanumeric-list-order-from-os-listdir-in-python – RomanK Oct 29 '16 at 16:25
  • To expand on @RomanK's comment, here is the official documentation for [`os.listdir`](https://docs.python.org/2/library/os.html#os.listdir) stating: *The list is in arbitrary order.* – UnholySheep Oct 29 '16 at 16:26
  • Okay, so if I understand it right, by arbitrary we should understand it as NOT DETERMINISTIC! So difference is in one directory the arbitrary order is assumed different than the other. This can be a problem. But okay. – algoProg Oct 29 '16 at 16:31
  • @algoProg: Just sort it. – Blender Oct 29 '16 at 16:32
  • Blender I will do that. In first attempt I did default sorting and it did not work. I need to do it as per the numbers in the file name. Thanks. – algoProg Oct 29 '16 at 16:33

1 Answer

From the documentation on os.listdir:

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

You need to sort the result using a preferred sort order. You vaguely point out that the resulting order wasn't as expected when you tried sorting it, which I take to mean that you probably do not want a lexicographical sort, but a numeric sort on the trailing numbers in the filename:

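import os

def trailing_number(filename):
    # Drop the extension first; rstrip('.txt') strips a set of characters, not a suffix
    stem = os.path.splitext(filename)[0]
    return int(stem.split('_')[1])

sorted(os.listdir('.'), key=trailing_number)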

Adapt the above to handle the real format of your filenames. Also don't forget to handle exceptions in trailing_number which can arise if some of your filenames don't conform to the same format.
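As a rough sketch of that exception handling, the key function can match the name against a pattern instead of raising, so that non-conforming entries sort after the numbered ones. The pattern `file_<number>.txt` is taken from the examples in the question; the names `numeric_key` and `sort_numbered` and the sample `notes.md` entry are hypothetical, chosen only for illustration:

```python
import re

# Assumed pattern, based on the names shown in the question (file_1.txt, file_1000.txt, ...).
NUMBERED = re.compile(r'^file_(\d+)\.txt$')

def numeric_key(filename):
    """Sort key: conforming names by their number, all other names afterwards, alphabetically."""
    match = NUMBERED.match(filename)
    if match:
        return (0, int(match.group(1)), filename)
    return (1, 0, filename)

def sort_numbered(names):
    """Return the names ordered by numeric_key without raising on odd entries."""
    return sorted(names, key=numeric_key)

# Sample input standing in for os.listdir('.'); 'notes.md' is a made-up non-conforming name.
names = ['file_1000.txt', 'file_100.txt', 'file_9.txt', 'file_1.txt', 'notes.md']
ordered = sort_numbered(names)
```

In the script from the question, this would presumably be used as `for filename in sort_numbered(os.listdir('.')):`. Returning a tuple rather than catching `ValueError` keeps the key function total, at the cost of silently accepting unexpected names; whether those should be skipped instead depends on the real directory contents.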

dkasak