
One of my folders contains mostly JSON files, and I'm reading the data they contain to do some classification with an SVM. My question is based on this code:

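    import os
    import re

    # Print the name of every .json file in the current directory.
    for filename in os.listdir(os.getcwd()):
        # Escape the dot so it matches a literal '.' rather than any character.
        if re.search(r'\.json$', filename):
            with open(filename) as json_data:
                print(filename)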

Each time I pipe the output, I find that the filenames always get printed in the same order, like so:

    95231464576.json
    131777220274261.json
    17151210249.json
    122624927762214.json
    159287900855286.json
    155273941171682.json
    5265971983.json
    169635939813776.json
    159429967503904.json
    169114363192327.json
    170797436313930.json
    155963124522916.json

There are a few text files, and some python files in this directory.
My question here is: what determines the order in which these files are printed? Does the for loop have a particular way of looking for files?
I tried checking whether the order is based on size (max to min or min to max) or on last-modified time (I had no reason for these tests; I just tried them since I couldn't think of anything else).
I ran this snippet 4 times, and the order was the same each time.

I have labelled classes in different folders, so if I can be assured of the order it would help with labelling my training set (I don't know how good an idea this is).

tvishwa107
    This question appears to be similar to http://stackoverflow.com/questions/6773584/how-is-pythons-glob-glob-ordered/6773636#6773636. It is likely that listdir is just returning the list in the order that your base operating system returns it. – Brian Cain Jul 14 '15 at 15:35

1 Answer

The order is not defined, and depends on the filesystem.

I remember reading, many years ago, that one of the improvements of ext3 over ext2 is that it keeps a pointer into the directory listing and begins the next operation at that entry. Often a program will stat() and then open() an entry, so with ext2 the scan from the beginning of the (internal) list would occur twice; with ext3 the second operation would already be positioned on the desired entry, making the search for it very fast. This is significant when a directory contains many files.

The point is that listing the directory will begin the list of entries wherever that pointer happened to be. The order in which entries were created may also affect the order. The ls program sorts the entries before producing output so that the listing is visually consistent and usable.
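If a repeatable order is needed (for example, to line up files with labels), one option is to sort the names yourself instead of relying on whatever os.listdir returns. The following is a minimal sketch under that assumption; the helper name list_json_files and the sample filenames in the demo are made up for illustration and are not part of the question's code.

```python
import os
import tempfile


def list_json_files(directory):
    """Return the names of the .json files in `directory`, sorted as strings.

    os.listdir makes no promise about order, so the sort is what makes
    the result repeatable across runs and filesystems.
    """
    return sorted(name for name in os.listdir(directory) if name.endswith(".json"))


# Small demo in a throwaway directory (the filenames are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    for name in ["95231464576.json", "131777220274261.json", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    for filename in list_json_files(tmp):
        print(filename)
```

Note that sorted() compares the names as strings, so "131777220274261.json" comes before "95231464576.json"; if numeric order matters, a key function that converts the numeric part to an int would be needed.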

dsh