43

Let's say I have three files in a folder: file9.txt, file10.txt and file11.txt, and I want to read them in this particular order. Can anyone help me with this?

Right now I am using the code

import glob, os
for infile in glob.glob(os.path.join( '*.txt')):
    print "Current File Being Processed is: " + infile

and it reads first file10.txt then file11.txt and then file9.txt.

How can I get the right order?

Petter Friberg
user1620012

5 Answers

97

Files on the filesystem are not sorted. You can sort the resulting filenames yourself using the sorted() function:

for infile in sorted(glob.glob('*.txt')):
    print "Current File Being Processed is: " + infile

Note that the os.path.join call in your code is a no-op; with only one argument it doesn't do anything but return that argument unaltered.
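A quick check of that claim, written for Python 3 (an assumption; the code in the question uses Python 2 print statements). The directory name 'data' is hypothetical and only shows when os.path.join actually does something:

```python
import os

# With a single argument there is nothing to join, so the pattern comes back unchanged.
pattern = os.path.join('*.txt')
same_as_literal = (pattern == '*.txt')

# os.path.join only matters once a directory part is supplied.
# 'data' is a hypothetical directory name used purely for illustration.
joined = os.path.join('data', '*.txt')

print(pattern, same_as_literal, joined)
```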

Note that your files will sort in lexicographic (character-by-character) order, which puts 10 before 9 because '1' sorts before '9'. You can use a custom key function to improve the sorting:

import re
numbers = re.compile(r'(\d+)')
def numericalSort(value):
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts

for infile in sorted(glob.glob('*.txt'), key=numericalSort):
    print "Current File Being Processed is: " + infile

The numericalSort function splits out every run of digits in a filename, turns each one into an actual number, and returns the resulting list for sorting:

>>> files = ['file9.txt', 'file10.txt', 'file11.txt', '32foo9.txt', '32foo10.txt']
>>> sorted(files)
['32foo10.txt', '32foo9.txt', 'file10.txt', 'file11.txt', 'file9.txt']
>>> sorted(files, key=numericalSort)
['32foo9.txt', '32foo10.txt', 'file9.txt', 'file10.txt', 'file11.txt']
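For readers on Python 3, the same idea could be sketched as below. This is only one way to write it: the names natural_key and sorted_glob are made up for this example and do not appear in the answer above, and the list comprehension replaces map purely for readability.

```python
import glob
import re

_DIGITS = re.compile(r'(\d+)')

def natural_key(name):
    """Split a name into text and integer chunks for sorting."""
    parts = _DIGITS.split(name)
    # The capturing group puts each digit run at an odd index; convert those to int.
    parts[1::2] = [int(p) for p in parts[1::2]]
    return parts

def sorted_glob(pattern):
    """Return the paths matching pattern in numeric-aware order (hypothetical helper)."""
    return sorted(glob.glob(pattern), key=natural_key)

names = ['file10.txt', 'file11.txt', 'file9.txt']
ordered = sorted(names, key=natural_key)
print(ordered)  # ['file9.txt', 'file10.txt', 'file11.txt']
```

Because re.split with a capturing group always alternates text and digit chunks, two keys never compare an int against a str at the same position, which is what keeps this working under Python 3's stricter comparison rules.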
Martijn Pieters
12

You can wrap your glob.glob( ... ) expression inside a sorted( ... ) call and sort the resulting list of files. Example:

for infile in sorted(glob.glob('*.txt')):

You can give sorted a comparison function or, better, use the key= ... argument to give it a custom key that is used for sorting.

Example:

There are the following files:

x/blub01.txt
x/blub02.txt
x/blub10.txt
x/blub03.txt
y/blub05.txt

The following code will produce the following output:

for filename in sorted(glob.glob('[xy]/*.txt')):
        print filename
# x/blub01.txt
# x/blub02.txt
# x/blub03.txt
# x/blub10.txt
# y/blub05.txt

Now with key function:

def key_func(x):
        return os.path.split(x)[-1]
for filename in sorted(glob.glob('[xy]/*.txt'), key=key_func):
        print filename
# x/blub01.txt
# x/blub02.txt
# x/blub03.txt
# y/blub05.txt
# x/blub10.txt

EDIT: Possibly this key function can sort your files:

import re
pat = re.compile(r"(\d+)\D*$")
...
def key_func(x):
        mat=pat.search(os.path.split(x)[-1]) # match last group of digits
        if mat is None:
            return x
        return "{:>10}".format(mat.group(1)) # right align to 10 digits.

This can certainly be improved, but it illustrates the idea. Paths without numbers are left alone; paths with numbers are converted to a right-aligned string that is 10 characters wide and contains the number.
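To see the padded key at work, here is a small Python 3 sketch (the Python version is an assumption) that applies the same key function to the unpadded names mentioned in the comments. No files are read; the list is just a stand-in for the glob result.

```python
import os
import re

pat = re.compile(r"(\d+)\D*$")

def key_func(path):
    # Look only at the file name and find its last run of digits.
    mat = pat.search(os.path.split(path)[-1])
    if mat is None:
        return path
    # Right-align to 10 characters so shorter numbers sort first.
    return "{:>10}".format(mat.group(1))

paths = ['x/blub1.txt', 'x/blub2.txt', 'x/blub10.txt', 'x/blub3.txt', 'y/blub5.txt']
ordered = sorted(paths, key=key_func)
print(ordered)
# ['x/blub1.txt', 'x/blub2.txt', 'x/blub3.txt', 'y/blub5.txt', 'x/blub10.txt']
```

The padding works because a space sorts before any digit, so a key with fewer digits always compares lower than one with more.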

hochl
  • The sorted function does not change the order unfortunately. – user1620012 Aug 23 '12 at 14:42
  • It does -- `y/blub05.txt` moves from last position one up because `blub05.txt` comes before `blub10.txt`. Only the file name is compared without the directory in `key_func`. – hochl Aug 23 '12 at 14:44
  • Actually my files don't have the zeros. They are named x/blub1.txt, x/blub2.txt, x/blub10.txt, x/blub3.txt, y/blub5.txt, and this produces a wrong order, even with the sort command. – user1620012 Aug 23 '12 at 14:45
  • When sorting strings '1' comes before '9', which is why you are seeing this behavior. You can change key_func to isolate that number per my answer. – grieve Aug 23 '12 at 14:52
1

You need to change the sort from 'ASCIIBetical' to numeric by isolating the number in the filename. You can do that like so:

import re

nondigits = re.compile(r"\D")

def keyFunc(afilename):
    # Strip every non-digit character and treat what is left as one integer.
    return int(nondigits.sub("", afilename))

filenames = ["file10.txt", "file11.txt", "file9.txt"]

for x in sorted(filenames, key=keyFunc):
    print x

You can set filenames to the result of glob.glob("*.txt").

Additionally, the keyFunc function assumes that the filename contains a number and that digits appear nowhere else in the string being sorted. You can make that function as complex as you need in order to isolate the number you want to sort on.
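The following Python 3 sketch (the Python version is an assumption) illustrates those assumptions by showing what the key returns when they hold and when they do not. The name "readme.txt" is a made-up example of a file without digits.

```python
import re

NONDIGITS = re.compile(r"\D")

def key_func(filename):
    # Delete every non-digit character and read what is left as one integer.
    return int(NONDIGITS.sub("", filename))

single = key_func("file10.txt")   # one digit group: 10
merged = key_func("32foo9.txt")   # two digit groups run together: 329

try:
    key_func("readme.txt")        # hypothetical name with no digits: int("") fails
    no_digits_ok = True
except ValueError:
    no_digits_ok = False

print(single, merged, no_digits_ok)  # 10 329 False
```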

grieve
  • What if there are files with different names, grouped with numbers? Ex. `foo1.txt`, `foo2.txt` .. `foo10.txt`, then `bar1.txt`, `bar2.txt`, etc? Or there are two sets of numbers in the filename? – Martijn Pieters Aug 23 '12 at 14:53
  • @MartijnPieters: That wasn't a requirement of the original question, and I think you know the answer. :) – grieve Aug 23 '12 at 15:08
  • Well, most likely the question used a small sample of files; as it turns out the `9`, `10`, `11` sequence was the crucial part. We cannot assume we have the whole picture here. :-) – Martijn Pieters Aug 23 '12 at 15:11
  • Agreed on the big picture part. I assume the person asking the question can translate from a specific answer to a more general answer if so needed. Hence I try to give a minimal but complete answer (based on their question), and hope they can succeed from there. – grieve Aug 23 '12 at 15:21
  • Generally speaking, I agree, but with regular expressions that doesn't always work out :-) – Martijn Pieters Aug 23 '12 at 15:22
  • Well regex is a topic worthy of an entire book. I certainly don't expect anyone to explain that in one SO question. :) But I do see your point. – grieve Aug 23 '12 at 15:32
  • If it is possible to give files names like file09.txt, file10.txt, file11.txt, the sorting should work without the need to specify custom parameters. – Fabio Capezzuoli Sep 17 '18 at 10:59
0
glob.glob(os.path.join( '*.txt'))

returns a list of strings, so you can easily sort the list using Python's sorted() function.

sorted(glob.glob(os.path.join( '*.txt')))
apparat
  • sorted function gives the same result Current File Being Processed is: file10.txt.txt Current File Being Processed is: file11.txt.txt Current File Being Processed is: file9.txt.txt – user1620012 Aug 23 '12 at 14:51
-5
for fname in ['file9.txt','file10.txt','file11.txt']:
   with open(fname) as f: # default open mode is for reading
      for line in f:
         pass  # do something with line
Burhan Khalid