
I have this line of code in my Python script. It goes through all the files in a particular directory whose names match *cycle*.log.

for searchedfile in glob.glob("*cycle*.log"):

This works perfectly. However, when I run my script on a network location, it does not go through the files in order; the order appears random instead.

Is there a way to force the code to go through the files in date order?

This question has been asked for PHP, but I am not sure how the answer differs in Python.


Jason Rogers
  • related: [Sorting files by date](http://stackoverflow.com/q/6759415/4279) – jfs May 02 '14 at 14:44
  • related: [How do you get a directory listing sorted by creation date in python?](http://stackoverflow.com/q/168409/4279) – jfs May 02 '14 at 14:45
  • related: [How to get file creation & modification date/times in Python?](http://stackoverflow.com/q/237079/4279) – jfs May 02 '14 at 14:47
  • Final code: `searchedfiles = glob.glob("*cycle*.log")` `searchedfiles.sort(key=os.path.getmtime)` `for searchedfile in searchedfiles:` – Jason Rogers May 02 '14 at 15:22

7 Answers


To sort files by modification date, pass os.path.getmtime as the sort key:

import glob
import os

files = glob.glob("*cycle*.log")
files.sort(key=os.path.getmtime)
print("\n".join(files))

See also the Sorting HOW TO in the Python documentation.
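If you want the most recently modified files first, sorted() also accepts reverse=True. A minimal sketch, assuming the files live where the glob pattern points; the helper name files_newest_first is only illustrative:

```python
import glob
import os


def files_newest_first(pattern):
    """Return the paths matching `pattern`, most recently modified first."""
    # reverse=True flips the default oldest-to-newest ordering
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
```

For example, `for searchedfile in files_newest_first("*cycle*.log"):` walks the question's log files from newest to oldest.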

jfs
  • What happens if a file `x.txt` appears in the list of files returned by glob, but it changes its name to `y.txt` before `searchedfile` gets sorted? In this case we may get the following error: `File "C:\Program Files\Python310\lib\genericpath.py", line 65, in getctime return os.stat(filename).st_ctime FileNotFoundError: [WinError 2] The system cannot find the file specified`. Is there a way to avoid it? – Ido Jan 14 '23 at 16:10
  • @Ido yes, if you want to handle such race conditions, collect files and their creation time at the same time and/or drop files from the list, if you can't get their metadata e.g, drop files if `os.stat` is unsuccessful https://stackoverflow.com/a/539024 – jfs Jan 14 '23 at 16:24
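A minimal sketch of that suggestion: read each file's timestamp once with os.stat while collecting the list, and drop any file whose metadata can't be read (for example because it was renamed or deleted after glob listed it). The function name sort_by_mtime_skip_missing is hypothetical:

```python
import glob
import os


def sort_by_mtime_skip_missing(pattern):
    """Sort files matching `pattern` by modification time, oldest first.

    Files whose metadata can't be read after glob() lists them (e.g. they
    were renamed or deleted) are dropped instead of breaking the sort.
    """
    timed = []
    for path in glob.glob(pattern):
        try:
            # Read the timestamp exactly once, so the sort itself can't fail
            mtime = os.stat(path).st_mtime
        except OSError:
            continue  # e.g. FileNotFoundError: the file went away meanwhile
        timed.append((mtime, path))
    # Sorting (mtime, path) tuples breaks timestamp ties by name
    timed.sort()
    return [path for _, path in timed]
```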

Essentially the same as @jfs's answer, but in one line using sorted():

import glob
import os
searchedfiles = sorted(glob.glob("*cycle*.log"), key=os.path.getmtime)
Pablo Reyes

Well, the answer is no, not from glob itself. glob uses os.listdir, which the documentation describes as follows:

"Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory."

So if your results ever came back sorted, that was luck. You need to sort them yourself.

This works for me:

import glob
import os
import time

searchedfiles = glob.glob("*.cpp")
files = sorted(searchedfiles, key=os.path.getctime)

for file in files:
    print("{} - {}".format(file, time.ctime(os.path.getctime(file))))

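Also note that os.path.getctime is the creation time only on some systems, such as Windows; on Unix it is the time of the last metadata change. If you want to sort by modification time, use os.path.getmtime instead.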
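Because getctime means different things on different platforms, here is a sketch that prefers a true creation time where os.stat exposes one as st_birthtime (for example on macOS) and otherwise falls back to the modification time. The fallback is an assumption about what "date" should mean for your files, and the helper names are hypothetical:

```python
import glob
import os


def creation_or_mtime(path):
    """Best-effort creation time for sorting.

    Uses st_birthtime when the platform provides it; otherwise falls back
    to st_mtime, since st_ctime is a metadata-change time on Unix.
    """
    st = os.stat(path)
    return getattr(st, "st_birthtime", st.st_mtime)


def sort_by_creation(pattern):
    """Return the paths matching `pattern`, oldest first by creation_or_mtime."""
    return sorted(glob.glob(pattern), key=creation_or_mtime)
```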

Jacob Budin
luk32

If the dates in your paths are written in a sortable order (year first, zero-padded), then you can always sort the paths as plain strings (as others have already mentioned in their answers).
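A small illustration using made-up file names: with year-first, zero-padded dates (as in %Y-%m-%d), string order is chronological order, whereas day-first names do not sort chronologically:

```python
# Hypothetical file names used only for illustration
iso_names = ["log-2016-04-07.txt", "log-2016-01-01.txt", "log-2016-03-20.txt"]
dmy_names = ["10.1.2015.txt", "1.1.2016.txt"]

# Year-first, zero-padded dates: string order equals chronological order
iso_sorted = sorted(iso_names)
print(iso_sorted)  # ['log-2016-01-01.txt', 'log-2016-03-20.txt', 'log-2016-04-07.txt']

# Day-first dates: string order puts 1 Jan 2016 before 10 Jan 2015
dmy_sorted = sorted(dmy_names)
print(dmy_sorted)  # ['1.1.2016.txt', '10.1.2015.txt']
```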

However, if your paths use a date format like %d.%m.%Y, sorting becomes more involved. Since strptime does not support wildcards, we developed a module, datetime-glob, to parse dates and times from paths that include wildcards.

Using datetime-glob, you could walk through the tree, list a directory, parse the date/times and sort them as tuples (date/time, path).

From the module's test cases:

import pathlib
import tempfile

import datetime_glob

def test_sort_listdir(self):
    with tempfile.TemporaryDirectory() as tempdir:
        pth = pathlib.Path(tempdir)
        (pth / 'some-description-20.3.2016.txt').write_text('tested')
        (pth / 'other-description-7.4.2016.txt').write_text('tested')
        (pth / 'yet-another-description-1.1.2016.txt').write_text('tested')

        matcher = datetime_glob.Matcher(pattern='*%-d.%-m.%Y.txt')
        subpths_matches = [(subpth, matcher.match(subpth.name)) for subpth in pth.iterdir()]
        dtimes_subpths = [(mtch.as_datetime(), subpth) for subpth, mtch in subpths_matches]

        subpths = [subpth for _, subpth in sorted(dtimes_subpths)]

        # yapf: disable
        expected = [
            pth / 'yet-another-description-1.1.2016.txt',
            pth / 'some-description-20.3.2016.txt',
            pth / 'other-description-7.4.2016.txt'
        ]
        # yapf: enable

        self.assertListEqual(subpths, expected)
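If you would rather avoid the extra dependency, the same idea (parse the date out of each name, then sort (date, path) tuples) can be sketched with the standard library alone. This assumes names end in D.M.YYYY.txt like the ones in the test above and uses a regular expression in place of datetime-glob's %-d.%-m.%Y pattern; the function name is hypothetical, and names without a valid date are skipped:

```python
import datetime
import pathlib
import re

# Assumed naming scheme: "...D.M.YYYY.txt", e.g. "some-description-20.3.2016.txt"
DATE_RE = re.compile(r"(\d{1,2})\.(\d{1,2})\.(\d{4})\.txt$")


def sort_by_date_in_name(paths):
    """Return the paths whose names end in a D.M.YYYY date, oldest first."""
    dated = []
    for path in paths:
        match = DATE_RE.search(path.name)
        if match is None:
            continue  # no date in the name: leave it out
        day, month, year = (int(group) for group in match.groups())
        try:
            date = datetime.date(year, month, day)
        except ValueError:
            continue  # e.g. "31.2.2016" is not a real date
        dated.append((date, path))
    # Tuples compare by date first, then by path when two dates are equal
    return [path for _, path in sorted(dated)]


if __name__ == "__main__":
    # Example: the .txt files in the current directory, ordered by the date in their names
    for p in sort_by_date_in_name(pathlib.Path(".").glob("*.txt")):
        print(p)
```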
marko.ristin

One can do that now with just the pathlib module:

import pathlib
found = pathlib.Path.cwd().glob('*.py')
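# glob() already yields Path objects, so each one can be stat-ed directly;
# lstat() does not follow symlinks (use stat() to sort by the link target's time)
found = sorted(found, key=lambda path: path.lstat().st_mtime)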

Not using glob alone. Right now, as you're using it, glob is storing all the files simultaneously in code and has no methods for organizing those files. If only the final result is important, you could use a second loop that checks each file's date and re-sorts the list based on that. If the parse order matters, glob is probably not the best way to do this.

Dylan Lawrence
  • "glob is storing all the files simultaneously in code" what? – luk32 May 02 '14 at 14:30
  • @luk32 The [glob.glob()](https://docs.python.org/2/library/glob.html#glob.glob) code is loading the entire directory's set of files in to some sort of data structure. I said code because I do not know whether they're using a list or an internalized array. – Dylan Lawrence May 02 '14 at 14:32
  • It is still not the code. Also it doesn't matter. The problem is, that it is in arbitrary order because it gets it straight from the system. It depends on implementation of the kernel and file-system driver. – luk32 May 02 '14 at 14:39

You can sort the list of files that glob returns by passing os.path.getmtime or os.path.getctime as the sort key. See this other SO answer and note the comments as well.

Tom