How can I find files newer than a given query date in Python without crawling the entire search directory? For example, in Bash/*nix (tested on macOS) I can run `find . -newermt '2018-01-17 03:28:46'`, which quickly finds only the files newer than the specified query date. In Python I can do:

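import os
import datetime

def find_newer_files(search_dir, query_date):
    """Return paths of files under search_dir modified after query_date."""
    results = []
    for root, dirs, files in os.walk(search_dir):
        for filename in files:
            path = os.path.join(root, filename)
            file_mtime = datetime.datetime.fromtimestamp(os.stat(path).st_mtime)
            if file_mtime > query_date:
                results.append(path)  # yield path?
    return results

# The query date is given in nanoseconds since the epoch; convert to whole seconds
query_date = datetime.datetime.fromtimestamp(1516188526532974000 // 1000000000)
results = find_newer_files('/Users/Nafty/Sync/sxs', query_date)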

However, this takes longer and seems to walk through the entire directory regardless.

Is there a way to do a fast search version of date-filtered directory crawling in Python, similar to the Bash example?

1 Answer

The code you provided seems to be the way to do it in pure Python. If speed is that important to you, you might want to consider running the `find` command you mentioned from your Python code and then parsing its output. The following code can be used:

import subprocess

timestamp = '2018-01-17 03:28:46'
path = '.'
files = []
# Pass the arguments as a list: no shell is involved, so no manual quoting is needed
find = subprocess.Popen(['find', path, '-newermt', timestamp],
                        stdout=subprocess.PIPE)
# find prints one path per line
for line in find.stdout:
    files.append(line.decode('UTF-8').strip())
find.wait()
print(files)
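
Splitting the output on newlines breaks if a file name itself contains a newline. A minimal sketch of a more robust variant, assuming the installed `find` supports both `-newermt` and `-print0` (the function names `split_nul_output` and `find_newer` are made up for illustration):

```python
import os
import subprocess


def split_nul_output(raw):
    """Split NUL-separated output from `find -print0` into a list of paths."""
    # The output ends with a trailing NUL, so drop the empty final chunk.
    return [os.fsdecode(chunk) for chunk in raw.split(b"\0") if chunk]


def find_newer(path, timestamp):
    """Return paths under `path` that find(1) reports as newer than `timestamp`."""
    result = subprocess.run(
        ["find", path, "-newermt", timestamp, "-print0"],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if find exits with an error
    )
    return split_nul_output(result.stdout)
```

Usage would look like `find_newer('.', '2018-01-17 03:28:46')`. Note that `find` lists directories as well as files unless you add `-type f`.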
dietapete
  • Thanks. I would do it with `-print0` and then `.split('\0')`, by the way, if following this technique. Still wondering if there is a way in Python directly to do it at the same or similar speed. – Daniel Naftalovich Feb 26 '18 at 21:08
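
On the pure-Python side, `find` also has to visit every directory entry and check its modification time, so a full walk cannot be avoided; what can be reduced is the per-file overhead. Here is a rough sketch using `os.scandir`, which avoids building intermediate lists and `datetime` objects for every file. The name `iter_newer` is hypothetical, and whether this comes close to the speed of `find` on a given tree is an assumption you would need to measure:

```python
import os
from datetime import datetime


def iter_newer(root, query_date):
    """Yield paths of regular files under `root` modified after `query_date`."""
    cutoff = query_date.timestamp()  # compare raw timestamps, not datetime objects
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except OSError:
            continue  # skip directories that cannot be read
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif (entry.is_file(follow_symlinks=False)
                      and entry.stat(follow_symlinks=False).st_mtime > cutoff):
                    yield entry.path
            except OSError:
                continue  # the entry vanished or cannot be examined


if __name__ == "__main__":
    for path in iter_newer(".", datetime(2018, 1, 17, 3, 28, 46)):
        print(path)
```

Like the `os.walk` version in the question, this yields only files, and it does not follow symbolic links.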