
I have a directory that may contain millions of files. These files "mark" themselves when used, and my Python program then needs to find the "marked" ones, record that they were marked, and unmark them. They are individual HTML files, so they can't easily communicate with the Python program themselves during this marking process (the user will just open whichever ones they choose).

Because they are marked when used, if I access them one at a time by modification date, I can stop once I reach one that isn't marked (or at least once I reach one that was modified sufficiently long ago). However, every approach I've seen so far requires accessing every file's metadata at least once and then sorting that data, which isn't ideal given the magnitude of files I have. Note that this check occurs during an update step that runs every 5 seconds or so alongside other work, so the time ideally needs to be independent of the number of files in the directory.

So is there a way to traverse a directory in order of modification date in Python without visiting every file's metadata at least once?
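For reference, the O(n) baseline I'm trying to avoid looks roughly like the sketch below: stat every entry once, then sort by mtime, newest first (the directory path and function name here are just placeholders):

```python
import os

def files_newest_first(directory):
    """Return (path, mtime) pairs sorted newest-first.

    Touches every entry's metadata once, then sorts -- this is the
    O(n log n) cost the question is trying to avoid.
    """
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file():
                entries.append((entry.path, entry.stat().st_mtime))
    entries.sort(key=lambda pair: pair[1], reverse=True)
    return entries
```

With millions of files, both the per-entry `stat()` calls and the sort make this too slow to run every 5 seconds.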


1 Answer


No, I don't think there is a way to fetch file names in chunks sorted by modification date.

You should use file system notifications instead, so the OS tells you about modified files as it happens rather than you having to scan for them.

For example use https://github.com/gorakhargosh/watchdog or https://github.com/seb-m/pyinotify/wiki
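For example, with watchdog the idea looks roughly like this sketch (the directory path and handler name are placeholders; I'm assuming "marking" updates a file's modification time, so each mark shows up as a modification event):

```python
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

WATCH_DIR = "."  # placeholder: the directory holding the HTML files


class MarkedFileHandler(FileSystemEventHandler):
    """Collects paths of HTML files as they are modified ("marked")."""

    def __init__(self):
        self.marked = []

    def on_modified(self, event):
        # Skip directory events and anything that isn't an HTML file.
        if not event.is_directory and event.src_path.endswith(".html"):
            self.marked.append(event.src_path)


if __name__ == "__main__":
    handler = MarkedFileHandler()
    observer = Observer()
    observer.schedule(handler, WATCH_DIR, recursive=False)
    observer.start()
    # Your 5-second update step can now just drain handler.marked
    # (record + unmark each path) instead of scanning the directory.
```

This way the cost of each update step depends on the number of files marked since the last check, not on the total number of files in the directory.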

  • Awesome, that's actually a better solution than what I thought was possible (without using something hacky like websockets), thanks. – Phylliida Dec 16 '14 at 05:44
  • [See also](http://stackoverflow.com/questions/182197/how-do-i-watch-a-file-for-changes-using-python). Don't forget to upvote/accept useful answers – warvariuc Dec 16 '14 at 05:45