I have a directory that is full of potentially millions of files. These files "mark" themselves when used, and then my Python program wants to find the "marked" ones then record that they were marked and unmark them. They are individual html files so they can't easily communicate with the python program themselves during this marking process (the user will just open whatever ones they choose).
Because they are marked when used, if I access them by modification date, one at a time, once I reach one that isn't marked I can stop (or at least once I get to one that was modified a decent amount of time in the future). However, all ways I've seen of doing this so far require accessing every file's metadata at least once, and then sorting this data, which isn't ideal with the magnitude of files I have. Note that this check occurs during an update step which occurs every 5 seconds or so combined with other work and so the time ideally needs to be independent of the number of files in the directory.
So is there a way to traverse a directory in order of modification date without visiting all files's medatada's at least once in Python?