
There are lots of questions about using Python to determine whether files within a directory have been modified (see [1], [2]). Invariably, the answers involve walking through the directory (os.walk) and checking each entry individually.
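For reference, the walk-based check those answers describe comes down to something like the following. It is a sketch, and `latest_mtime` and `changed_since` are hypothetical helper names. It visits every directory in the tree, which is exactly the cost in question:

```python
import os


def latest_mtime(root):
    """Return the newest mtime found anywhere under ``root`` (inclusive).

    Every directory is visited once, so the cost grows with the size of the
    tree; this is the walk the question is trying to avoid.
    """
    newest = os.stat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                mtime = os.lstat(os.path.join(dirpath, name)).st_mtime
            except FileNotFoundError:
                # The entry was deleted between listing and stat-ing it.
                continue
            newest = max(newest, mtime)
    return newest


def changed_since(root, timestamp):
    """True if anything under ``root`` has an mtime newer than ``timestamp``."""
    return latest_mtime(root) > timestamp
```

Deletions are caught too, because removing an entry updates the mtime of the directory that contained it.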

Is there any way to do this without a walk?

I have a large, deep directory structure, and it is costly to check each subdirectory (and its subdirectories, recursively). I am wondering whether this task can be accomplished by looking only at the top-level directory.

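In the structure below, the modification time of dir changes when subdir is created, but it does not change when file is created, because a directory's mtime only reflects changes to its own entries (creating file updates subdir's mtime instead). The problem would be solved if modification times were affected by changes to all descendant files and directories.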

dir/
└── subdir/
    └── file

A quick script to facilitate testing:

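import os

# Start from a clean slate.
os.system('rm -rf dir')

os.system('mkdir dir')
m1 = os.path.getmtime('dir')

# Adding an entry to dir updates dir's mtime.
os.system('mkdir dir/subdir')
m2 = os.path.getmtime('dir')

# Adding an entry to dir/subdir updates subdir's mtime, not dir's.
os.system('touch dir/subdir/file')
m3 = os.path.getmtime('dir')

print(m1 == m2)  # False
print(m2 == m3)  # True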
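The same experiment can check subdir as well, to see where the change actually lands. This is a sketch using stdlib calls instead of shell commands. Back-dating both directories with os.utime is an addition so that the result does not depend on timestamp resolution, and `mtimes_after_touch` is a hypothetical name:

```python
import os
import tempfile


def mtimes_after_touch():
    """Create dir/subdir, back-date both, then create dir/subdir/file.

    Returns the (dir, subdir) mtimes observed after the file is created.
    """
    with tempfile.TemporaryDirectory() as tmp:
        top = os.path.join(tmp, "dir")
        sub = os.path.join(top, "subdir")
        os.makedirs(sub)
        # Back-date both directories so any update is unambiguous, even on
        # filesystems with coarse timestamp resolution.
        os.utime(top, (0, 0))
        os.utime(sub, (0, 0))
        # Equivalent of `touch dir/subdir/file`.
        open(os.path.join(sub, "file"), "w").close()
        return os.path.getmtime(top), os.path.getmtime(sub)


dir_mtime, subdir_mtime = mtimes_after_touch()
print(dir_mtime == 0)     # True: dir's own entries did not change
print(subdir_mtime == 0)  # False: a new entry was added to subdir
```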
Nolan Conaway
  • Do you just need to determine whether some file under the root directory was modified (a true/false output), or do you need to determine which file was modified as well? Will files/directories ever be created or deleted, or just modified? – Anil Vaitla Mar 11 '18 at 04:17
  • Thanks for asking! The most common scenario I am addressing requires only knowing if any file has been modified. Creations and deletions are more common than modifications. – Nolan Conaway Mar 11 '18 at 04:21
  • Would directories ever be created and deleted or files only? – Anil Vaitla Mar 11 '18 at 04:22
  • In my case both directories and files can be created at any time. So this is a general case where we want to know if _anything_ has been done at _any_ depth of subdirectory. – Nolan Conaway Mar 11 '18 at 04:35
  • What filesystem are you using? – Anil Vaitla Mar 11 '18 at 05:02
  • Linux can be assumed, though an approach that also works on macOS/Windows would be appreciated. – Nolan Conaway Mar 11 '18 at 05:38

1 Answer


If you only need to support Linux, you can consider something based on inotify, specifically pyinotify.

inotify is built into the Linux kernel, so as the OS makes changes to watched files and directories, it delivers notification events to your application. However, it is Linux-only, so if you need to support another platform you may need a different library. Here is more information about how inotify works.
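One detail matters for the question: an inotify watch on a directory reports events for that directory's direct entries only, not for its descendants. A minimal, Linux-only sketch calling the raw inotify(7) system calls through ctypes (the same kernel interface pyinotify builds on; `read_names` and `events_seen_by_top_watch` are hypothetical names) shows a watch on dir missing a file created in dir/subdir:

```python
import ctypes
import os
import struct
import tempfile

# Constants from <sys/inotify.h>; IN_NONBLOCK has the same value as O_NONBLOCK.
IN_CREATE = 0x00000100
IN_NONBLOCK = os.O_NONBLOCK
EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, len

libc = ctypes.CDLL(None, use_errno=True)


def read_names(fd):
    """Drain pending events from a non-blocking inotify fd; return entry names."""
    names = []
    while True:
        try:
            buf = os.read(fd, 4096)
        except BlockingIOError:
            return names  # no more queued events
        offset = 0
        while offset < len(buf):
            _wd, _mask, _cookie, length = EVENT_HEADER.unpack_from(buf, offset)
            offset += EVENT_HEADER.size
            names.append(os.fsdecode(buf[offset:offset + length].rstrip(b"\0")))
            offset += length


def events_seen_by_top_watch():
    with tempfile.TemporaryDirectory() as tmp:
        top = os.path.join(tmp, "dir")
        sub = os.path.join(top, "subdir")
        os.makedirs(sub)
        fd = libc.inotify_init1(IN_NONBLOCK)
        if fd < 0:
            raise OSError(ctypes.get_errno(), "inotify_init1 failed")
        try:
            # Watch only the top-level directory.
            if libc.inotify_add_watch(fd, os.fsencode(top), IN_CREATE) < 0:
                raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
            open(os.path.join(sub, "deep_file"), "w").close()     # not reported
            open(os.path.join(top, "shallow_file"), "w").close()  # reported
            return read_names(fd)
        finally:
            os.close(fd)


print(events_seen_by_top_watch())  # ['shallow_file']
```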

If you need something that works on macOS, Linux, and Windows, you can also consider Watchdog. It uses each platform's native file-watching facility:

  • Windows: ReadDirectoryChangesW API
  • macOS: FSEvents
  • Linux: inotify

This discussion may be similar to your question as well.

Anil Vaitla
  • Nice idea! AFAICT the approach here would be to create a daemon that walks (https://github.com/seb-m/pyinotify/blob/master/python2/pyinotify.py#L2119) and triggers a process on update. This is slightly better than the current state of affairs but does not solve the base problem (detection _without walking_). – Nolan Conaway Mar 11 '18 at 04:01
  • Fundamentally, the full directory has to be walked at least once. This initial walk should set up the initial modification watches in the directory. I wonder, though, whether new file creations need another full walk to be detected, or whether they are reported by the existing inotify watches without a full directory traversal. – Anil Vaitla Mar 11 '18 at 04:14
  • One way to test this is to patch the os.path.walk function in pyinotify and check that creating new files does not trigger a full walk. – Anil Vaitla Mar 11 '18 at 04:15
  • Is this some basic CS principle that I do not know? If it is true that we need at least _one_ walk, then the answer to my question is "you can't". – Nolan Conaway Mar 11 '18 at 04:36
    My mistake, I shouldn't have said this is a "Fundamental" principle. There could be some system implementation which makes it possible to not need an initial walk, but for a system like inotify which sets watch events on individual files and directories there needs to be an initial walk (since it needs to visit and set a watch at least once). – Anil Vaitla Mar 11 '18 at 04:41
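To make the discussion concrete, the sketch below is Linux-only and calls inotify directly through ctypes. `TreeWatcher` is a hypothetical name, loosely mirroring what pyinotify's `rec=True` / `auto_add=True` options are meant to do. It walks once at start-up to put a watch on every directory. After that, it learns about new files and directories from events alone, adding a watch to each new directory instead of re-walking:

```python
import ctypes
import os
import struct

# Constants from <sys/inotify.h>.
IN_MODIFY, IN_MOVED_FROM, IN_MOVED_TO = 0x002, 0x040, 0x080
IN_CREATE, IN_DELETE, IN_IGNORED = 0x100, 0x200, 0x8000
IN_ISDIR = 0x40000000
MASK = IN_CREATE | IN_DELETE | IN_MODIFY | IN_MOVED_FROM | IN_MOVED_TO
EVENT = struct.Struct("iIII")  # wd, mask, cookie, len

libc = ctypes.CDLL(None, use_errno=True)


class TreeWatcher:
    """Hypothetical helper: one inotify watch per directory under root."""

    def __init__(self, root):
        self.fd = libc.inotify_init1(os.O_NONBLOCK)
        if self.fd < 0:
            raise OSError(ctypes.get_errno(), "inotify_init1 failed")
        self.paths = {}  # watch descriptor -> directory path
        # The one-time walk: every existing directory needs its own watch.
        for dirpath, _dirnames, _filenames in os.walk(root):
            self._watch(dirpath)

    def _watch(self, path):
        wd = libc.inotify_add_watch(self.fd, os.fsencode(path), MASK)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed", path)
        self.paths[wd] = path

    def changes(self):
        """Return paths changed since the last call, without walking."""
        changed = []
        while True:
            try:
                buf = os.read(self.fd, 4096)
            except BlockingIOError:
                return changed  # event queue drained
            offset = 0
            while offset < len(buf):
                wd, mask, _cookie, length = EVENT.unpack_from(buf, offset)
                offset += EVENT.size
                name = os.fsdecode(buf[offset:offset + length].rstrip(b"\0"))
                offset += length
                if mask & IN_IGNORED:
                    self.paths.pop(wd, None)  # watch removed (dir deleted)
                    continue
                if wd not in self.paths:
                    continue  # e.g. a queue-overflow event (wd == -1)
                path = os.path.join(self.paths[wd], name)
                changed.append(path)
                # A new directory gets a watch of its own; no re-walk needed.
                if mask & IN_ISDIR and mask & (IN_CREATE | IN_MOVED_TO):
                    try:
                        self._watch(path)
                    except FileNotFoundError:
                        pass  # it was removed again before we could watch it

    def close(self):
        os.close(self.fd)
```

One caveat: entries created inside a new directory before its watch is added are missed, so a more robust version would also list each new directory once right after watching it.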