I would like my program (written in Python) to monitor a given file system's hierarchy, record it in persistent storage, and update that record when the file system changes. The stored hierarchy may also be loaded into memory for quick access.

I've found some posts that suggested "the best persistent storage method to use with Python" here and here, as well as another post that answered "how to represent a filesystem in a relational database" here.

From the above links, it appears that SQLite is a good choice for persistence, as it is fast. However, I couldn't find many opinions on how well a database works for storing and representing a filesystem hierarchy.

My considerations for the method implemented are:

  • Performance/Speed
  • Scalability: I need to monitor and keep up to date a hierarchy of potentially hundreds of thousands of files
  • Ease of use: when I read the file system hierarchy into memory
  • Any other suggested considerations?

Is it a good idea to use an RDBMS to represent a filesystem hierarchy? What are the pros and cons of this method? Do you have other suggested methods, and what are their pros and cons?
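For illustration, one way the relational approach could look is an adjacency-list table in SQLite, where each row points at its parent directory. This is only a sketch under stated assumptions, not a recommended design: the table name `nodes`, its columns, and the helpers `snapshot`, `children`, and `all_paths` are hypothetical, and paths are joined with `/` for simplicity.

```python
import os
import sqlite3

# Hypothetical schema: one row per file or directory, linked to its parent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES nodes(id),
    name      TEXT NOT NULL,
    is_dir    INTEGER NOT NULL,
    UNIQUE (parent_id, name)
);
"""
INSERT = "INSERT INTO nodes (parent_id, name, is_dir) VALUES (?, ?, ?)"


def snapshot(conn, root):
    """Replace the contents of `nodes` with the current tree under `root`."""
    conn.executescript(SCHEMA)
    conn.execute("DELETE FROM nodes")
    cur = conn.execute(INSERT, (None, root, 1))
    ids = {root: cur.lastrowid}  # directory path -> row id
    for dirpath, dirnames, filenames in os.walk(root):
        parent = ids[dirpath]
        for name in dirnames:
            cur = conn.execute(INSERT, (parent, name, 1))
            ids[os.path.join(dirpath, name)] = cur.lastrowid
        for name in filenames:
            conn.execute(INSERT, (parent, name, 0))
    conn.commit()


def children(conn, node_id):
    """Return the names directly under one directory row, sorted."""
    rows = conn.execute(
        "SELECT name FROM nodes WHERE parent_id = ? ORDER BY name", (node_id,)
    )
    return [row[0] for row in rows]


def all_paths(conn):
    """Rebuild every full path with a recursive query (assumes '/' separator)."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, path) AS (
            SELECT id, name FROM nodes WHERE parent_id IS NULL
            UNION ALL
            SELECT n.id, tree.path || '/' || n.name
            FROM nodes AS n JOIN tree ON n.parent_id = tree.id
        )
        SELECT path FROM tree
        """
    )
    return [row[0] for row in rows]
```

Listing one directory is a single indexed lookup, while rebuilding full paths needs a recursive query, which is one of the trade-offs the question is asking about. Keeping the table in sync after changes (rather than re-running `snapshot`) is not covered here.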

  • I'll just throw this out there, but take a look at graph databases. In my opinion there are three types of databases these days: relational, document, and graph. (I suppose you could throw in key/value.) A graph database is designed to store relations, such as where a file appears in the tree structure. It might make querying more natural as well. My head hurts thinking of how to force a filesystem into a relational database. – ryan1234 Feb 07 '13 at 20:31
  • If you really care about performance, knowing what operations you're planning to perform on the filesystem structure is critical and will dictate the constraints of your design -- picking a data store is secondary to choosing the right data structures. – Dan Oct 24 '15 at 07:22
  • Side note: if you aren't 100% certain (because of measurements you took on a live system) that this is the bottleneck for your application, you should probably try using the filesystem directly first -- without knowing what you're trying to do, this sounds like premature optimization, and duplicating the filesystem structure will add a lot of unnecessary complexity to your design (because you'll have to keep the two copies in sync with each other). – Dan Oct 24 '15 at 07:22
