0

Would it be possible to store an ext3-like system in mysql? For example, to build a file metadata system that would include paths and permissions? If so, what would be an example format of it? I suppose the starting denormalized route would be here:

-rw-r--r--@  1 david  staff   552B Nov  2 15:55 wsgi.py
-rw-r--r--   1 david  staff   113B Oct 30 20:45 ~.bash_profile

So we would have 7 columns:

permissions | links | user | group | size | date | filename

I suppose we would need an additional two for 'management':

fileId | directoryId

Does this seem way off base, or would that be possible? What would be a better and more normalized way to store it?

  • Are you asking to store/manage the result of a directory listing, or are you asking if one could replace ext3 by a sql database? – Solarflare Nov 17 '18 at 03:04
  • @Solarflare I mean just the directory metadata -- where the files are located and what permissions they have. –  Nov 17 '18 at 04:56
  • Sorry, I am still not really sure what you are looking for. Do you want us to add all the information to your erd that we think should be part of the metadata of a filesystem, or are you asking if the data you currently chose is normalized? I'd struggle e.g. with "links", as this should/could be calculated if you store/retrieve those links with your directory listing (which is not clear if you want to do that); also you probably need some hierarchy model for your directory, see e.g. [here](https://stackoverflow.com/questions/20215744/how-to-create-a-mysql-hierarchical-recursive-query). – Solarflare Nov 17 '18 at 11:26

1 Answers1

0

Sure, you could crawl the filesystem and collect meta info about each file and directory, and then store it into a database. But that info cannot be a substitute for the filesystem's info, only a 'copy' of it.

I would have (at least) 2 tables: Files and Directories. A file would have a dir_id. A directory would have both a dir_id for itself and a parent_id for walking up the directory tree. The top of the tree (the "root") would be a zero or null.

Soft links, hard links, devices, mounts, /proc, etc, would add challenges, but maybe you don't care about them?

size needs to be a BIGINT. permissions, if encoded, might fit in SMALLINT UNSIGNED, or could be stored as a string. user and group could be either the id or the string; you may need to have an id:name table for users and one for groups. For date, consider TIMESTAMP or maybe DATETIME; keep in mind that the OS may be doing something closer to TIMESTAMP to deal with timezones. (Windows is possibly different.)

If you are storing copies of the files, then I recommend using another table, linked to Files via file_id. But, beware, a LONGBLOB is limited to 4GB, and there are other settings that make it difficult to store anything bigger than 16MB. So, I might suggest chunking anything bigger than 64KB, compressing the chunks, etc. (This probably requires yet another table.)

As for reconstructing the full path from the "hierarchy" that I suggested above, it is only a small amount of code. It could be done in your app code, or in a Stored Procedure. With MySQL 8.0 or MariaDB 10.2, "CTEs" are available to facilitate drilling down a tree.

(Yes, I have done most of these things in a couple of projects in the past.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • thanks for this response, it's very helpful. Out of curiosity, why would you need to separate `Files` from `Directories` into two tables? For example, couldn't you store it in one table like `Entity`, or something similar? –  Nov 18 '18 at 01:30
  • @DavidL - You have a point. It depends on what info you need. In my projects, I needed enough differences to warrant two tables. With a single table, you probably need a flag saying directory vs file. – Rick James Nov 18 '18 at 01:44