
I am working on a system which will store users' pictures and, in the future, some soft documents as well.

  • Number of users: 4000+
  • Transcripts and other documents per user: 10 MB
  • Total system requirement in first year: 40 GB
  • Additional increment each year: 10%
  • Reduction due to archiving each year: 10%
  • Storing data locally on an Ubuntu Linux system, without any fancy RAID setup.
  • Using MySQL Community Edition for the application.
  • Simultaneous Users: 10 to 20
  • Documents are for historical purposes and will not be accessed frequently.
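A quick sizing sketch of the figures above. It assumes decimal units (1 GB = 1000 MB), so 4000 users × 10 MB gives the stated 40 GB. It also reads the two 10% bullets as "grow the stored volume by 10%, then archive 10% of the result" each year. That reading is an assumption; other readings are possible. Under it the net yearly factor is 1.1 × 0.9 = 0.99, so the volume stays roughly flat near 40 GB.

```python
# Sizing sketch for the figures listed above. The yearly model is an
# assumption: grow the stored volume by 10%, then archive 10% of the result.

USERS = 4000
MB_PER_USER = 10


def first_year_gb(users: int = USERS, mb_per_user: float = MB_PER_USER) -> float:
    """Initial volume in GB, using decimal units (1 GB = 1000 MB)."""
    return users * mb_per_user / 1000


def project_gb(years: int, start_gb: float,
               growth: float = 0.10, archived: float = 0.10) -> list[float]:
    """Stored volume for years 1..years under the assumed growth/archiving model."""
    volumes = [start_gb]
    for _ in range(years - 1):
        volumes.append(volumes[-1] * (1 + growth) * (1 - archived))
    return volumes


if __name__ == "__main__":
    for year, gb in enumerate(project_gb(5, first_year_gb()), start=1):
        print(f"year {year}: {gb:.1f} GB")
```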

I always thought it was cumbersome to store files in an RDBMS because of the multiple layers needed to access them. However, now that we have key/value pairs in non-RDBMS (NoSQL) databases, is it still better to store the documents in the file system or in a DB? Thanks for any pointers.

A similar question was asked about 7 years ago (storing uploaded photos and documents - filesystem vs database blob). I hope the technology has changed since then, with all the NoSQL databases now in play, so I am asking again.

Please correct me if I should be doing something else instead of raising a fresh question.

SriSri
  • How many users? How many pictures and documents? What is the typical size of pictures and documents? How would they be accessed? What is the total data size? Number of simultaneous connections? On what computer, operating system, hardware? (a single VPS, a desktop workstation, or a datacenter). Please **edit your question to improve it** (otherwise it stays too broad) – Basile Starynkevitch Jan 24 '16 at 08:43
  • What's the benefit of storing images in a place where you have to write code just to look at them or even just get their dimensions? What's the benefit of having them in one big amorphous lump the whole of which you'll have to back up? Sorry, I just don't buy using a database for storing image files - I'm going to go with a filesystem every time. – Mark Setchell Jan 24 '16 at 08:59
  • For *small* images of one kilobyte or less, the filesystem disk space and inode overhead might be unacceptable. – Basile Starynkevitch Jan 24 '16 at 09:06
  • Added more information requested by Basile. – SriSri Jan 25 '16 at 12:50



It really depends, notably on the DBMS considered, the file system, whether the data is remote or local, the total size of the data (petabytes are not the same as gigabytes), the number of users and documents, etc.

If the data is remote, behind a 1 Gb/s Ethernet link, the network is the bottleneck, so using a DBMS won't add significant overhead. See the answers section of this interesting webpage, or search the web for "Approximate timing for various operations on a typical PC".
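A back-of-the-envelope sketch of that network cost. It assumes an idealized 1 Gb/s link running at full line rate with no protocol overhead or latency, so real transfers will be slower. Even then, a 10 MB document needs about 80 ms just on the wire.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float = 1e9) -> float:
    """Lower-bound time to send size_bytes over a link at full line rate.

    Ignores protocol overhead, latency and contention (an idealization).
    """
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # 10 MB (decimal) document over an assumed 1 Gb/s link.
    print(f"{transfer_seconds(10_000_000):.3f} s")
```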

If the data is local, these choices matter much more (though few computers have a petabyte of SATA disks). Most Linux filesystems allocate space per file in some minimal block size (e.g. 1 KB, 4 KB, ...), so even a tiny file typically occupies at least one whole block.
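To illustrate the block-size point, here is a small sketch. It assumes a 4096-byte block and whole-block allocation, and it ignores inode, metadata, and any inline-data features a particular filesystem may have. Under those assumptions a 1000-byte file wastes about 76% of its allocation, while a 10 MB file wastes well under 0.1%.

```python
def allocated_bytes(size: int, block_size: int = 4096) -> int:
    """Space a file of `size` bytes occupies if the filesystem allocates
    whole blocks (assumption; ignores inode and metadata overhead)."""
    # Integer ceiling division: number of blocks needed, times block size.
    return -(-size // block_size) * block_size


def slack_ratio(size: int, block_size: int = 4096) -> float:
    """Fraction of the allocated space that holds no file data."""
    allocated = allocated_bytes(size, block_size)
    return 0.0 if allocated == 0 else (allocated - size) / allocated


if __name__ == "__main__":
    for size in (100, 1000, 10_000_000):
        print(size, allocated_bytes(size), f"{slack_ratio(size):.2%}")
```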

A possible approach is to choose some threshold (typically 4 or 8 kilobytes, or perhaps even 64 kilobytes, i.e. several pages; YMMV). Data smaller than the threshold could go directly into a database field; data bigger than it could go into a file, with the database storing the file's path. Read about BLOBs in databases.
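A minimal sketch of this threshold approach. It uses Python's built-in `sqlite3` module as a stand-in for whichever database you pick. The 8 KB threshold, the `documents` table, the `save`/`load` helpers and the `<id>.bin` file naming are all hypothetical choices for illustration. Each row holds either the bytes themselves or a path to an external file.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical threshold: the answer suggests 4 KB, 8 KB or even 64 KB (YMMV).
INLINE_THRESHOLD = 8 * 1024


def open_store(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # Each row holds either an inline BLOB (`data`) or a file path (`path`).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, data BLOB, path TEXT)"
    )
    return conn


def save(conn: sqlite3.Connection, doc_id: int, payload: bytes, file_root: Path) -> None:
    if len(payload) < INLINE_THRESHOLD:
        # Small payload: store the bytes directly in the database.
        conn.execute("INSERT INTO documents (id, data, path) VALUES (?, ?, NULL)",
                     (doc_id, payload))
    else:
        # Large payload: write a file first, then record only its path.
        # (Flat directory for brevity; the answer advises against wide ones.)
        file_root.mkdir(parents=True, exist_ok=True)
        target = file_root / f"{doc_id}.bin"
        target.write_bytes(payload)
        conn.execute("INSERT INTO documents (id, data, path) VALUES (?, NULL, ?)",
                     (doc_id, str(target)))
    conn.commit()


def load(conn: sqlite3.Connection, doc_id: int) -> bytes:
    row = conn.execute("SELECT data, path FROM documents WHERE id = ?",
                       (doc_id,)).fetchone()
    if row is None:
        raise KeyError(doc_id)
    data, path = row
    return data if data is not None else Path(path).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = open_store(":memory:")
        save(store, 1, b"tiny thumbnail", Path(tmp))
        save(store, 2, b"\x00" * 100_000, Path(tmp))
        print(len(load(store, 1)), len(load(store, 2)))
```

Writing the file before inserting the row means a failed insert can leave an orphan file. A real system would need a cleanup pass or a different ordering.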

Consider not only RDBMSs like PostgreSQL, but also NoSQL solutions à la MongoDB and key-value stores à la Redis, etc.

For a local data approach, consider not only plain files but also SQLite, GDBM, etc. If you use a file system, avoid very wide directories: instead of widedir/000001.jpg ... widedir/999999.jpg, organise the files as dir/subdir000/001.jpg ... dir/subdir999/999.jpg, with no more than a thousand entries per directory.
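The two-level layout described above can be computed from a file number with `divmod`. Here is a small sketch; the function name and the `.jpg` default are illustrative assumptions. It maps numbers 0 to 999999 onto at most 1000 subdirectories of at most 1000 files each.

```python
from pathlib import Path


def sharded_path(root: Path, file_number: int, suffix: str = ".jpg") -> Path:
    """Map file number n (0..999999) to root/subdirNNN/MMM<suffix>,
    so no directory holds more than 1000 entries."""
    if not 0 <= file_number < 1_000_000:
        raise ValueError("this two-level layout covers 0..999999 only")
    subdir, name = divmod(file_number, 1000)
    return root / f"subdir{subdir:03d}" / f"{name:03d}{suffix}"


if __name__ == "__main__":
    print(sharded_path(Path("dir"), 1))       # dir/subdir000/001.jpg
    print(sharded_path(Path("dir"), 999999))  # dir/subdir999/999.jpg
```

Creating the subdirectory before writing (e.g. `path.parent.mkdir(parents=True, exist_ok=True)`) is left to the caller.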

If you use a local MySQL database and don't have a lot of data (e.g. less than a terabyte), you might store any raw data smaller than, say, 64 KB directly in the database, and store bigger data in individual files whose paths go into the database; but you should still avoid very wide directories for those files.
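For the MySQL case, the inline column needs a BLOB type large enough for the threshold. MySQL's `TINYBLOB`, `BLOB`, `MEDIUMBLOB` and `LONGBLOB` hold at most 2^8−1, 2^16−1, 2^24−1 and 2^32−1 bytes respectively. A "smaller than 64 KB" rule (payloads up to 65,535 bytes) therefore fits a plain `BLOB` column. The helper below is a hypothetical convenience for picking the type, not part of any MySQL API. Separately, the server's `max_allowed_packet` setting also limits how large a single inserted value can be.

```python
# Maximum byte lengths of MySQL's BLOB column types.
MYSQL_BLOB_TYPES = [
    ("TINYBLOB", 2**8 - 1),
    ("BLOB", 2**16 - 1),
    ("MEDIUMBLOB", 2**24 - 1),
    ("LONGBLOB", 2**32 - 1),
]


def smallest_blob_type(max_inline_bytes: int) -> str:
    """Smallest MySQL BLOB type that can hold every inline payload of
    at most `max_inline_bytes` bytes (hypothetical helper)."""
    for name, capacity in MYSQL_BLOB_TYPES:
        if max_inline_bytes <= capacity:
            return name
    raise ValueError("too large for any MySQL BLOB type")


if __name__ == "__main__":
    # "Smaller than 64 KB" means the largest inline payload is 65,535 bytes.
    print(smallest_blob_type(64 * 1024 - 1))
```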

Of course, don't forget to define and apply (human-decided) backup procedures.

Basile Starynkevitch
  • Basile, I updated my question. But, since you already gave the relevant information, I am marking it as answered. Please feel free to comment/edit if you would like to add more after checking my updates. – SriSri Jan 27 '16 at 10:47