1

I am working on an asp.net core mvc web application, the web application is a document management workflow. where inside each of the workflow steps users can upload documents, as follow:-

  1. users can upload documents with the following restriction; a file can not exceed 5 MB + all the documents inside a workflow can not exceed 50 MB, unless admin approves it. they can upload as many documents as they want.
  2. we will have lot of views which will show the step and all its documents attached to it, and users can chose to download the documents.
  3. we can have unlimited number of workflows. as the more users register with our application the more workflow will be created.
  4. certain files can be marked as confidential, so they should be encrypted when storing them either inside the database or inside the file system.
  5. we are planning to use EF core as the data access layer for our web application + SQL server 2016 or 2017.

now my question is how we should manage our files, where i found these 3 approaches.

  1. Blob.
  2. FileStream
  3. File system.

now the first approach, will allow us to encrypt the files inside the database + will work with EF. but it will have a huge drawback on performance, since opening a file or querying the files from database means they will be loaded inside the hosting server memory. so since we are seeking for an extensible approach, so i think this approach will not work for us since it is less scalable.

Second approach. will have better performance compared to first approach (Blob), but FileStream are not supported with EF + does not allow encryption. so we have to exclude this also.

third approach. of storing the files inside a folder which have the workflow ID + store the link to the file/folder inside the DB. will allow us to encrypt the files + will work with EF. and have a better performance compared to Blob (not sure if this is valid for FileStream). the only drawback, is that we can not achieve Atomic-ity between the files and their related records inside the database. but with adding some code we can handle this by our-self. for example deleting a database record will delete all its documents inside the folder, and we can add some background jobs to make sure all the documents have database records, other wise to delete the documents..

so based on the above i found that the third approach is the best fit for our need? so can anyone advice on this please? are my assumption correct? and is there a fourth appraoch or a hybrid appraoch that can be a better fit for us?

John John
  • 1
  • 72
  • 238
  • 501
  • Without the bounty his question would get closed as primarily opinion-based. – Gert Arnold May 25 '19 at 21:40
  • 1
    @GertArnold ok thanks for the note, but i do not think my question is that wide or opinion-based, as i am asking about the best approach taking into consideration many factors ... now if someone is asking a general question about the nest appraoch for managing files, i will consider it as an opinion-based, but in my case there are many factors which i am asking about them and what will be the best appraoch to follow... – John John May 26 '19 at 00:31

1 Answers1

-1

Although modern RDBMS have been optimised for data storage with the perks of integrity and atomicity, databases should be considered the least most alternative (StackOverflow posts like this and this shall corroborate the above) and therefore the third option mentioned or an improvement thereof shall be the vote.

For instance, a potential improvement would be to store the files renamed to a hash of the content and database the hash which shall eliminate all OS restrictions on subdirectories/files, filenames, and paths. Moreover, with a well structured directory layout duplicates could be filtered out.

The User-defined Database Functions shall aid in achieving atomicity which will efface the need of background jobs. An excellent guide on UDFs particularly for the use of accessing filesytem and invoking an executable can be found here.

Ashen Gunaratne
  • 435
  • 3
  • 9
  • You're completely ignoring filestream which I consider the best alternative for non-Azure SQL Server databases. You may not agree with that, fine, but you should at least discuss it, as it's one of the options. – Gert Arnold May 25 '19 at 21:41
  • @Ashen Gunaratne ok thanks for the reply, so you think using the file system to store the files will be the most appropriate option? – John John May 26 '19 at 00:25
  • @GertArnold Yes I also thought that FileStream is the best option to follow , but since i am using EF core + I need to encrypt certain files (as a second level of security), so i found that filestream will not be the best option to follow, as seems fileStream's files can not be encrypted + EF core does not support filestreams.. – John John May 26 '19 at 00:28
  • @GertArnold Filestream does hold a strong candidacy but as _test test_ pointed out for the current scenario it would be a [no-go](https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/bb933993(v=sql.105)#integrated-security)! – Ashen Gunaratne May 26 '19 at 11:53
  • @testtest Yes, in my opinion it would be the most eligent option taking into account all the facets! – Ashen Gunaratne May 26 '19 at 11:54
  • @testtest Why wouldn't EF core support filestream? As for EF, it just stores `byte[]` properties oblivious of the store implementation. And of course you can store encrypted files if you want. Filestream data can even be stored on encrypted volumes. – Gert Arnold May 26 '19 at 19:38
  • @GertArnold here is an official MS documentation mentioning that you can not encrypt filestream files https://learn.microsoft.com/en-us/sql/relational-databases/blob/filestream-sql-server?view=sql-server-2017 `Encryption is not supported on FILESTREAM data.` + here is a link mentioning that EF does not support filestreams https://github.com/aspnet/EntityFrameworkCore/issues/4983 `We currently don't have support for filestream` – John John May 28 '19 at 01:42
  • The first link only says that *streaming access* isn't supported. I'm not sure what the second link says, probably that SQL Server's built-in encryption isn't supported. Of course you can encrypt files and store them. – Gert Arnold May 28 '19 at 07:46
  • @GertArnold i read many articles which says EF does not support FileStreams + if sql encryption can not be applied to filestream, and we need to do external encryption, then will we be able to query this encrypted data + backup encrypted filestreams? – John John May 29 '19 at 14:51
  • Funny, I use EF + filestream each day. – Gert Arnold May 29 '19 at 21:01