I have a number of PDF documents that I need to use in a website I am building, and I need to be able to search them. Is it better to save these files in the database or in a folder on the file system? In either case, how do I search them? I will basically be searching for one or two words and returning the list of PDFs that contain them. What is the best and easiest way to do all of this? Each PDF will change once a year at most, often less, and I do not need to keep revision history.
-
Duplicate of http://stackoverflow.com/questions/148568/storing-database-data-in-files http://stackoverflow.com/questions/2028683/best-way-storing-binary-or-image-files http://stackoverflow.com/questions/2262646/storing-images-in-filesystem-as-files-or-in-blob-database-field-as-binaries-closed http://stackoverflow.com/questions/1148122/fastest-way-to-retrieve-store-millions-of-small-binary-objects http://stackoverflow.com/questions/782655/ http://stackoverflow.com/questions/662488/ http://stackoverflow.com/questions/148568/ – Esteban Küber Mar 01 '10 at 16:16
-
I will be storing no more than 1000 documents. – Ben Hoffman Mar 01 '10 at 16:27
-
Voyager - This is different than all of those. My real concern is searching files and if SQL Server does a better job or if there is some way/a better way to do it via searching a folder of files. – Ben Hoffman Mar 01 '10 at 16:31
-
Extremely duplicated topic. Voting to close. – APC Mar 01 '10 at 16:32
-
@voyager: None of these supposed duplicates addresses the searching requirement. – recursive Mar 01 '10 at 16:32
4 Answers
You can store the PDFs in a table using a varbinary column plus a column that holds the file extension. You can then take advantage of SQL Server's Full-Text Search engine to search inside the PDFs. You will have to install a PDF IFilter on your SQL Server. I do not know if this is the easiest way to do it, but I know it works great. I am using that schema to store hundreds of thousands of documents and it performs great.

This is the same argument, over and over again, about saving things in the file system vs. saving them in the database. Sadly, there is no right or wrong answer; it all depends on the scope of your project. Take a look at this Stack Overflow question. It's about saving images in a DB, but it's the same principle.
As people say, I suppose there are many advantages and disadvantages either way, but if I had to make this decision, I definitely wouldn't save PDF files in the database. I'm not talking only about efficiency. I'm thinking about what you would do in the future if you had to change your database engine, for example. I always try to use the most standard database types possible. =)

It depends on how many files we are talking here.
I would probably make a database table where I map document information such as the name, a description, who uploaded it, etc. to a filename. I would not store the entire files in the database.
This way, you need to keep the files on disk synchronized with the database. When someone deletes a file (using the web interface), remove the entry from the database and delete the file from disk.
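A minimal sketch of that layout, in Python with SQLite purely for illustration (the thread is about .NET and SQL Server, so treat this as the shape of the idea, not a drop-in implementation). The table name `documents`, its columns, and the helper functions are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_catalog(db_path):
    """Open (or create) the metadata table; the PDFs themselves stay on disk."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               description TEXT,
               uploaded_by TEXT,
               filename TEXT NOT NULL UNIQUE
           )"""
    )
    return conn


def add_document(conn, folder, filename, data, name, description, uploaded_by):
    """Write the file to disk, then record its metadata in the table."""
    (Path(folder) / filename).write_bytes(data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO documents (name, description, uploaded_by, filename) "
            "VALUES (?, ?, ?, ?)",
            (name, description, uploaded_by, filename),
        )


def delete_document(conn, folder, filename):
    """Remove the database row and the file on disk together."""
    with conn:
        conn.execute("DELETE FROM documents WHERE filename = ?", (filename,))
    (Path(folder) / filename).unlink(missing_ok=True)  # requires Python 3.8+


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        conn = open_catalog(":memory:")
        add_document(conn, folder, "report.pdf", b"%PDF-1.4 ...",
                     "Annual report", "Yearly summary", "ben")
        print(conn.execute("SELECT name, filename FROM documents").fetchall())
        delete_document(conn, folder, "report.pdf")
        print((Path(folder) / "report.pdf").exists())
        conn.close()
```

Routing every add and delete through one pair of functions is what keeps the folder and the table from drifting apart.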

-
I understand the argument for images. My question is more specifically about searching the documents, though. I was not even sure whether it is possible to search documents in a folder from a .NET website. – Ben Hoffman Mar 01 '10 at 16:29
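For what it's worth, searching a folder is possible in principle. Below is a rough Python sketch of one way to do it, under a big assumption: the text of each PDF has already been extracted by some separate tool into a `.txt` file with the same base name next to it (a hypothetical convention, not something from this thread). The function names `build_index` and `search` are made up for the example. It builds a small in-memory word index and returns the PDFs that contain every search word, which should be fine for around 1000 documents that change rarely.

```python
import re
from pathlib import Path


def build_index(folder):
    """Map each lowercase word to the set of PDF names whose extracted text contains it."""
    index = {}
    for text_file in Path(folder).glob("*.txt"):
        # Assumed convention: report.txt holds the extracted text of report.pdf.
        pdf_name = text_file.with_suffix(".pdf").name
        words = re.findall(r"\w+", text_file.read_text(encoding="utf-8").lower())
        for word in set(words):
            index.setdefault(word, set()).add(pdf_name)
    return index


def search(index, query):
    """Return the sorted PDF names that contain every word in the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)
```

Usage would be along the lines of `search(build_index("docs"), "budget report")`, where `docs` is a placeholder folder name. Since the files change about once a year, the index only needs rebuilding when a file is replaced.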