3

We store documents in a SQL Server 2005 database table, in a column of type IMAGE.

Every time I try to store a PDF file larger than 1 MB, it somehow gets corrupted.

Is there any particularly efficient method in .NET to serialize and store large files (~10 MB) in a database?

[Edit] Microsoft actually says the maximum size is about 2 GB: "Variable-length binary data from 0 through 2^31-1 (2,147,483,647) bytes." http://msdn.microsoft.com/en-us/library/ms187993.aspx

Soni Ali
    Might the corruption have something to do with how you're putting it in the database and retrieving it? – Joel B Fant Jul 31 '09 at 14:30
    Please show the code you use to store the files and retrieve them. Also, please tell us something about how you know there is corruption. Can you reproduce the corruption using test data and not files: if you store 20MB of zeros, can you retrieve 20MB of zeros. – John Saunders Jul 31 '09 at 14:35
  • The funny thing is my unit tests still pass. I serialize the file, get the content size (byte length), save it, retrieve the file, and compare the bytes, and they are the same. But when I try to view the retrieved file, it is corrupted. ... This happens only for large files. – Soni Ali Jul 31 '09 at 14:35
  • I don't have the code right now (in the metro), I'm going to an appointment. It's the standard C# FileStream byte[] stuff. – Soni Ali Jul 31 '09 at 14:37
  • unit tests don't write to the database. Or the file system – azheglov Jul 31 '09 at 14:41
  • "unit tests don't write to the database. Or the file system" What! – Soni Ali Jul 31 '09 at 15:19
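The question doesn't include the actual read/write code, but one classic cause of "byte counts match, file still corrupted" with large files is assuming a single `Stream.Read` call fills the whole buffer; `Read` is allowed to return fewer bytes than requested, and for multi-megabyte files it often does. A minimal sketch of a loop that reads reliably (the class and method names here are mine, not from the question; on .NET 2.0+ `File.ReadAllBytes` already does this for you):

```csharp
using System;
using System.IO;

static class FileBytes
{
    // Stream.Read may return fewer bytes than requested, especially for
    // large files. Looping until the buffer is full (or the stream ends)
    // avoids silently truncated -- and therefore "corrupted" -- payloads.
    public static byte[] ReadExactly(Stream stream, int length)
    {
        byte[] buffer = new byte[length];
        int offset = 0;
        while (offset < length)
        {
            int read = stream.Read(buffer, offset, length - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before " + length + " bytes were read.");
            offset += read;
        }
        return buffer;
    }
}
```

If the existing code does `fileStream.Read(buffer, 0, buffer.Length)` once and then writes `buffer` to the database, the stored bytes past the first partial read would be garbage even though the total length looks right.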

4 Answers

8

Wouldn't it make more sense to store the file in the file system, and store a reference to the file in the database? Storing 20 MB files in the database sounds like it could cause some performance issues.

Austin Salonen
andrewWinn
  • I inherited this project, and they have always stored documents in the database. – Soni Ali Jul 31 '09 at 14:39
  • Would it be possible to convert the column into a BLOB? My SQL Server is rusty, but as stated above, the limitations of varbinary might be overcome by using the binary object rather than an image column. – andrewWinn Jul 31 '09 at 14:42
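A minimal sketch of the file-system approach this answer suggests: copy the document into a store directory and record only its name (which would go into the database) rather than the bytes. The `DocumentStore` class and store layout here are hypothetical, not from the answer:

```csharp
using System;
using System.IO;

static class DocumentStore
{
    // Copies a document into the store directory and returns the generated
    // file name. That name -- not the raw bytes -- is what would be INSERTed
    // into the documents table, keeping the database small.
    public static string SaveToStore(string storeRoot, string sourcePath)
    {
        string name = Guid.NewGuid().ToString("N") + Path.GetExtension(sourcePath);
        Directory.CreateDirectory(storeRoot);
        File.Copy(sourcePath, Path.Combine(storeRoot, name));
        return name;
    }
}
```

Retrieval is then just `Path.Combine(storeRoot, nameFromDatabase)` and a normal file read, so file size never touches the database at all.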
3

Follow a document-centric approach and don't store documents like PDFs and images in the database; you will eventually have to refactor them out when you start seeing all kinds of performance issues with your database. Just store the file on the file system and keep its path in a table in your database.

However, you may already be stuck with this issue, so take a look at this article.

http://chiragrdarji.wordpress.com/2007/08/31/storing-and-retrieving-docpdfxls-files-in-sql-server/

It explains that the maximum size of varbinary is 2^31-1 bytes, which is the most you can store in SQL Server 2005.
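If the documents are staying in the database, an ADO.NET sketch along the lines of the linked article, assuming a `VARBINARY(MAX)` column; the `Documents` table, column names, and `PdfRepository` class are illustrative assumptions, not from the answer:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

static class PdfRepository
{
    public const long VarBinaryMaxBytes = 2147483647; // 2^31 - 1

    // Pure size check against the SQL Server 2005 VARBINARY(MAX) limit.
    public static bool FitsInVarBinaryMax(long lengthInBytes)
    {
        return lengthInBytes >= 0 && lengthInBytes <= VarBinaryMaxBytes;
    }

    public static void Save(string connectionString, string path)
    {
        byte[] data = File.ReadAllBytes(path); // handles partial reads for us
        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(
            "INSERT INTO Documents (Name, Content) VALUES (@name, @content)",
            connection))
        {
            command.Parameters.AddWithValue("@name", Path.GetFileName(path));
            // Size -1 marks the parameter as VARBINARY(MAX); letting ADO.NET
            // infer the size is a common source of truncation surprises.
            command.Parameters.Add("@content", SqlDbType.VarBinary, -1).Value = data;
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```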

Srikar Doddi
2

A quick solution if you don't want to refactor your documents out of the db at this point is to use compression on your data:

http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx

I doubt that SQL Server is corrupting your files, but working the gzip algorithm into your pipeline might help you find the bug.
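A round-trip sketch with the `GZipStream` the answer links to. The helper names are mine; the manual read loop in `Decompress` is there because `Stream.CopyTo` only arrived in .NET 4:

```csharp
using System.IO;
using System.IO.Compression;

static class Gzip
{
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the final gzip block
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] data)
    {
        using (MemoryStream input = new MemoryStream(data))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

Compressing before the INSERT and decompressing after the SELECT also acts as a built-in integrity check: if the stored bytes were truncated or altered, `Decompress` throws instead of silently handing back a broken PDF.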

Paul Sasik
2

Shouldn't you be using VARBINARY(Max) for large binary data in SQL Server 2005, anyway?

Dan Diplo
  • Yes, the `IMAGE` type will be deprecated soon - SQL Server 2008 R2 probably won't support it anymore. Use `VARBINARY(MAX)` instead! – marc_s Jul 31 '09 at 15:25