-1

I am working on an app that lets people upload data as PDF files. After my app reads a PDF, I would also like to store all of its characters, from the first page to the last.

My fear is that a PDF file can be up to 80 MB, which could contain over 1 billion characters. Can MySQL handle such a large number of characters?

Uchenna
  • This has been asked before: http://stackoverflow.com/questions/6766781/maximum-length-for-mysql-type-text – Peter M Oct 27 '13 at 11:25
  • 3
    Just because MySQL *can* store the data doesn't mean that it *should*. Why don't you store your files in that highly optimised file storage database, your filesystem? Then *relate* to relevant MySQL records (a *relational* database management system) with a suitable key into that file storage database (i.e. the file's path). – eggyal Oct 27 '13 at 11:29
  • Eggyal has a VERY valid point – Jeroen Oct 27 '13 at 11:33
  • This would have been an option, but I want searching the PDF file to be a lot faster, so I don't have to open the PDF with my app every time. – Uchenna Oct 27 '13 at 11:46
  • @UchennaOkafor: If you're trying to search the binary content of PDF files (least of all huge ones) for character strings using MySQL's pattern matching operators, you're just begging to be subjected to a whole world of pain. Good luck with that. – eggyal Oct 27 '13 at 11:58
  • @eggyal I am not going to use MySQL's matching operators. I only want to convert the characters to UTF-8 and index them using Apache Solr or Sphinx to reduce load on the server during background operations. – Uchenna Oct 27 '13 at 12:02
  • @UchennaOkafor: You can index files which are stored outside of MySQL, e.g. using [Solr Cell](http://wiki.apache.org/solr/ExtractingRequestHandler) (which will even parse PDF files correctly for you in order to properly extract the searchable content). Likewise Sphinx. – eggyal Oct 27 '13 at 12:22

2 Answers

1

MySQL data storage requirements can be found here: MySQL5 storage requirements

There I find this table (L = length of string):

TINYBLOB, TINYTEXT         L + 1 bytes, where L < 2^8  = 256 bytes
BLOB, TEXT                 L + 2 bytes, where L < 2^16 = 65,536 bytes (64 KB)
MEDIUMBLOB, MEDIUMTEXT     L + 3 bytes, where L < 2^24 = 16,777,216 bytes (16 MB)
LONGBLOB, LONGTEXT         L + 4 bytes, where L < 2^32 = 4,294,967,296 bytes (4 GB)

So an 80 MB file exceeds the 16 MB MEDIUMTEXT limit and needs a LONGTEXT. For a PDF I would advise a LONGBLOB type instead, since PDF is a binary format.
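To make the size reasoning concrete, here is a minimal Python sketch that picks the smallest column type for a given payload. The `LIMITS` list and the `smallest_type` helper are hypothetical names invented for illustration; the byte limits are the ones from the table. Note that 80 MB is roughly 84 million bytes, which is far below the LONGTEXT/LONGBLOB ceiling.

```python
# Maximum payload size in bytes for each tier (L < 2^n, so the max is 2^n - 1).
# The TEXT and BLOB variants of a tier share the same limit.
LIMITS = [
    ("TINY", 2**8 - 1),
    ("", 2**16 - 1),
    ("MEDIUM", 2**24 - 1),
    ("LONG", 2**32 - 1),
]


def smallest_type(num_bytes: int, binary: bool = False) -> str:
    """Return the smallest TEXT (or BLOB) type that can hold num_bytes bytes."""
    suffix = "BLOB" if binary else "TEXT"
    for prefix, max_bytes in LIMITS:
        if num_bytes <= max_bytes:
            return prefix + suffix
    raise ValueError(f"{num_bytes} bytes exceeds the LONG{suffix} limit")


pdf_bytes = 80 * 1024 * 1024  # an 80 MB upload
print(pdf_bytes)                              # 83886080
print(smallest_type(pdf_bytes))               # LONGTEXT
print(smallest_type(pdf_bytes, binary=True))  # LONGBLOB
```

This only checks the column type's own limit; server settings may impose a lower practical ceiling, so treat it as a first sanity check rather than a guarantee.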

For the record: Eggyal has a point that it is better NOT to store the PDF in the database, but on disk. So I would advise against putting it in the database; if you really need to put it in MySQL, use a LONGBLOB.
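A rough sketch of the on-disk approach could look like the following. To keep it runnable without a server, it uses Python's built-in `sqlite3` as a stand-in for MySQL; the `uploads` table, its columns, and the `store_pdf` / `load_pdf` helpers are all hypothetical names chosen for illustration. The idea is the same with MySQL: the file goes to the filesystem and the database row only holds its path.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_pdf(conn, storage_dir: Path, filename: str, data: bytes) -> int:
    """Write the file to disk and record only its path and size in the database."""
    digest = hashlib.sha256(data).hexdigest()
    path = storage_dir / f"{digest}.pdf"  # content hash avoids file name clashes
    path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO uploads (original_name, path, size_bytes) VALUES (?, ?, ?)",
        (filename, str(path), len(data)),
    )
    conn.commit()
    return cur.lastrowid


def load_pdf(conn, upload_id: int) -> bytes:
    """Look up the stored path for an upload and read the file back from disk."""
    row = conn.execute(
        "SELECT path FROM uploads WHERE id = ?", (upload_id,)
    ).fetchone()
    if row is None:
        raise KeyError(upload_id)
    return Path(row[0]).read_bytes()


# sqlite3 in-memory database used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads ("
    " id INTEGER PRIMARY KEY,"
    " original_name TEXT NOT NULL,"
    " path TEXT NOT NULL,"
    " size_bytes INTEGER NOT NULL)"
)
storage_dir = Path(tempfile.mkdtemp())
upload_id = store_pdf(conn, storage_dir, "report.pdf", b"%PDF-1.4 example bytes")
print(load_pdf(conn, upload_id) == b"%PDF-1.4 example bytes")  # True
```

An external indexer can then be pointed at the stored files directly, so the database never has to carry the document body.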

Jeroen
0

Check this link: http://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html

TINYTEXT    255 bytes
TEXT        65,535 bytes        ~64 KB
MEDIUMTEXT  16,777,215 bytes    ~16 MB
LONGTEXT    4,294,967,295 bytes ~4 GB
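These limits are counted in bytes, not characters, so text with multi-byte UTF-8 characters fits fewer characters than the numbers suggest. A small Python sketch of the difference (the `utf8_size` and `fits_longtext` helpers are hypothetical names for illustration):

```python
LONGTEXT_MAX_BYTES = 2**32 - 1  # 4,294,967,295 bytes, from the table above


def utf8_size(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_longtext(text: str) -> bool:
    """True if the UTF-8 encoding of text is within the LONGTEXT byte limit."""
    return utf8_size(text) <= LONGTEXT_MAX_BYTES


samples = ["hello", "h\u00e9llo", "\u65e5\u672c\u8a9e"]
for s in samples:
    # character count vs. encoded byte count
    print(len(s), utf8_size(s))
# 5 5
# 5 6
# 3 9
```

So when sizing a column for extracted PDF text, measure the encoded byte length rather than the character count.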
kmatyaszek