Django BinaryField - Justification for statement in documentation

Question

In the Django 1.10 documentation for the BinaryField field type, they give a warning about its use:

Abusing BinaryField

Although you might think about storing files in the database, consider that it is bad design in 99% of the cases. This field is not a replacement for proper static files handling.

It does not continue with any justification for this claim. Are there any generalized indicators for what falls in the 99% "bad design" or 1% "not bad design" cases? Does this ring particularly true with Django because it has great static files support?

Possible duplicate of [Slowness found when base 64 image select and encode from database](http://stackoverflow.com/questions/41228496/slowness-found-when-base-64-image-select-and-encode-from-database) — e4c5, Jan 31 '17 at 06:12

score 3 · Accepted Answer · answered Nov 03 '17 at 12:24

I consider this premature optimization at best and cargo cult programming at worst.

While it is true that relational database systems aren't optimized for storing large fields (whether binary or text) and some of them treat them specially or at least have some restrictions on their use, most of them handle at least moderately sized binary values (let's say up to a few hundred megabytes) quite well. Storing pictures or PDFs in the database will be less efficient than storing them in the file system, but for 99% of all applications it will be efficient enough.

On the other hand, if you store these files in the file system, you lose several advantages:

Updates will be outside of transactions, so you can't be sure that an update to the file (in the filesystem) and the metadata (in the database) will be atomic.
You lose referential integrity: Your database may refer to files which have been deleted or renamed.
You have two different places where you store your data. This complicates access, backups, etc.

I would try to store all data together which belongs logically together. Usually that means storing everything in the database. If this is not technically possible (e.g. because your files are too big - most RDBMS have a size limit on blobs) or because tests show that it is too slow or otherwise inconvenient, you can always optimize it later.

It's been a while. I'm going to mark this as the correct answer unless somebody can provide substantiated claims otherwise. — Julien, Jun 21 '18 at 16:02

gkr · Answer 2 · 2017-02-23T14:35:38.613

0

Django models are an abstraction for relational database. These excel in storing small amount of data with well defined format and relationship. They are optimised for fixed length row and low memory usage.

Is your data fixed length, smaller than 4Kb, and is not meant to be served by a webserver ? You are probably in the 1%.

edited Feb 23 '17 at 14:35

answered Feb 23 '17 at 11:19

gkr

469
7
11

Where are you getting the 4Kb figure from? Are you implying that JSONFields, ArrayFields, TextFields, and other non-fixed length fields are also not good to use? Can you provide a source or resource backing this up, or at least something for further reading? – Julien Feb 23 '17 at 16:27
4Kb is arbitrary, taken from the demo scene. A text field that long would be a short blog post, for comparison. And sorry, I don't have source on this one, only not so good memory from my school year in computer science. Most modern database have tools for searching TextField and returning weighted result, so I assume these are fine. I wasn't curious enough yet to check how they are handled but I wouldn't be surprised if the row only store a pointer to the actual data. – gkr Feb 24 '17 at 17:19
I'm asking specifically for justification in this question, not just hearsay (which is about as much that the documentation gives). Also, if it's conceivable that there's only pointers stored in these rows, isn't it also conceivable that this holds true for `BinaryField` data as well? – Julien Feb 24 '17 at 20:22
BinaryField are stored as longblob on MySQL, which are treated the same as longtext which is the type used for TextField. According to the source cited in this question http://stackoverflow.com/questions/15585602/change-limit-for-mysql-row-size-too-large, MySQL will store the beginning of a blob or text in the row. Postgres will store only pointer to the data for text and char and varchar if they are too long. – gkr Feb 24 '17 at 22:15
Source about the Postgres bit? – Julien Feb 24 '17 at 22:41
https://www.postgresql.org/docs/9.1/static/datatype-character.html 6th paragraph mention very long value stored in background table. But the same page mention there is no downside to using text instead of char. – gkr Feb 25 '17 at 11:59

Django BinaryField - Justification for statement in documentation

2 Answers2