3

The needs would be long to describe, so I'll simplify the example.

I want to make a form creation system ( the user can create a form, adding fields, etc... ). Let's focus on checkbox vs textarea.

The checkbox can have a value of 0 or 1, depending on the checked status. The textarea must be a LONGTEXT type.

So in the database, that give me 3 choices concerning the structure of the table field_value:

1.

checkbox_value (TINYINT) | textarea_value (MEDIUMTEXT)

That mean that no input will ever use all column of the table. The table will waste some space.

2.

allfield_value (MEDIUMTEXT)

That mean that for the checkbox, I'll store a really tiny value in a MEDIUMTEXT, which is useless.

3.

tblcheckbox.value
tbltextarea.value

Now I have 1 separate table per field. That's optimal in terms of space, but in the whole context of the application, I might expect to have to read over 100 tables -- 1 query with a many JOIN ) in order to generate a single page that display a form.

In your opinion, what's the best way to proceed?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
FMaz008
  • 11,161
  • 19
  • 68
  • 100
  • If your main concern is extra overhead from having lots of unused longtext fields, you could always just put the longtext value in its own table and use a join column to link it back to its parent. This allows you to maintain the simplicity of the flat design in #1 without compromising on space. – Elk Jun 29 '11 at 15:14
  • @misterscruffles I only gave a small example, but in the whole picture, I could have to create more than 1 extra field. But maybe the solution #1 is still better than having an EAV model in terms of READING speed... but the table will be messy with all the field that will be required. – FMaz008 Jun 29 '11 at 15:28

2 Answers2

3

You might want to consider an EAV data model.

Joe Stefanelli
  • 132,803
  • 19
  • 237
  • 235
  • Hum, really interesting... Could you give me a concrete example of a mysql SELECT that might apply in my situation ? But I've read that I might face serious performance problem when it come to data extraction. – FMaz008 Jun 29 '11 at 15:23
  • ... and mysql have a limit of 61 join per row : http://stackoverflow.com/questions/1684971/too-many-tables-mysql-can-only-use-61-tables-in-a-join – FMaz008 Jun 29 '11 at 15:47
  • I've just given you a +1, but could you respond to Mr. Karwin? Because this seems like a fascinating potential debate. You both have really high rep, so I'm guessing you both have some good points. – cwallenpoole Jun 29 '11 at 15:50
  • 2
    @cwallenpoole: All of the pain points that Bill mentioned for EAV are valid. From my point of view, it can still be useful in certain situations. Among the advantages, I'd count the easy addition of new attributes without any DDL changes to the schema and reduced indexing/improved searching as all the values are stored/indexed in a single table. The advantages and disadvantages should be weighed carefully. In my 12+ years as a DBA, I've only implemented an EAV model once, but in that particular case, it has served us well. – Joe Stefanelli Jun 29 '11 at 16:14
3

Do not consider an EAV data model. It's easy to put data in, but hard to get data out. It doesn't scale. It has no data integrity. You have to write lots of code yourself to do things that any RDBMS does for you if you model your data properly. Trying to use an RDBMS to create a general-purpose form management system that can accommodate any future needs is an example of the Inner-Platform Effect antipattern.

(By the way, if you do use EAV, don't try to join all the attributes back into a single row. You already commented that MySQL has a limit on the number of joins per query, but even if you can live within that, it doesn't perform well. Just fetch an attribute per row, and sort it out in application code. Loop over the attribute rows you fetch from the database, and populate your object field by field. That means more code for you to write, but that's the price of Inner-Platform Effect.)

If you want to store form data relationally, each attribute would go in its own column. This means you need to design a custom table for your form (or actually set of tables if your forms support multivalue fields). Name the columns according to the meaning of each given form field, not something generic like "checkbox_value". Choose a data type according to the needs of the given form field, not a one-size-fits-all MEDIUMTEXT or VARCHAR(255).

If you want to store form data non-relationally, you have more flexibility. You can use a non-relational document store such as MongoDB or even Solr. You can store documents without having to design a schema as you would with a relational database. But you lose many of the structural benefits that a schema gives you. You end up writing more code to "discover" the fields of documents instead of being able to infer the structure from the schema. You have no constraints or data types or referential integrity.

Also, you may already be using a relational database successfully for the rest of your data management and can't justify running two different databases simultaneously.

A compromise between relational and non-relational extremes is the Serialized LOB design, with the extension described in How FriendFeed Uses MySQL to Store Schema-Less Data. Most of your data resides in traditional relational tables. Your amorphous form data goes into a single BLOB column, in some format that encodes fields and data together (for example, XML or JSON or YAML). Then for any field of that data you want to be searchable, create an auxiliary table to index that single field and reference rows of form data where a given value in that respective field appears.

iDev247
  • 1,761
  • 3
  • 16
  • 36
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thanks for confirming my doubt about the EAV data model. Installing another RDBMS seem to be the best solution, but is not an option in my case. I'll take a look at the possibility of serializing the values in a BLOB field, or have a huge table with 1 column for each of all the possible attributes. Sounds right ? – FMaz008 Jun 29 '11 at 16:00
  • That last idea is called Single-Table Inheritance. It can work, but you need a lot of columns and you'll find yourself adding more columns as you need to support new form types. The relational solution is to create a custom table per form, containing just the columns needed for that form. – Bill Karwin Jun 29 '11 at 16:09
  • "The relational solution is to create a custom table per form" ... you mean "per field" right ? If I have a form with 30 differents fields, is it still a good idea considering that I would have to query 30+ table to render the form ? ( I'll also have to generate report out of the results... so reading speed is important.) – FMaz008 Jun 29 '11 at 17:15
  • No, I mean per form. One table with 30 columns, where each column corresponds to one form field, with an appropriate name and data type. You may also need child tables for form fields that support multiple values (e.g. "check all that apply"). And probably lookup tables if you want to constraint some form fields to a set of choices. – Bill Karwin Jun 29 '11 at 17:52
  • The table-per-field design what you get if you use the FriendFeed solution. And there you don't need a table for *every* field, just the ones you want to index for efficient searches. – Bill Karwin Jun 29 '11 at 17:55