1

A little overview of the problem.

Let's say I have a table named TableA with fixed properties, PropertyA, PropertyB, PropertyC. This has been enough for your own website needs but then you suddenly have clients that want custom fields on your site.

ClientA wants to add PropertyD and PropertyE. ClientB wants to add PropertyF and PropertyG.

The catch is these clients don't want each others fields. Now imagine if you get more clients, the solution of just adding nullable fields in TableA will be cumbersome and you will end up with a mess of a table. Or at least I assume that's the case feel free to correct me. Is it better if I just do that?

Now I thought of two solutions. I'm asking if there's a better way to do it since I'm not that confident with the trade offs and their future performance.

Proposed Solution #1 Proposed Solution #1

data_id is a not exactly a foreign key but it stores whatever corresponding client property is attached to a table A row. Using client_id as the only foreign key present on both the property table and table A.

It feels like it's an anti pattern of some sorts but I could imagine queries will be easy this way but it requires that the developer knows what property table it should pick from. I'm not sure if many tables is a bad thing.

Proposed Solution #2 Proposed Solution #2

I believe it's a bit more elegant and can easily add more fields as necessary. Not to mention these are the only tables I would need for everything else. Just to visualize. I will add the request properties in the properties table like so:

Properties
-------------
1 | PropertyD
2 | PropertyE
3 | PropertyF
4 | PropertyG

And whenever I save any data I would tag all properties whenever they are available like so. For this example I want to save a ClientA stored in the Clients table on id 1.

Property_Mapping
--------------------------------------------------------
property_id | table_a_id | property_value   | client_id
--------------------------------------------------------
1           | 1          | PROPERTY_D_VALUE | 1
2           | 1          | PROPERTY_E_VALUE | 1

There are obvious possible complexity of query on this one, I'd imagine but it's more a tradeoff. I intended client_id to be placed on property_mapping just in case clients want the same fields. Any advice?

Franrey Saycon
  • 647
  • 1
  • 5
  • 18

1 Answers1

1

You've discovered the Entity-Attribute-Value antipattern. It's a terrible idea for a relational database. It makes your queries far more complex, and it takes 4-10x the storage space.

I covered some pros and cons of several alternatives in an old answer on Stack Overflow:

And in a presentation:

As an example of the trouble EAV causes, consider how you would respond if one of your clients says that PropertyD must be mandatory (i.e. the equivalent of NOT NULL) and PropertyE must be UNIQUE. Meanwhile, the other client says that PropertyG should be restricted to a finite set of values, so you should either use an ENUM data type, or use a foreign key to a table of allowed values.

But you can't implement any of these constraints using your Properties table, because all the values of all the properties are stored in the same column.

You lose features of relational databases when you use this antipattern, such as data types and constraints.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • This is good to know! Thank you! would love to know if you have any book to point me to? to avoid these anti pattern thinking in the future. – Franrey Saycon Feb 07 '19 at 04:17
  • And yet sometimes it's the only show in town. WordPress does ok, and I think if you split data into different tables according to data type, then it's workable – Strawberry Feb 07 '19 at 08:42
  • 2
    @FranreySaycon, Funny you should ask! I wrote a book called [SQL Antipatterns: Avoiding the Pitfalls of Database Programming](https://pragprog.com/book/bksqla/sql-antipatterns). – Bill Karwin Feb 07 '19 at 13:48
  • @Strawberry, Yes, EAV can be made to work, but it makes the job a lot more complex. If it's the only option, it's a sign that the problem you're trying to solve with EAV is itself a hard problem, and practically no other solution will be any better. – Bill Karwin Feb 07 '19 at 13:50
  • @Strawberry, If you manage a different table per data type (or even a different column per data type), you still can't implement constraints, and now you have to have another piece to your meta-model to track which property belongs to which data type. *More* complexity, not less. – Bill Karwin Feb 07 '19 at 13:51