0

I am planning to develop an application using the EAV design. I have done a lot of research on EAV and the sixth normal form. I have even talked to people at work and they say avoid both approaches if you care about your sanity. I had the idea to create tables with over a thousand columns but according to this question that may not be a good idea. So what I wanted to do was instead of creating a thousand columns in one table I am going to create a thousand "one column" tables. This will enable me to separate all the columns into "one column" tables. This will give me ultimate flexibility but I am afraid performance will suffer greatly.

  • My question is: is a query drastically slow with 50 inner joins (one to one)? This database will be powering a public website.
Community
  • 1
  • 1
Luke101
  • 63,072
  • 85
  • 231
  • 359
  • 10
    Yes, I think a query with more than seven would be considered too big. One thousand one column tables? Um - no. You need to learn how to design a relational database. Get someone to help you - it sounds like you're in over your head. – duffymo Feb 20 '13 at 00:01
  • any reason why you can't create a view then pivot those Attributes into (virtual) columns ? – YS. Feb 20 '13 at 00:02
  • 1
    You need to study how to design a relational database, If you prepare 1 column 1000 tables, you are probably not using any *relation* so why you need a join then? If all the elements are related, you should be focusing on the relations carefully, that is what Relational Databases are for – Mustafa Feb 20 '13 at 00:05
  • 1
    Joining 50 tables may be efficient with proper indexing but single column tables does not sound like EAV. – ypercubeᵀᴹ Feb 20 '13 at 00:10
  • @duffymo row modeling is non-relational – guido Feb 20 '13 at 00:10
  • @YS the columns will be unknown at time of query – Luke101 Feb 20 '13 at 00:12
  • take a look at DB2 or postgresql xml datatype for possible alternative solutions. – guido Feb 20 '13 at 00:15
  • @guido - I read for sql server that group by cannot be performed on XML column. Is this true with postgresql? – Luke101 Feb 20 '13 at 00:44
  • 1
    @Luke101 you can group by a value extracted with xpath. Like this: http://www.sqlfiddle.com/#!12/af4e2/2/0 – guido Feb 20 '13 at 01:04
  • sounds like the old key-value pair db. I remember creating custom versions of these in C++ before using Berkeley db and others. Now we have NoSQL craze, and many variants exist. But to do this in a relational db? ok, have fun! – tbone Feb 20 '13 at 02:08
  • While not PostgreSQL-based, this article about EAV is worth reading: https://sqlblog.org/2009/11/19/what-is-so-bad-about-eav-anyway – dezso Feb 20 '13 at 07:28
  • As ypercube said, 1000 single-column tables (or for that matter, one table with 1000 columns) is not characteristic of EAV - as matters stand, it seems as though the solution you are considering isn't actually EAV. Can you include more information about the application domain and your considered solution, or is this information confidential? –  Feb 21 '13 at 10:42
  • @Mark - I will be using the database to create tables on the fly it will be the same as this question http://stackoverflow.com/questions/12127814/database-design-to-create-tables-on-the-fly – Luke101 Feb 21 '13 at 18:51
  • 1
    @ypercubeᵀᴹ Presumably by ' "one column" table' they mean a table with CK plus one column (6NF), presumably with the CK being {user, entity}. And that use of 6NF is the canonical (NULL-free) relational way to represent a nullable column. And it isn't meant to be EAV, it is meant to be instead of EAV. So it turns out that it is the (NULL-free) relational design alternative to *every* EAV design. (Short of nested relations & tuples allowing a table with user as CK and the state of their dbms/table as column. Which coincidentally also uses 6NF.) – philipxy Apr 16 '17 at 19:35

4 Answers4

2

This sounds like a problem better solved outside the relational model - use xml, json, array-valued or hstore fields, or use a key/value or column store that's optimised for this kind of work.

See:

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
1

I've run into the practical default limit in postgresql, and it is 8 joins. Anything over 8 joins in postgres, and you're playing with fire.

More specifically, join_collapse_limit defaults to 8. The planner refuses to reorder joins if the query excedes join_collapse_limit joins. I suspect planning time is roughly O(n^2) with respect to number of joins (hence the low default). You could increase join_collapse_limit to 1000, but then query planning would become very slow.

So, yes, in postgres, more than 50 joins is probably a bad idea. And I'd wager that you'll find similar limitations in other databases: optimizing join order will probably be roughly O(n^2) on any platform.

Frank Farmer
  • 38,246
  • 12
  • 71
  • 89
  • 5
    I disagree that 8 joins is Pg's limit, I routinely join many more tables than that, particularly when complex views are involved. It depends on the size of the tables being joined, the query complexity, whether you can plan and prepare once and save the plan, and so on. Increasing `join_collapse_limit` is vital. That doesn't mean the OP's design is a good idea though. – Craig Ringer Feb 20 '13 at 01:43
0

I don't see a problem per se. This is basically how columnar databases such as Vertica store their data. There are definitely some considerations, though.

First, the assumption is that you are not doing atomic inserts, updates, and deletes across the rows. Managing 1000 tables for such operations is going to be challenging.

Second, the key used to join should be the primary key of all the tables. I would recommend nothing larger than an integer. You need to realize that this inserts 4'ish bytes of overhead per column, so the columnar database may end up being quite a bit larger than the column'ed version. In other databases, this can be mitigated by doing compression within a column.

You also need to realize that there are other limits, such as the number of columns permitted in a view and table.

Assuming that all the tables fit into memory and they are joined on primary keys, the performance may be quite decent.

Having said that, I'm not sure that 1000 one-column tables really gets what you want. There are many websites and applications that deal with wide tables. Very few (any?) totally break them into single columns. Is there a business reason why you are doing this? In general, the database design should be driven by business needs and application needs, not by some concept of first, second, or sixth normal form.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 3
    Columnar databases make plenty of other optimizations, however, other than just storing each column separately. – Aaron Bertrand Feb 20 '13 at 00:19
  • 3
    ...so, just making a bunch of columnar-like tables in a database that is not optimized to do so is not necessarily going to be a good thing. – Aaron Bertrand Feb 20 '13 at 00:29
  • @AaronBertrand . . . I don't say it is a good thing. I just say it is not a totally crazy idea. And, yes, I'm aware of optimizations that columnar databases make. – Gordon Linoff Feb 20 '13 at 00:59
  • 1
    Your opening paragraph, to me, reads similar to "Oh, you shouldn't have any problems doing this, since it's what columnar databases do." – Aaron Bertrand Feb 20 '13 at 01:02
  • This is a great answer, except I would move your last sentence to the top - that's the important concept. – Jasmine Jun 20 '14 at 21:50
0

EAV is an anti-pattern for whole applications. It should only be used for small portions of applications. See this: http://mikesmithers.wordpress.com/2013/12/22/the-anti-pattern-eavil-database-design/

And, EAV tables will have at least two columns, never one.

To answer your question though - WE DON'T KNOW. I've seen queries which join 30 tables and were still fast. I've seen queries on one table which were really slow. Some database engines (postgres) can't even handle lots of joins, and there are hard limits imposed by some database versions (MSSQL has a 256-table limit).

There is no general way to answer this question, other than to say you should almost always design things as simply as possible, and using EAV is an unnecessary complication for many apps, which tend to naturally map to wide tables.

Jasmine
  • 4,003
  • 2
  • 29
  • 39