0

I am designing a database for entities like stores, products, customers, suppliers and warehouses. The database will be used as an online transactional system in different physical stores.

In my case all the attributes of stores, customers, suppliers and warehouses can be defined as columns because they won't change much over time. However the products have an unlimited amount of attributes so I would like to put this data in an EAV model.

Can anybody point out if the values should be put in their own datatype-specific table (such as attribute_values_int, or as columns in the generic attribute_value table? Because of performance reasons Magento has chosen datatype-specific value tables. See: http://blog.magestore.com/2012/03/23/magento-certificate-eav-model-concepts

Thanks.

Hans
  • 528
  • 1
  • 8
  • 29
  • Correct me if I'm wrong, but if you post a link that explains that for *performance* reasons magento used type-specific tables.. isn't the answer obvious? If you don't use type-specific tables, you end up adding everything to a varchar/text column. If you do, you need a metadata table explaining where the attribute is stored (in int, float, char table). – N.B. Jul 15 '13 at 08:18
  • @N.B. Thanks for your comment, but I can also create a values table with different columns (value_int, value_datetime etc). I hope that you can explain why the Datatype-Specific option leads to better performance, at least in the case of Magento? – Hans Jul 15 '13 at 08:23
  • Because you'll save much, much more space if you use datatype-specific table and you can index certain data-type tables where it makes sense. It's not such a performance gain as it's easing up the need to worry whether an attribute will fit into the value column. Imagine having an EAV designed with 1 values table and you're storing data which can be a single 1 digit number or a huge chunk of text. You are forced to use the `text` type so you can store anything. OR you can create a type-specific table and don't worry about the "catch-all" table and it's column for storing values. – N.B. Jul 15 '13 at 08:35
  • But what about a single Value table with multiple columns, one for each datatype? I really don't see the advantage of adding JOINS to the queries (apart from a few NULL's per row) when you can still index, sort and filter because the datatype is defined in the column? – Hans Jul 15 '13 at 09:09
  • If you use 1 table, then you have issues when you want to partition it. The best of all worlds is having a specific datatype table. You won't really gain anything having 1 values table, down to the core the write/read will be the same thing even performance-wise, but if you have multiple values table - you can at least implement various means of scaling/maintaining the db. – N.B. Jul 15 '13 at 09:15
  • @N.B. From my experience - many value tables in EAV model -> huge pain when it comes to writing queries. I prefer to have one `text` field or one table with a `text` field, `date` field (because storing dates as text isnt the best idea), reference field (if you need a reference to other entities. you can create a foreign key on this field) – Ihor Romanchenko Jul 15 '13 at 10:07
  • 1
    @Hans Only add additional value tables to EAV model if it is REALLY necessary. Just imagine a query `get value of attribute 'xyz' on entity 'abc'` when you have 3 - 5 separate tables for values. – Ihor Romanchenko Jul 15 '13 at 10:11
  • @IgorRomanchenko - EAV is generally not the easiest thing to work with when it comes to querying data as if it were normalized. Also from experience, NOT having multiple tables for data types proved to be a bad decision. If you have a website/project that generally makes sufficient use of a single table with 2 columns (1 for text, 1 for date) then great, I envy you and it's a sufficient solution :) however if you are designing a system from scratch for any purpose then (again from experience) having a table per data-type is useful. Queries will be painful to write regardless. – N.B. Jul 15 '13 at 10:14

1 Answers1

2

Frankly, the best option is "not EAV". Look into using hstore fields, XML, or json.

In PostgreSQL there is no performance advantage to using per-datatype tables. NULL values are stored in a compact NULL bitmap, so it makes very little difference whether you have a tuple like (NULL, NULL, NULL, 42, NULL, NULL) or just (42).

This also allows you to add CHECK constraint enforcing that exactly one field must be non-NULL, so you don't get multiple values of different types.

Demo:

regress=> CREATE TABLE eav_ugh (
    entity_id integer,
    int_value integer,
    numeric_value numeric,
    text_value text,
    timestamp_value timestamp with time zone,
    CONSTRAINT only_one_non_null CHECK (
            (int_value IS NOT NULL AND numeric_value IS NULL AND text_value IS NULL AND timestamp_value IS NULL) OR
            (int_value IS NULL AND numeric_value IS NOT NULL AND text_value IS NULL AND timestamp_value IS NULL) OR
            (int_value IS NULL AND numeric_value IS NULL AND text_value IS NOT NULL AND timestamp_value IS NULL) OR
            (int_value IS NULL AND numeric_value IS NULL AND text_value IS NULL AND timestamp_value IS NOT NULL)
    )
);
CREATE TABLE
regress=> insert into eav_ugh (entity_id, numeric_value) select x, x from generate_series(1,5000) x;
INSERT 0 5000
regress=> select pg_relation_size('eav_ugh');                                           
 pg_relation_size 
------------------
           229376
(1 row)

regress=> CREATE TABLE no_null_cols(entity_id integer, numeric_value numeric);
CREATE TABLE
regress=> insert into no_null_cols (entity_id, numeric_value) select x, x from generate_series(1,5000) x;
INSERT 0 5000
regress=> select pg_relation_size('no_null_cols');
 pg_relation_size 
------------------
           229376
(1 row)

regress=> SELECT sum(pg_column_size(eav_ugh)) FROM eav_ugh;
  sum   
--------
 164997
(1 row)

regress=> SELECT sum(pg_column_size(no_null_cols)) FROM no_null_cols;
  sum   
--------
 164997
(1 row)

In this case the null bitmap isn't adding any space at all, likely due to alignment requirements.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Thanks pointing that out Craig. When placing all value columns in a single table, do you know if it possible to do a select without checking datatype first? So that the db returns the non-null column value?If possible I can imagine this being a lot faster than checking datatype first. Of course we would be having problems selecting rows with date calculations. – Hans Jul 16 '13 at 10:25
  • @Hans: No, it is not; that's one of the (many) reasons I loathe EAV and prefer to use things like hstore, json or xml fields instead. – Craig Ringer Jul 16 '13 at 10:26
  • Allright, I'll have to look into the possibilities of hstore, xml and json because my attributes are linked to a configurations table. I don't know if that is possible this way. – Hans Jul 17 '13 at 09:03
  • @hans If you mean "have a foreign key into" by "linked" then no, at present you can't have a foreign key reference from hstore/json/xml keys to a lookup table. – Craig Ringer Jul 17 '13 at 09:36
  • Indeed, for performance and consistency reasons there should be a foreign key between the attributes and configuration tables. – Hans Jul 17 '13 at 09:40