How do you model custom attributes of entities?

Question

Let's say we're having an application which should be able to store all kind of products. Each product has at least an ID and a Name but all other attributes can be defined by the user himself.

E.g. He could create a productgroup Ipods which would contain attributes capacity and generation
E.g. He could create a productgroup TShirts with the attributes size and color
We need to store the definition of a product and the concrete products itself.
We want to ensure that it is easily possible to aggregate (GROUP BY) by product attributes. E.g. select the total sum of capacity for each generation of ipods
The solution must not require schema changes (added requirement due to input from Bill Karwin - see his answer as well!)

How would you model your schema in respect to the above requirements?

Note: Requirment 4. is important!

Thanks everyone for contributing and discussing the approach. I have seen some solutions to this problem in the past but none of them made grouping easy for me :(

score 11 · Answer 1 · edited Jun 20 '20 at 09:12

I'd recommend either the Concrete Table Inheritance or the Class Table Inheritance designs. Both designs satisfy all four of your criteria.

In Concrete Table Inheritance:

Ipods are stored in table product_ipods with columns ID, Name, Capacity, Generation.
Tshirts are stored in table product_tshirts with columns ID, Name, Size, Color.
The definition of the concrete product types are in the metadata (table definitions) of product_ipods and product_tshirts.
SELECT SUM(Capacity) FROM product_ipods GROUP BY Generation;

In Class Table Inheritance:

Generic product attributes are stored in table Products with columns ID, Name.

Ipods are stored in table product_ipods with columns product_id (foreign key to Products.ID), Capacity, Generation.
Tshirts are stored in table product_tshirts with columns product_id (foreign key to Products.ID), Size, Color.
The definition of the concrete product types are in the metadata (table definitions) of products, product_ipods, and product_tshirts.
SELECT SUM(Capacity) FROM product_ipods GROUP BY Generation;

See also my answer to "Product table, many kinds of product, each product has many parameters" where I describe several solutions for the type of problem you're describing. I also go into detail on exactly why EAV is a broken design.

Re comment from @dcolumbus:

With CTI, would each row of the product_ipods be a variation with it's own price?

I'd expect the price column to appear in the products table, if every type of product has a price. With CTI, the product type tables typically just have columns for attributes that pertain only to that type of product. Any attributes common to all product types get columns in the parent table.

Also, when storing order line items, would you then store the row from product_ipods as the line item?

In a line-items table, store the product id, which should be the same value in both the products table and the product_ipods table.

Re comments from @dcolumbus:

That seems so redundant to me ... in that scenario, I don't see the point of the sub-table. But even if the sub-table does make sense, what's the connecting id?

The point of the sub-table is to store columns that are not needed by all other product types.

The connecting id may be an auto-increment number. The sub-type table doesn't need to auto-increment its own id, because it can just use the value generated by the super-table.

CREATE TABLE products (
  product_id INT AUTO_INCREMENT PRIMARY KEY,
  sku VARCHAR(30) NOT NULL,
  name VARCHAR(100) NOT NULL,
  price NUMERIC(9,2) NOT NULL
);

CREATE TABLE product_ipods (
  product_id INT PRIMARY KEY,
  size TINYINT DEFAULT 16,
  color VARCHAR(10) DEFAULT 'silver',
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);

INSERT INTO products (sku, name, price) VALUES ('IPODS1C1', 'iPod Touch', 229.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 16, 'silver');
INSERT INTO products (sku, name, price) VALUES ('IPODS1C2', 'iPod Touch', 229.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 16, 'black');
INSERT INTO products (sku, name, price) VALUES ('IPODS1C3', 'iPod Touch', 229.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 16, 'red');
INSERT INTO products (sku, name, price) VALUES ('IPODS2C1', 'iPod Touch', 299.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 32, 'silver');
INSERT INTO products (sku, name, price) VALUES ('IPODS2C2', 'iPod Touch', 299.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 32, 'silver');
INSERT INTO products (sku, name, price) VALUES ('IPODS2C3', 'iPod Touch', 299.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 32, 'red');

This can only satisfy 1 and 2, when users can themselves make schema changes. While I agree that EAV is broken, when the overriding criteria is that the system accomodate this behavior without schema changes, upfront instantiation of a complete array of inheritance tables or EAV or BLOB are THE only viable solutions. — Cade Roux, May 19 '09 at 19:31
Who said the database design has to accommodate new product types without changes to the schema? — Bill Karwin, May 19 '09 at 21:10
Bill thanks a lot for your brilliant answer! I do really appreciate it and love your answer about the problems about EAV as well. I forgot to mention that in my case it is impossible to change the schema and therefore EAV is a must for me. As I love STO and think it is successful because of its quality of content I have to accept the EAV specifc answer and refer to yours as well. I do really appreciate your time! — Michal, May 20 '09 at 02:54
@BillKarwin I ran across this question and your answer, and I'd like to ask you a couple of questions. With CTI, would each row of the `product_ipods` be a variation with it's own price? Also, when storing order line items, would you then store the row from `product_ipods` as the line item? — dcolumbus, Jun 11 '13 at 21:28
@BillKarwin Thanks for the edit ... but when you say "Any attributes common to all product types get columns in the parent table.", what if I have several attributes for each product? For example, `ProductA` can have "size" and "membership", and depending on what is chosen (large, 3 years), the price changes. How does that fit in this scenario? — dcolumbus, Jun 11 '13 at 21:45
@dcolumbus, in that scenario, each distinct choice should get its own row in *both* the `products` table and the sub-table. It's common for each variation of a product to get a distinct SKU number for this reason. — Bill Karwin, Jun 11 '13 at 22:00
@BillKarwin That seems so redundant to me ... in that scenario, I don't see the point of the sub-table. But even if the sub-table does make sense, what's the connecting `id`? — dcolumbus, Jun 11 '13 at 22:04
@BillKarwin I see what you've done here... however, I need to allow the user to select what "color" and what "size" they want, not sift through dozens of single variations. If I `SELECT` and `JOIN` the two tables, I get each product back (by `product_id`), but when there are several "products" that really represent the one product, how on earth can I make that work? — dcolumbus, Jun 13 '13 at 03:13
@dcolumbus, allowing the user to select options is a user interface issue. How do you hope to record the order for them once they've chosen options, if you don't reference the product, with specific choices for options, by SKU? — Bill Karwin, Jun 13 '13 at 06:41
@BillKarwin Well, I suppose the first thing that I need to be able to to do is query the `Class Table Inheritance` structure in order to get all of the possible options. That's what I really can't figure out. I really appreciate all of your help with this! — dcolumbus, Jun 13 '13 at 16:10

score 5 · Accepted Answer · answered May 19 '09 at 17:50

The grouping is not going to be easy because what aggregate operator are you going to use on "color"? Note that it is not possible to use your requirement 4 on case 2.

In any case, the aggregating is only difficult because of the variation in data types and can be mitigated by approaching it in a more typesafe way - knowing that it never makes sense to add apples and oranges.

This is the classic EAV model and it has a place in databases where carefully designed. In order to make it a bit more typesafe, I've seen cases where the values are stored in type-safe tables instead of in a single free form varchar column.

Instead of Values:

EntityID int
,AttributeID int
,Value varchar(255)

You have multiple tables:

EntityID int
,AttributeID int
,ValueMoney money

EntityID int
,AttributeID int
,ValueInt int

etc.

Then to get your iPod capacity per generation:

SELECT vG.ValueVarChar AS Generation, SUM(vC.ValueDecimal) AS TotalCapacity
FROM Products AS p
INNER JOIN Attributes AS aG
    ON aG.AttributeName = 'generation'
INNER JOIN ValueVarChar AS vG
    ON vG.EntityID = p.ProductID
    AND vG.AttributeID = aG.AttributeID
INNER JOIN Attributes AS aC
    ON aC.AttributeName = 'capacity'
INNER JOIN ValueDecimal AS vC
    ON vC.EntityID = p.ProductID
    AND vC.AttributeID = aC.AttributeID
GROUP BY vG.ValueVarChar

Not sure if your query qualifies as "none of them made grouping easy for me" or not. — Colin Burnett, May 19 '09 at 18:04
+1 Only answer that is as flexible as the question demands! You could simplify to EA, where you store an attribute's value in the attribute table. — Andomar, May 19 '09 at 22:12
Thanks a lot for your effort. I was not aware of the term EAV and therefore this is the answer for my problem. However, to all who find this as an answer, please be sure to have a look at the problems that you will face when implementing EAV: http://stackoverflow.com/questions/695752/product-table-many-kinds-of-product-each-product-has-many-parameters/695860#695860 (answer from Bill Karwin) — Michal, May 20 '09 at 02:59
I would like to note here that I would typically fight against almost all EAV designs in practice until an overwhelming need to use it arises. — Cade Roux, May 20 '09 at 03:27

score 0 · Answer 3 · answered Nov 16 '10 at 07:43

I wonder how to overcome problems with using BLOB pattern as an alternative of EAV. Let’s assume that we could store all custom fields of the entity in one field as a string forexample in JSON something like tihis: {customField1: value1, customField2: value2, …, customFieldN: valueN}

How to overcome the following problems: 1. How to seach by seperate custom fields, for example, to find entities with the conditions custField1 = value1 AND customField2 = value2? 2. How to mantain data integrity, for example, if we delete a custom field for the entity how to delete all values oif these custom fields in the entity.

score 0 · Answer 4 · answered May 19 '09 at 17:37

CREATE TABLE for new products and ALTER TABLE by adding/removing columns as the user performs the operations. Use the schema to know which properties each product has. This satisfies all four of your requirements.

You'd also need a table to store the other tables names or prefix the tables with something that you can query against sysobjects for the tables:

select [name] from sysobjects where [name] like 'product_%' AND xtype='U'

score -1 · Answer 5 · answered May 19 '09 at 17:42

-1

Sounds like you are looking to design a product catalog database.

I recommend this approach. http://edocs.bea.com/wlp/docs40/catalog/schemcat.htm

answered May 19 '09 at 17:42

Jeff

5,913
2
28
30

Assuming you don't care about the wasted space of having all the other *_VALUE type columns in CATALOG_PROPERTY_VALUE. – Colin Burnett May 19 '09 at 17:48
2

url 404 page not found – Tyteck Feb 07 '20 at 09:33

How do you model custom attributes of entities?

5 Answers5

Linked