0

I have relational category & product tables. Categories are hierarchical. I will have queries based on category, for example

select * 
from products 
where CatId = 3

or

select * 
from products 
where CatId = 1

I have 6 level of category and 24 million row for products, I have to find fast and optimal solutions. My question is which structure is suitable.

I write some options, feel free to suggest a better alternative.

Current category table:

Id  ParentId   Name 
---------------------
1   null       CatA
2   null       CatB
3    1         CatAa
4    2         CatBa

Product table option 1

Id Cat Name
------------------
1  3   Product_1 
2  4   Product_2 

Product table option 2

Id CatLevel1 CatLevel2 ... Name 
-------------------------------------
1   1         3        .   Product_1 
2   2         4        .   Product_2 

Product table option 3

Id Cats Name 
------------------
1   1:3 Product_1 
2   2:4 Product_2 
Mesut Tasci
  • 11
  • 1
  • 3
  • Is your hierarchy always the same depth? (Do all branches of the tree always go to the same level?) – MatBailie Dec 04 '21 at 07:24
  • 1
    Your question is not clear . i don't see any query... but only some table struc.. .. – ScaisEdge Dec 04 '21 at 07:25
  • A more general question about representing hierarchies in SQL : https://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database – MatBailie Dec 04 '21 at 07:49
  • @MatBailie I have 6 level for now, some products can connect to upper levels. example: product3.CatId=1, so when I want cat 1 products, it must be listed product1 and product3 – Mesut Tasci Dec 04 '21 at 08:13
  • @ScaisEdge which is optimal structure? – Mesut Tasci Dec 04 '21 at 08:14
  • There is no Single optimal structure. There are many options, all with different costs and different benefits. Don't look for the One Structure, seek to understand as many different structures as possible, and then work out which is best for your current circumstances. Read the question and answers that I linked above. – MatBailie Dec 04 '21 at 08:23

2 Answers2

0

Always keep option one, plus some denormalised tables (options two onwards) if you so desire. By keeping option one, you have the source truth to revert to or derive the others from.

Option two is only recommended if the searcher always knows what depth/level to search at. For example, if they know they need Level2=CATAb then it works, but if they don't know CATAb is at level two, they don't know which column to look in. It also relies on knowing how many levels to represent; if you can have a hundred levels, you need a hundred columns, and it's fragile of you need to add more depths. Generally, this doesn't apply and so is generally not a good optimisation.

Option three is a straight no. Never store multiple values in a one field (one column of one row). It will make Efficient searching of that column next to impossible.

The alternative to option three is to have a "link" table. Just two columns, category_id and product_id. Then you list all ancestors of a product, just on different rows.

category_id product_id
1 1
3 1
2 2
4 2

These are all known as adjacency lists. A different model altogether is Nested Sets; if you research online you'll find lots of information. They're much harder to comprehend and implement initially, but very fast at retrieval when specifying a parent.

halfer
  • 19,824
  • 17
  • 99
  • 186
MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • I updated my question. Sometimes I need to call all sub categories and products. That makes the job difficult. Your many to many suggestion is better than mines, but if I apply it I will have a table that contains 6*24m=144 million row. Indexing and searching may be slower. – Mesut Tasci Dec 04 '21 at 08:21
  • @mesuttasci SO isn't the forum to dicuss the details of how indexing is implemented. Just try it for yourself, you'll find that an index on 144M rows makes still yields very fast index seeks, that's exactly what they're for. – MatBailie Dec 04 '21 at 08:39
  • I will try your solution. Thanks.. – Mesut Tasci Dec 04 '21 at 08:48
  • Should I do that for hierarchical category table: Id ParentId CategoryId --------------------- 1 1 3 2 1 4 3 2 5 4 2 6 I have 50k data, so in 6 level it makes 300k row for relations. is it suitable: select * from products where catId in (select id from categoryHieararch where parentId=1) – Mesut Tasci Dec 04 '21 at 09:27
  • @mesuttasci No need for the id column, the composite key of the other two columns is fine. So, yes, making the link table just about the category hierarchy is fine. You just then need to be sure that all categories are members of themselves (so that if you search for parent=1 your results include a row where child=1 along with all the other children). Then the results of searching that table can be joined with the product table. – MatBailie Dec 04 '21 at 09:31
0

Your product table option 1 is fine and need no change

product_id,

category_id,

... other attributes

Your problem is in accessing the product based on the category hierarchy - which would make a need of a hierarchical query to get all categories in the tree below your selected category.

Instead of

select * from product where category_id = 1;

you'll need to write an additional hierarchical query to get the whole hierarchy tree

with cat_tree (id) as (
  select id
  from category where id = 1
  UNION ALL
  select ca.id
  from cat_tree ct
  join category ca 
  on ct.id = ca.parent_id
)
select * from product 
where category_id in 
  (select id from cat_tree);

Which may not be practicable, but you may simplify it by denormalizing the category table

Let's assume your category data is such as

       ID  PARENT_ID
---------- ----------
         1           
         3          1
         5          3
         6          3

The query below, which may be implemented as a MATERIALIZED VIEW that is refreshed on each category change pre-calculates all direct and indirect parent and child relations.

The result is

        ID   CHILD_ID
---------- ----------
         1          1
         1          3
         1          5
         1          6
         3          3
         3          5
         3          6
         5          5
         6          6

E.g. for 1 you get itself, all its child's, their child's etc.

Using this category_denormobject your query can be simplified to

select * 
from product
where category_id in 
   (select child_id from category_denorm where id = 1);
Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53