0

I have a database that is constructed as "results" for "Products". The table "results" is pretty easy: an "id" column and each next column as a result. Then we have several product tables that links to the first table. This happens because the products have different properties, so some columns can be common, but other can be all different. This is easy to add data, but complex when we collect data from them. A way to replicate this is (code below) to have a table "results" and two products: "productA", "productB". They relate trough a relationship table called "ProductAResults" and "ProductBResults". There are results without product.

The riddle is:

  • get the results and associate the products, but without duplicate common fields ("enter"). a union can be done.
  • Search results when we search for an id, but we have to search in "id1" and "id2".

In production:

  • results have 10Million rows.
  • productAresults have 5Million rows
  • productBresults have 4Million rows
  • product A and B have 5000 rows Two strategies were attempt:
  1. use union and leftjoin for products and productXresults.
  2. use left join (no unions) for products, and then coalesce to create the columns (and select one or other if not null).

Conclusions so far:

  • using unions is super slow, just to show a subset of data (they always retreive full tables)
  • using temprorary tables to reduce the union's size (by using a select in with a result.id) is costy (needs
  • using unions makes search a lot complicated and slower.
  • it is better to use coalesce to join the tables, using left join on products. Does not work when you have 2 products on the same result.
  • if you what to search for a result (1 or 2) using unions is costy.
  • if you want to show the products from a range ID is costing

i have made a small example, this tends to degrade the performance after a couple million lines in results or products.

anyway, a query to show what i meant is:

  1. with unions
select *
from results as a 
left join 
  (
    select idresult, id1, null as id2, enter from productA as aa join productAresults as ab on aa.id=ab.idproductA 
     union 
    select idresult, null as id1, id2, enter from productB as ba join productBResults as bb on ba.id=bb.idproductB
  ) as u on a.id=u.idresult
  order by a.dta desc;
  
  select *
from results as a 
join 
  (
    select idresult, id1, null as id2, enter from productA as aa join productAresults as ab on aa.id=ab.idproductA 
     union 
    select idresult, null as id1, id2, enter from productB as ba join productBResults as bb on ba.id=bb.idproductB
  ) as u on a.id=u.idresult
  where u.id1 = 'A' or u.id2='A'
  order by a.dta desc;
  
  1. with coalesce, this is a lot faster, but queries are tricky:
  select a.*, coalesce(ab.idresult, bb.idresult) as idresult, id1, id2, coalesce(aa.`enter`, ba.`enter`) as `enter`
from results as a 
left join 
    productA as aa join productAresults as ab on aa.id=ab.idproductA 
    on a.id=ab.idresult
    left join
    productB as ba join productBResults as bb on ba.id=bb.idproductB
    on a.id=bb.idresult
    order by a.dta desc;
    
      select a.*, coalesce(ab.idresult, bb.idresult) as idresult, id1, id2, coalesce(aa.`enter`, ba.`enter`) as `enter`
from results as a 
left join 
    productA as aa join productAresults as ab on aa.id=ab.idproductA 
    on a.id=ab.idresult
    left join
    productB as ba join productBResults as bb on ba.id=bb.idproductB
    on a.id=bb.idresult
    where a.id in (
      select idresult from
      (
    select idresult, id1 as idP, enter from productA as aa join productAresults as ab on aa.id=ab.idproductA and aa.id1='A'
     union 
    select idresult, id2 as idP, enter from productB as ba join productBResults as bb on ba.id=bb.idproductB and ba.id2='A'
  ) as u)
    order by a.dta desc;

Create test schema

CREATE TABLE `results` (

  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dta` datetime NOT NULL default CURRENT_TIMESTAMP,
  `result` int not null,
  PRIMARY KEY (`id`),
  Key(`dta`)
  );
 create table `productA`
 (
  `id` int(11) NOT NULL AUTO_INCREMENT,
   `id1` varchar(10)  not null,
   `enter` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`id`),
   UNIQUE KEY `id1_UNIQUE` (`id1`)
   );
  CREATE TABLE `productAResults` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `idProductA` int(11) NOT NULL,
  `idResult` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idResult` (`idResult`,`idProductA`),
  KEY `idResult2` (`idResult`),
  KEY `idProductA` (`idProductA`),
  CONSTRAINT `productAResults_ibfk_1` FOREIGN KEY (`idResult`) REFERENCES `Results` (`id`) ON DELETE RESTRICT,
  CONSTRAINT `productAResults_ibfk_2` FOREIGN KEY (`idProductA`) REFERENCES `ProductA` (`id`) ON DELETE RESTRICT
    );
    
   create table `productB`
 (
  `id` int(11) NOT NULL AUTO_INCREMENT,
   `id2` varchar(10) not null,
   `enter` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`id`),
   UNIQUE KEY `id2_UNIQUE` (`id2`)
   );
  CREATE TABLE `productBResults` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `idProductB` int(11) NOT NULL,
  `idResult` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idResult3` (`idResult`,`idProductB`),
  KEY `idResult2` (`idResult`),
  KEY `idProductB` (`idProductB`),
  CONSTRAINT `productBResults_ibfk_1` FOREIGN KEY (`idResult`) REFERENCES `Results` (`id`) ON DELETE RESTRICT,
  CONSTRAINT `productBResults_ibfk_2` FOREIGN KEY (`idProductB`) REFERENCES `ProductB` (`id`) ON DELETE RESTRICT
    );
    
    insert into productA (id1, enter) values ('A', NOW() );
    insert into productA (id1, enter) values ('B', NOW());
    insert into productA (id1, enter) values ('C', NOW());
    insert into productA (id1, enter) values ('D', NOW());
    
     insert into productB (id2, enter) values ('AA', NOW());
    insert into productB (id2, enter) values ('BB', NOW());
    insert into productB (id2, enter) values ('CC', NOW());
    insert into productB (id2, enter) values ('DD', NOW());
    
    
    insert into results (result, dta) values(1, NOW());
    insert into productAresults (idresult, idProductA) values (last_insert_id(), (select id from productA where id1='A'));
    insert into results (result, dta) values(1, NOW());
    insert into productAresults (idresult, idProductA) values (last_insert_id(), (select id from productA where id1='B'));
    insert into results (result, dta) values(1, NOW());
    insert into productAresults (idresult, idProductA) values (last_insert_id(), (select id from productA where id1='C'));
    insert into results (result, dta) values(1, NOW());
    insert into productAresults (idresult, idProductA) values (last_insert_id(), (select id from productA where id1='C'));
    insert into results (result, dta) values(1, NOW());
    insert into productBresults (idresult, idProductB) values (last_insert_id(), (select id from productB where id2='AA'));
    insert into results (result, dta) values(1, NOW());
    insert into productBresults (idresult, idProductB) values (last_insert_id(), (select id from productB where id2='BB'));
    insert into results (result, dta) values(2, NOW());
    insert into productBresults (idresult, idProductB) values (last_insert_id(), (select id from productB where id2='CC'));
    insert into results (result, dta) values(2, NOW());
    insert into productBresults (idresult, idProductB) values (last_insert_id(), (select id from productB where id2='CC'));
    insert into results (result, dta) values(5, NOW());
    insert into productBresults (idresult, idProductB) values ((select max(id) from results), (select id from productB where id2='DD'));
    insert into productAresults (idresult, idProductA) values ((select max(id) from results), (select id from productA where id1='D'));

teste here: http://sqlfiddle.com/#!9/5c3c73/6 check the products D and DD results Query times to evaluate (indication): 1.1 - 11ms (all data query) 1.2 - 16ms(sorted data) 2.1 - 2ms (all data query) 2.2 - 2ms (sorted data)

LMatos
  • 21
  • 4
  • Why do you have separate tables for a and b? – Strawberry Jan 27 '21 at 00:14
  • that's my main issue. they have different id composition and different elements. in the example, they only have one common field and a different one. In the production database, we have 5 products (so far) with 2 or 3 common fields in 6 or more fields. Those 6 fields characterize the product. – LMatos Jan 27 '21 at 00:29
  • what i am thinking is to create a common primary key (id) between products and have only one "ProductResult" table. – LMatos Jan 27 '21 at 00:34
  • Sounds like a step in the right direction – Strawberry Jan 27 '21 at 00:39
  • but keep the several product tables. but at least the glue table is only one. – LMatos Jan 27 '21 at 01:11
  • This sounds like a class inheritance (aka supertype - subtype) problem. You have a generic product class and the specific product A, product B, etc subclasses. However, it seems that you currently do not represent your product claa in your database, which seems to be the root cause of the problem. – Shadow Jan 27 '21 at 01:40
  • @Shadow: that seems to be the problem, but how can it be done. if there is a superclass product and several subclasses, in this case, a table that relates products and results, and several product tables, how to relate them? it should be used an index cross tables. Is this possible? – LMatos Jan 27 '21 at 10:38
  • Seems like this is discussed here: https://stackoverflow.com/questions/8509026/is-cross-table-indexing-possible . It is not straightforward. – LMatos Jan 27 '21 at 10:56
  • I'm not sure that question is exactly the same as yours. If I understood you correctly, you could create a products superclass (aka table), which would link to your products table. The products table would contain the common fields between your various products, including the id and enter fields. This table would also link to productA and productB tables, which would host the fields specific to that table only. You would probably want to implement an exclusive arch (a product can only be either in A or B, but not in both). – Shadow Jan 27 '21 at 12:06
  • This way the id and enter based searches would only need the products table and the fields can be properly indexed. When you create a new product, you first insert a record to the main products table and then get the generated product id and insert into the chosen specific product table (A or B). This is the classic class table inheritance pattern. See the following SO question for detailed discussions on the various class inheritance options: https://stackoverflow.com/questions/3579079/how-can-you-represent-inheritance-in-a-database – Shadow Jan 27 '21 at 12:11

0 Answers0