I have a database that is constructed as "results" for "Products". The table "results" is pretty easy: an "id" column and each next column as a result. Then we have several product tables that links to the first table. This happens because the products have different properties, so some columns can be common, but other can be all different. This is easy to add data, but complex when we collect data from them. A way to replicate this is (code below) to have a table "results" and two products: "productA", "productB". They relate trough a relationship table called "ProductAResults" and "ProductBResults". There are results without product.
The riddle is:
- get the results and associate the products, but without duplicate common fields ("enter"). a union can be done.
- Search results when we search for an id, but we have to search in "id1" and "id2".
In production:
- results have 10Million rows.
- productAresults have 5Million rows
- productBresults have 4Million rows
- product A and B have 5000 rows Two strategies were attempt:
- use union and leftjoin for products and productXresults.
- use left join (no unions) for products, and then coalesce to create the columns (and select one or other if not null).
Conclusions so far:
- using unions is super slow, just to show a subset of data (they always retreive full tables)
- using temprorary tables to reduce the union's size (by using a select in with a result.id) is costy (needs
- using unions makes search a lot complicated and slower.
- it is better to use coalesce to join the tables, using left join on products. Does not work when you have 2 products on the same result.
- if you what to search for a result (1 or 2) using unions is costy.
- if you want to show the products from a range ID is costing
i have made a small example, this tends to degrade the performance after a couple million lines in results or products.
anyway, a query to show what i meant is:
- with unions
select *
from results as a
left join
(
select idresult, id1, null as id2, enter from productA as aa join productAresults as ab on aa.id=ab.idproductA
union
select idresult, null as id1, id2, enter from productB as ba join productBResults as bb on ba.id=bb.idproductB
) as u on a.id=u.idresult
order by a.dta desc;
select *
from results as a
join
(
select idresult, id1, null as id2, enter from productA as aa join productAresults as ab on aa.id=ab.idproductA
union
select idresult, null as id1, id2, enter from productB as ba join productBResults as bb on ba.id=bb.idproductB
) as u on a.id=u.idresult
where u.id1 = 'A' or u.id2='A'
order by a.dta desc;
- with coalesce, this is a lot faster, but queries are tricky:
select a.*, coalesce(ab.idresult, bb.idresult) as idresult, id1, id2, coalesce(aa.`enter`, ba.`enter`) as `enter`
from results as a
left join
productA as aa join productAresults as ab on aa.id=ab.idproductA
on a.id=ab.idresult
left join
productB as ba join productBResults as bb on ba.id=bb.idproductB
on a.id=bb.idresult
order by a.dta desc;
select a.*, coalesce(ab.idresult, bb.idresult) as idresult, id1, id2, coalesce(aa.`enter`, ba.`enter`) as `enter`
from results as a
left join
productA as aa join productAresults as ab on aa.id=ab.idproductA
on a.id=ab.idresult
left join
productB as ba join productBResults as bb on ba.id=bb.idproductB
on a.id=bb.idresult
where a.id in (
select idresult from
(
select idresult, id1 as idP, enter from productA as aa join productAresults as ab on aa.id=ab.idproductA and aa.id1='A'
union
select idresult, id2 as idP, enter from productB as ba join productBResults as bb on ba.id=bb.idproductB and ba.id2='A'
) as u)
order by a.dta desc;
Create test schema
CREATE TABLE `results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dta` datetime NOT NULL default CURRENT_TIMESTAMP,
`result` int not null,
PRIMARY KEY (`id`),
Key(`dta`)
);
create table `productA`
(
`id` int(11) NOT NULL AUTO_INCREMENT,
`id1` varchar(10) not null,
`enter` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `id1_UNIQUE` (`id1`)
);
CREATE TABLE `productAResults` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idProductA` int(11) NOT NULL,
`idResult` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idResult` (`idResult`,`idProductA`),
KEY `idResult2` (`idResult`),
KEY `idProductA` (`idProductA`),
CONSTRAINT `productAResults_ibfk_1` FOREIGN KEY (`idResult`) REFERENCES `Results` (`id`) ON DELETE RESTRICT,
CONSTRAINT `productAResults_ibfk_2` FOREIGN KEY (`idProductA`) REFERENCES `ProductA` (`id`) ON DELETE RESTRICT
);
create table `productB`
(
`id` int(11) NOT NULL AUTO_INCREMENT,
`id2` varchar(10) not null,
`enter` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `id2_UNIQUE` (`id2`)
);
CREATE TABLE `productBResults` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idProductB` int(11) NOT NULL,
`idResult` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idResult3` (`idResult`,`idProductB`),
KEY `idResult2` (`idResult`),
KEY `idProductB` (`idProductB`),
CONSTRAINT `productBResults_ibfk_1` FOREIGN KEY (`idResult`) REFERENCES `Results` (`id`) ON DELETE RESTRICT,
CONSTRAINT `productBResults_ibfk_2` FOREIGN KEY (`idProductB`) REFERENCES `ProductB` (`id`) ON DELETE RESTRICT
);
insert into productA (id1, enter) values ('A', NOW() );
insert into productA (id1, enter) values ('B', NOW());
insert into productA (id1, enter) values ('C', NOW());
insert into productA (id1, enter) values ('D', NOW());
insert into productB (id2, enter) values ('AA', NOW());
insert into productB (id2, enter) values ('BB', NOW());
insert into productB (id2, enter) values ('CC', NOW());
insert into productB (id2, enter) values ('DD', NOW());
insert into results (result, dta) values(1, NOW());
insert into productAresults (idresult, idProductA) values (last_insert_id(), (select id from productA where id1='A'));
insert into results (result, dta) values(1, NOW());
insert into productAresults (idresult, idProductA) values (last_insert_id(), (select id from productA where id1='B'));
insert into results (result, dta) values(1, NOW());
insert into productAresults (idresult, idProductA) values (last_insert_id(), (select id from productA where id1='C'));
insert into results (result, dta) values(1, NOW());
insert into productAresults (idresult, idProductA) values (last_insert_id(), (select id from productA where id1='C'));
insert into results (result, dta) values(1, NOW());
insert into productBresults (idresult, idProductB) values (last_insert_id(), (select id from productB where id2='AA'));
insert into results (result, dta) values(1, NOW());
insert into productBresults (idresult, idProductB) values (last_insert_id(), (select id from productB where id2='BB'));
insert into results (result, dta) values(2, NOW());
insert into productBresults (idresult, idProductB) values (last_insert_id(), (select id from productB where id2='CC'));
insert into results (result, dta) values(2, NOW());
insert into productBresults (idresult, idProductB) values (last_insert_id(), (select id from productB where id2='CC'));
insert into results (result, dta) values(5, NOW());
insert into productBresults (idresult, idProductB) values ((select max(id) from results), (select id from productB where id2='DD'));
insert into productAresults (idresult, idProductA) values ((select max(id) from results), (select id from productA where id1='D'));
teste here: http://sqlfiddle.com/#!9/5c3c73/6 check the products D and DD results Query times to evaluate (indication): 1.1 - 11ms (all data query) 1.2 - 16ms(sorted data) 2.1 - 2ms (all data query) 2.2 - 2ms (sorted data)