
I have a legacy project using a MySQL DB with MyISAM tables, and the DB design is far from perfect. I have run into the N+1 problem because there is an entity table plus a number of entity_SOME_ID_subentity tables that share a common base structure and have some random additional columns, where SOME_ID is the primary key value of a record in the entity table.

For sure this is not great, but let's assume this is our initial condition and cannot be changed in the short term. So I need to optimize a query where I select a number of records from the entity table along with some aggregated data from the related entity_SOME_ID_subentity tables. The aggregation uses only columns that are common to all subentity tables. Initially, this was implemented as a single query to the entity table followed by a loop of queries, one per corresponding entity_SOME_ID_subentity table.

I cannot use joins, since each entity has a separate subentity table, but maybe a union could help reduce the number of queries to 2, where the second query unions subqueries against each required subentity table.

An additional note: I need to sort all results before pagination is applied.

Can you advise whether it is worth trying the union approach at all in this situation, or will performance be bad in both cases? Or maybe you have better ideas about how this can be handled?

Update:

The query to the entity table is trivial and looks like:

SELECT col1, col2, col3 FROM entity WHERE ... LIMIT 10 OFFSET 0;

And the query to an entity_SOME_ID_subentity table looks like:

SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_1234_subentity;

Here entity_1234_subentity is an example of what the table names look like.

And using a union could look like this (UNION ALL, so identical aggregate rows from different tables are not collapsed, with a literal entity_id added so each row can be matched back to its entity):

(SELECT 1111 AS entity_id, count(id) total, min(start_date) started, max(completion_date) completed
    FROM entity_1111_subentity)
UNION ALL
    (SELECT 2222 AS entity_id, count(id) total, min(start_date) started, max(completion_date) completed
    FROM entity_2222_subentity)
UNION ALL
    (SELECT 3333 AS entity_id, count(id) total, min(start_date) started, max(completion_date) completed
    FROM entity_3333_subentity)
...
ORDER BY started
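
Sorting before pagination could then happen in the same statement by joining the union back to entity as a derived table. A sketch, assuming the entity PK column is named id:

SELECT e.col1, e.col2, e.col3, s.total, s.started, s.completed
FROM entity e
JOIN (
    (SELECT 1111 AS entity_id, count(id) total, min(start_date) started, max(completion_date) completed
    FROM entity_1111_subentity)
    UNION ALL
    (SELECT 2222 AS entity_id, count(id) total, min(start_date) started, max(completion_date) completed
    FROM entity_2222_subentity)
    ...
) s ON s.entity_id = e.id
WHERE ...
ORDER BY s.started
LIMIT 10 OFFSET 0;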

dajnz
  • Neither? It's possible you can refactor the database into something more sensible and use views for backwards compatibility. Is this some sort of table subclassing? – Schwern Jun 26 '20 at 17:34
  • DB structure refactoring is not possible, unfortunately; the reason for this DB design is that the subentity tables' structure is generated at runtime based on user-provided configs. – dajnz Jun 26 '20 at 17:41
  • @dajnz This is a job for [JSON](https://www.mysqltutorial.org/mysql-json/). – Schwern Jun 26 '20 at 17:49
  • My apologies, I must correct myself: there's a limit of 61 table references in the JOINs of a single query. I tested and found no such limit on the number of table references in a UNION. – Bill Karwin Jun 26 '20 at 19:47

2 Answers


It smells like someone tried to implement table inheritance and left a mess.

You can fix this using JSON and views, possibly faster than you can write out 100 unions.

In a transaction (or at least test it on a copy), modify entity so it can hold all the information in the subtables.

  1. Add all the common columns from the subtables into entity.
  2. Add a JSON column to hold the grab bag of data.
alter table entity add column start_date datetime;
alter table entity add column completion_date datetime;
alter table entity add column data json;

If you're not into JSON, you can use a traditional key/value table to store the extra columns, though this loses some flexibility because the value must be a string.

create table entity_data (
  entity_id bigint not null,
  `key` varchar(255) not null,
  `value` varchar(255) not null
);
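
Populating the key/value table for one subtable could then look like this (a sketch; extra1 and extra2 stand for that subtable's extra columns, as in the JSON example below):

insert into entity_data (entity_id, `key`, `value`)
select 123, 'extra1', extra1 from entity_123_subentity
union all
select 123, 'extra2', extra2 from entity_123_subentity;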

Then, for each subtable...

Update each entity's info with its subentity info. The common columns are updated directly. The rest turn into JSON.

update entity e
inner join entity_123_subentity s on e.id = 123
set
  e.start_date = s.start_date,
  e.completion_date = s.completion_date,
  e.data = json_object('extra1', s.extra1, 'extra2', s.extra2);

Once that's done and verified, drop the subtable and replace it with a view.

drop table entity_123_subentity;

create view entity_123_subentity
  (id, start_date, completion_date, extra1, extra2)
as
select
  id, start_date, completion_date, data->>'$.extra1', data->>'$.extra2'
from entity
where id = 123;

Repeat until there are no more subtables.

New queries can be written efficiently, and old queries will still work until they can be rewritten.
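
For example, the question's aggregate query runs unchanged against the view, while a new-style query can pull the same data for many entities at once (a sketch, assuming the entity PK column is id):

-- old query, still works via the view
select count(id) total, min(start_date) started, max(completion_date) completed
from entity_123_subentity;

-- new query, no per-entity round trips
select id, start_date started, completion_date completed
from entity
where id in (1111, 2222, 3333)
order by started
limit 10 offset 0;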

Schwern
  • Thank you, but it looks like MyISAM tables [do not support transactions](https://dev.mysql.com/doc/refman/5.7/en/myisam-storage-engine.html), unfortunately. – dajnz Jun 26 '20 at 17:56
  • @dajnz [That can also be fixed](https://stackoverflow.com/questions/3856435/how-to-convert-all-tables-from-myisam-into-innodb), but since you're flying without a net anyway, do it without transactions. But test it on a copy of the database first! – Schwern Jun 26 '20 at 17:58

That's a typical design that seemed smart at the time it was created but turns out not to be scalable at all... I have seen a lot of projects like this. If I were you, I would build a dedicated index for the search function.

You could

a) use an external indexing/search engine such as Solr or Elasticsearch, or

b) create an index table in your RDBMS containing the recurring information from all sub-tables (id, start_date, completion_date in your case), updated either on every sub-table write or, if there are too many places in the code you would have to change, every hour/day/whatever by a cron job; see the sketch below.
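
A minimal sketch of option b), with illustrative table and column names, assuming the summary rows are refreshed by the application or a cron job:

create table entity_subentity_stats (
  entity_id bigint not null primary key,
  total int not null,
  started datetime,
  completed datetime
);

-- refresh one entity's summary row after its sub-table changes
replace into entity_subentity_stats (entity_id, total, started, completed)
select 1234, count(id), min(start_date), max(completion_date)
from entity_1234_subentity;

-- the sorted, paginated listing then needs a single query
select e.col1, e.col2, e.col3, s.total, s.started, s.completed
from entity e
join entity_subentity_stats s on s.entity_id = e.id
order by s.started
limit 10 offset 0;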

Paflow