How to predict resource requirements vs performance of many to many queries

Question

In a traditional RDBMS, Many-to-Many Joins are much more resource consuming than Many-to-One joins.

I am observing Many-to-Many queries getting slow past about 10 to 15 millions of lines in tables, using mainstream computers with 3-4 gigabytes of RAM. Querying Many-to-One relationships, I observed however no slow-down even with 50 Millions of lines.

How do one predict memory and CPU requirements against expected performance? (Is any benchmark available?)
Past which thresholds is it worth using other solutions instead? (MPP or NoSQL)

If you want to understand how queries are executed, then learn to use `explai`. — Gordon Linoff, Jul 29 '17 at 10:50
thanks @Gordon-Linoff. Does the explain command realy allow to predict memory and CPU requirements and provides benchmark regarding many to many realtionships ? — user1767316, Jul 29 '17 at 10:59
I am using postgresql. I'll have a look if the explain command can provide effective hints on optimal memory and cpu requirement, also any feedback on the subject from the community is welcomed — user1767316, Jul 29 '17 at 13:39
@Gordon-Linoff, to my knowledge and rough looking back to postgres documentation, CPU effort estimates are provided even if converted into disk-page units (https://www.postgresql.org/docs/8.0/static/performance-tips.html), but I see no memory requirements nor speed forcast given memory and cpu parameters. — user1767316, Jul 29 '17 at 15:38
In fact I am looking for - a theorical explanation of why Many2Many join is so much resouce consuming and for - numbers, or any other systematic way, allowing to predict the threshold beyond which Many2Many or even Many2One join will no longer be possible on a single node given speed and availability requirements. — user1767316, Jul 29 '17 at 16:05
Read about (logical & physical) relational query optimization/implementation, in general & any particular DBMS. Your notion that "join is so much resource consuming" is ill-founded. This is like asking why recursion or looping "is so much resource consuming"--*it does a lot of stuff that needs doing*. "Much" *compared to what*? Or do you just want *estimates*? (Anyhow, performance is somewhat a chaotic function of a zillion factors.) PS Please edit clarifications into your question, comments are ephemeral. — philipxy, Jul 31 '17 at 09:48

philipxy · Answer 1 · 2017-07-31T11:31:52.163

In a traditional RDBMS, why are Many-to-Many Joins so much more resource consuming than Many-to-One joins?

An SQL FOREIGN KEY (FK) constraint holds when subrow values for a list of columns also appear elsewhere as a UNIQUE NOT NULL (superkey). So for every row in the referencing table there can be only one matching row in the referenced table. So the result of a JOIN ON equality of a FK & its superkey can output at most one row per row in the FK table. Whereas in general since a JOIN returns every possible combination of rows that can be made from input rows that satisfies an ON condition, in general there can be many more rows output.

After all, aren't Many-to-Many relationships just like two Many-to-One relationships?

It's not clear what you mean by "just like" or how you think it suggests or justifies anything. And a join is not a relationship. (A table represents a relationship.)

How does one predict memory and CPU requirements against expected performance? (Is any benchmark available?)

Many SQL DBMSs have a query planner/optimizer EXPLAIN command & others for inquiring about what a query will or did do or cost.

Read about (logical & physical) relational query optimization/implementation, in general & in any particular DBMS. Wikipedia happens to have a decent article. Many textbooks are online.

Past which thresholds is it worth using other solutions instead? (MPP or NoSQL)

Re NoSQL search my answers, of which the most recent are:
How to convert an existing relational database to a key-value store?
How does noSQL perform horizontal scaling and how it is more efficient than RDBMS scaling
Reasonable Export of Relational to Non-Relational Data

RDBMSs offer generic straightforward querying with certain computational complexitiy & optimization opportunities. Relatively speaking, other systems specialize, with certain aspects improved at the expense of others.

How to predict resource requirements vs performance of many to many queries

1 Answers1