Number of expected rows in SQL when we perform an inner join

Question

Let's say we have a table A with m rows and a table B with n rows, where m > n. What are the maximum and minimum numbers of rows returned when we perform an inner join?

I know the minimum will be 0, since an inner join returns only matching rows and the two tables may have no matching rows at all. But what is the maximum: is it n or m - n?

Also, what are the maximum and minimum numbers of rows returned by a left join in the same scenario? Is it m for both?

anon01 · Answer 1 · 2020-08-23T09:19:40.777

It is often assumed that the joining values are unique, but that need not be the case. The Venn diagrams often used to represent joins are usually interpreted with this assumption in mind, and are generally misleading. I like to think of this in two cases.

Case 1: joining values are unique within each table, and it is assumed there are common values between the tables

This is probably the most typical case. Here the minimum row count is zero (one, if we assume the tables share at least one value); if every row of the smaller table is expected to have a match in the larger table, the row count reaches its maximum of min(m, n).
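A quick way to check Case 1 is to count the join in a throwaway in-memory database. The sketch below uses Python's standard `sqlite3` module as a stand-in for any SQL engine; the tables `a` and `b`, the column `k` and the helper `inner_join_count` are hypothetical, and the keys are assumed unique within each table.

```python
import sqlite3


def inner_join_count(a_keys, b_keys):
    """Hypothetical helper: rows returned by a INNER JOIN b ON a.k = b.k."""
    con = sqlite3.connect(":memory:")  # throwaway in-memory database
    con.execute("CREATE TABLE a (k INTEGER)")
    con.execute("CREATE TABLE b (k INTEGER)")
    con.executemany("INSERT INTO a VALUES (?)", [(k,) for k in a_keys])
    con.executemany("INSERT INTO b VALUES (?)", [(k,) for k in b_keys])
    (count,) = con.execute(
        "SELECT COUNT(*) FROM a INNER JOIN b ON a.k = b.k"
    ).fetchone()
    con.close()
    return count


# Unique keys in each table, m = 5 and n = 3.
print(inner_join_count([1, 2, 3, 4, 5], [6, 7, 8]))  # 0: no key in common
print(inner_join_count([1, 2, 3, 4, 5], [3, 4, 5]))  # 3: every key of b is in a, i.e. min(m, n)
```

With unique keys each row can pair with at most one partner, so the count is simply the number of shared keys.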

Case 2: there is no expectation of uniqueness for the (joining) row values

In the most degenerate case, assume all m rows of A and all n rows of B share the same joining value. Then every row of A matches every row of B, so the output is the same as a cross join and the row count reaches its maximum: m * n.
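The degenerate case can be checked the same way. This sketch makes the same assumptions (Python's `sqlite3`, hypothetical single-column tables `a` and `b`, and a hypothetical helper `join_counts`), fills both tables with one repeated value and compares the inner join against a cross join.

```python
import sqlite3


def join_counts(a_keys, b_keys):
    """Hypothetical helper: (inner join rows, cross join rows) for tables a(k), b(k)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (k INTEGER)")
    con.execute("CREATE TABLE b (k INTEGER)")
    con.executemany("INSERT INTO a VALUES (?)", [(k,) for k in a_keys])
    con.executemany("INSERT INTO b VALUES (?)", [(k,) for k in b_keys])
    inner = con.execute("SELECT COUNT(*) FROM a INNER JOIN b ON a.k = b.k").fetchone()[0]
    cross = con.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]
    con.close()
    return inner, cross


m, n = 5, 3
# Every row carries the same join value, so every (a, b) pair matches.
print(join_counts([7] * m, [7] * n))  # (15, 15): the inner join equals the cross join, m * n
```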

I find it's easiest to think of Inner and Left/Right/Full Outer joins as a subtractive process from the cross join (Cartesian product). The best explanation I've seen anywhere is given by Martin Smith's answer here.
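One way to see the subtractive view concretely: an inner join is the Cartesian product filtered by the ON predicate, and a left join adds back each unmatched row of the left table padded with NULLs. The sketch below checks both identities in SQLite (via Python's `sqlite3`) on the same sample data used in Answer 2. Comparing results as multisets with `collections.Counter` is a choice made here so that duplicate rows would still be counted correctly.

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER, name TEXT)")
con.execute("CREATE TABLE b (id INTEGER, food TEXT)")
con.executemany("INSERT INTO a VALUES (?, ?)",
                [(1, "Ringo"), (2, "George"), (3, "Paul"), (4, "John")])
con.executemany("INSERT INTO b VALUES (?, ?)",
                [(1, "Eggs"), (2, "Ham"), (3, "Spam")])

# Inner join = Cartesian product with the non-matching pairs removed.
inner = Counter(con.execute("SELECT * FROM a INNER JOIN b ON a.id = b.id"))
filtered_cross = Counter(con.execute("SELECT * FROM a CROSS JOIN b WHERE a.id = b.id"))

# Left join = inner join plus every unmatched row of a, padded with NULLs (None in Python).
left = Counter(con.execute("SELECT * FROM a LEFT JOIN b ON a.id = b.id"))
unmatched = Counter(con.execute(
    "SELECT a.id, a.name, NULL, NULL FROM a "
    "WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)"
))
con.close()

print(inner == filtered_cross)    # True
print(left == inner + unmatched)  # True
```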

Ruben Helsloot · Answer 2 · 2020-08-23T09:03:01.050

That all depends. If you assume that every row matches at most one row in the other table, then the minimum number of rows is 0 and the maximum is min(m, n). If a row from A can match multiple rows from B, the maximum explodes to m * n, reached when every row in A matches every row in B.

The following returns 3 rows, since each id matches at most one row in the other table (id 4, John, has no match).

```sql
WITH a(id, name) AS (
    SELECT *
    FROM (VALUES (1, 'Ringo'),
                 (2, 'George'),
                 (3, 'Paul'),
                 (4, 'John')) as a
), b(id, food) AS (
    SELECT *
    FROM (VALUES (1, 'Eggs'),
                 (2, 'Ham'),
                 (3, 'Spam')) as b
)
SELECT *
FROM a
INNER JOIN b ON a.id = b.id;
```

```
+--+------+--+---------+
|id|name  |id|food     |
+--+------+--+---------+
|1 |Ringo |1 |Eggs     |
|2 |George|2 |Ham      |
|3 |Paul  |3 |Spam     |
+--+------+--+---------+
```

But this returns more rows (7 instead of 3), because the join condition lets a single row of b match several rows of a.

```sql
WITH a(id, name) AS (
    SELECT *
    FROM (VALUES (1, 'Ringo'),
                 (2, 'George'),
                 (3, 'Paul'),
                 (4, 'John')) as a
), b(id, food) AS (
    SELECT *
    FROM (VALUES (1, 'Eggs'),
                 (2, 'Ham'),
                 (3, 'Spam')) as b
)
SELECT *
FROM a
INNER JOIN b ON b.food <= a.name;
```

```
+--+------+--+---------+
|id|name  |id|food     |
+--+------+--+---------+
|1 |Ringo |1 |Eggs     |
|2 |George|1 |Eggs     |
|3 |Paul  |1 |Eggs     |
|4 |John  |1 |Eggs     |
|1 |Ringo |2 |Ham      |
|3 |Paul  |2 |Ham      |
|4 |John  |2 |Ham      |
+--+------+--+---------+
```
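The 7 comes from string comparison: 'Eggs' sorts before all four names, 'Ham' sorts before every name except 'George', and 'Spam' sorts after all of them. The sketch below reruns the second query's predicate in SQLite via Python's `sqlite3`, assuming SQLite's default case-sensitive BINARY collation. A database configured with a different collation could in principle order these strings differently.

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER, name TEXT)")
con.execute("CREATE TABLE b (id INTEGER, food TEXT)")
con.executemany("INSERT INTO a VALUES (?, ?)",
                [(1, "Ringo"), (2, "George"), (3, "Paul"), (4, "John")])
con.executemany("INSERT INTO b VALUES (?, ?)",
                [(1, "Eggs"), (2, "Ham"), (3, "Spam")])

# Non-equi join: SQLite compares TEXT values with the BINARY collation by default.
rows = con.execute(
    "SELECT a.name, b.food FROM a INNER JOIN b ON b.food <= a.name"
).fetchall()
con.close()

# How many rows of a each food was paired with.
matches_per_food = Counter(food for _name, food in rows)
print(len(rows))                         # 7
print(sorted(matches_per_food.items()))  # [('Eggs', 4), ('Ham', 3)]
```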

Also, what are the max and min rows returned by a left join in the same scenario? Is it m, m? — SQL_New_bee, Aug 23 '20 at 08:33
It's `m, m` if there is at most one match per row, `m, infinity` otherwise — Ruben Helsloot, Aug 23 '20 at 08:36
@RubenHelsloot could you expand on how the maximum could be `infinity` ? As far as I know the worst a join could do is the cartesian product meaning `m * n` records. — Gabriel Durac, Aug 23 '20 at 08:59
You're right, I forgot about that! I just wanted to say that there is no clear upper bound - except the one you just mentioned — Ruben Helsloot, Aug 23 '20 at 09:02
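To make the bounds from this comment thread concrete: a left join returns every row of `a` at least once, so it never returns fewer than m rows, and when `b` has at least one row it can reach m * n (the cartesian product). This is a sketch assuming Python's `sqlite3` and a hypothetical helper `left_join_count` over hypothetical single-column tables `a(k)` and `b(k)`.

```python
import sqlite3


def left_join_count(a_keys, b_keys):
    """Hypothetical helper: rows returned by a LEFT JOIN b ON a.k = b.k."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (k INTEGER)")
    con.execute("CREATE TABLE b (k INTEGER)")
    con.executemany("INSERT INTO a VALUES (?)", [(k,) for k in a_keys])
    con.executemany("INSERT INTO b VALUES (?)", [(k,) for k in b_keys])
    (count,) = con.execute(
        "SELECT COUNT(*) FROM a LEFT JOIN b ON a.k = b.k"
    ).fetchone()
    con.close()
    return count


m, n = 4, 3
print(left_join_count([1, 2, 3, 4], [7, 8, 9]))  # 4: nothing matches, each row of a kept once (m)
print(left_join_count([1] * m, [1] * n))         # 12: every pair matches (m * n)
```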

score 1 · Answer 3 · answered Aug 23 '20 at 10:23

The maximum number of rows is generated when all the key values are the same. In this case, the inner join is equivalent to a cross join, and the maximum number is m * n.

The maximum number for a left or right outer join is basically the same, with the caveat that an outer join still returns the rows of its preserved table when the other table is empty. So the maximum for a left join is greatest(m, m * n) (that is, m when B is empty), and for a full outer join it is greatest(m, n, m * n).
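The empty-table caveat for a left join can be checked directly: with `b` empty the join still returns the m rows of `a`, while with `a` empty it returns nothing, so greatest(m, m * n) matches in every case. This sketch makes the same assumptions as before (Python's `sqlite3`, a hypothetical helper `left_join_count` over hypothetical tables `a(k)` and `b(k)`). A right join is the mirror image with the tables swapped and is not run here.

```python
import sqlite3


def left_join_count(a_keys, b_keys):
    """Hypothetical helper: rows returned by a LEFT JOIN b ON a.k = b.k."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (k INTEGER)")
    con.execute("CREATE TABLE b (k INTEGER)")
    con.executemany("INSERT INTO a VALUES (?)", [(k,) for k in a_keys])
    con.executemany("INSERT INTO b VALUES (?)", [(k,) for k in b_keys])
    (count,) = con.execute(
        "SELECT COUNT(*) FROM a LEFT JOIN b ON a.k = b.k"
    ).fetchone()
    con.close()
    return count


for m, n in [(4, 0), (0, 3), (4, 3)]:
    # All rows share one key, so the join is as large as it can be for this m and n.
    print(m, n, left_join_count([1] * m, [1] * n), max(m, m * n))
# 4 0 4 4
# 0 3 0 0
# 4 3 12 12
```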
