1

Lets say we have a table A with m rows table B with n rows and m>n. What is the max and min number of rows returned when we perform an inner join?

I know that min will be 0 since the inner join returns the common rows and there could be no possible common row between the two. But what will be the max, is it n or m-n?

Also what is the max and min rows returned in a left join in the same scenario? is it m for both?

Ruben Helsloot
  • 12,582
  • 6
  • 26
  • 49
SQL_New_bee
  • 125
  • 1
  • 1
  • 8

3 Answers3

3

It is often assumed that the joining row values are unique, but need not be the case. The Venn diagrams often used to represent joins are often interpreted with this in mind, but are generally misleading. I like to think of these in a few cases.

Case1: row values are unique for each table, and it is assumed there are common rows between the tables

This is maybe most typical. Here min row count is zero (one if assumed there is some row intersection); if all rows are expected to be contained in the larger table, then row count = min(m, n).

Case2: there are no expectations of uniqueness for (joining) row values

In the most degenerate case, assume all rows m, n have identical value. In this case, the max number of output rows (matches) is the same as a cross join: row count = m*n.

I find it's easiest to think of Inner and Left/Right/Full Outer joins as a subtractive process from the cross join (Cartesian product). The best explanation I've seen anywhere is given by Martin Smith's answer here.

anon01
  • 10,618
  • 8
  • 35
  • 58
2

That all depends, if you assume that every row matches at most one other row, then the min number of rows is 0 and the max number of rows is min(m, n). If it is possible for a row from A to match with multiple rows from B, then the max explodes to m * n, if every row in A matches every row in B.

The following returns 3 rows, since the matches are direct.

WITH a(id, name) AS (
    SELECT *
    FROM (VALUES (1, 'Ringo'),
                 (2, 'George'),
                 (3, 'Paul'),
                 (4, 'John')) as a
), b(id, food) AS (
    SELECT *
    FROM (VALUES (1, 'eggs'),
                 (2, 'ham'),
                 (3, 'spam')) as b
)
SELECT *
FROM a
INNER JOIN b ON a.id = b.id;
+--+------+--+---------+
|id|name  |id|food     |
+--+------+--+---------+
|1 |Ringo |1 |Eggs     |
|2 |George|2 |Ham      |
|3 |Paul  |3 |Spam     |
+--+------+--+---------+

But this returns many more rows.

WITH a(id, name) AS (
    SELECT *
    FROM (VALUES (1, 'Ringo'),
                 (2, 'George'),
                 (3, 'Paul'),
                 (4, 'John')) as a
), b(id, food) AS (
    SELECT *
    FROM (VALUES (1, 'Eggs'),
                 (2, 'Ham'),
                 (3, 'Spam')) as b
)
SELECT *
FROM a
INNER JOIN b ON b.food <= a.name
+--+------+--+---------+
|id|name  |id|food     |
+--+------+--+---------+
|1 |Ringo |1 |Eggs     |
|2 |George|1 |Eggs     |
|3 |Paul  |1 |Eggs     |
|4 |John  |1 |Eggs     |
|1 |Ringo |2 |Ham      |
|3 |Paul  |2 |Ham      |
|4 |John  |2 |Ham      |
+--+------+--+---------+
Ruben Helsloot
  • 12,582
  • 6
  • 26
  • 49
1

The maximum number of rows is generated when all the key values are the same. In this case, the inner join is equivalent to a cross join, and the maximum number is m * n.

The maximum number for a left or right outer join is basically the same, with just a caveat that an outer join is guaranteed to return results even when one of the tables is empty. So that maximum is expressed as greatest(m, n, m * n).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786