Wow, this is the first time I've seen someone specifically ask for a Cartesian product result in a SQL Q&A forum!
I mean, a Cartesian product result is often mentioned in SQL Q&A forums when people answering want to talk about SQL's CROSS PRODUCT
, or simply product
in relational algebra (the operator name coined by Ted Codd).
I'm not from a mathematical background and I hadn't heard of Cartesian product until I started wanting to write better SQL and the term kept popping up in answers (anyone remember Usenet newsgroups?). I'd heard that SQL was based on mathematical set theory, again from people answering to say, "You should look for a set-based approach..." So at first I didn't think much about it beyond, "Cool, I learned a new term today."
Later -- perhaps a little too late -- I started to investigate the relational model (RM) on which SQL is based and found myself questioning my own understanding of Cartesian product.
Here's a line from Wikipedia on the subject:
for sets A
and B
, the Cartesian product A x B
is the set of all ordered pairs (a, b)
where a
[is an element of] A
and b
[is an element of] B
.
Erm, "ordered pairs"? I knew enough about RM to know that ordering is not appropriate. Briefly: in RM, an operation between two relations results in a relation, a relation has a heading, which is a set of attributes, and a set by definition has no ordering; while a relation can comprise an ordered pair attribute in the first place, it is not possible for an ordered pair to be materialized as the result of a relational operation.
SQL, on the other hand, has much left-to-right ordering dependecies (e.g. column ordering in UNION
that was later fixed with UNION CORRESPONDING
), so perhaps Cartesian product has some meaning in SQL? SQL is not a strict as RM but let's say an operation between two table expressions results in a table expression. When the operation is CROSS JOIN
, is it true to say the result the set of all ordered pairs?
First, is the result of CROSS JOIN
a set? Well, if the table expressions involved have duplicate rows then the result would also have duplicate rows, so it wouldn't strictly speaking be a set. However, if we have taken steps to ensure our table expressions conform to first normal form (and indeed we should) therefore the result of CROSS JOIN
can be a set. [We have a similar problem and resolution with attribute names common to both table expressions.]
Second, is the result of CROSS JOIN
a pair? Consider an example:
WITH Suppliers AS
( SELECT * FROM (
VALUES
( 'S1', 'Smith', 'London' ),
( 'S2', 'Jones', 'Paris' )
) AS t ( SID, SNAME, CITY )
),
Parts AS
( SELECT * FROM (
VALUES
( 'S1', 'Nut', 'Red' ),
( 'S2', 'Bolt', 'Green' )
) AS t ( PID, SNAME, COLOR )
)
SELECT *
FROM Suppliers
CROSS JOIN Parts;
The result is four rows of six columns (no duplicate column names). The columns are not grouped in any way. For me, there is nothing in the result to suggest I have a set of pairs.
Third, are the columns result of CROSS JOIN
ordered? We can switch the tables...
SELECT *
FROM Parts
CROSS JOIN Suppliers;
...and, more than likely, the columns will appear in left-to-right ordering of Parts
then Suppliers
. Personally, I don't consider this to be 'ordered'. The SQL standard says words to the effect of "implementation defined" for SELECT *
, to mean there is no guarantee of any inherent ordering. I don't think there are any knowledgeable people on SO who would recommend relying on any left-to-right column ordering in a result that isn't explicitly defined.
My conclusion is SQL lacks a true Cartesian product operator and that CROSS JOIN
is just another SQL operation that results in a table expression (or similar). In the SQL context, we should stop using the term Cartesian product and instead use CROSS JOIN
or simply product
.