NaturalJoin (R,S-T) equivalence Difference(NaturalJoin(R,S),NaturalJoin(R,T))
The general way to deal with this is to replace operator calls by their definitions.
Here is an outline assuming certain equivalences between relation expressions and the tuples they hold. One actually needs to use the equivalences to justify that one's queries return the tuples that one has been asked to get, but this is not usually explained. (One instead learns via a lot of examples and handwaving.)
S
& T
have the same set of attributes.
X
holds rows (...)
where X(...)
, ie (...) IN X
.
NATURALJOIN(X,Y)
holds rows where X(...) AND Y(...)
.
DIFFERENCE(X,Y)
holds rows where X(...) AND NOT Y(...)
.
The left holds rows where:
R(...) AND (S(...) AND NOT T(...))
R(...) AND S(...) AND NOT T(...)
The right holds rows where:
(R(...) AND S(...)) AND NOT (R(...) AND T(...))
(R(...) AND S(...)) AND (NOT R(...) OR NOT T(...))
((R(...) AND S(...)) AND NOT R(...)) OR ((R(...) AND S(...)) AND NOT T(...))
(R(...) AND S(...) AND NOT R(...)) OR (R(...) AND S(...) AND NOT T(...))
R(...) AND S(...) AND NOT T(...)
So they are equivalent.
You can convert this to a proof by replacing X(...)
by x IN X
and using appropriate quantifications (FORALL
& FORSOME
/EXISTS
) and set comprehensions ({
variable
|
wff
}
).
Re using natural join in reasoning & SQL see this answer and its links.
And if you know what query could be more optimal in a sense of runtime, that would be really helpful.
That depends on your DMBS and its query implementation/optimization. There is no "optimal" without an execution model, cost/benefit function and that function's input arguments. Moreover "optimal" is chaotic--a tiny change in relational & physical DDL, database contents & statistics, query DML, query & update patterns and DBMS implementation can give completely different tradeoffs.