I'm having trouble understanding relational algebra when it comes to theta joins, equijoins and natural joins. Could someone please help me better understand it? If I use the = sign on a theta join is it exactly the same as just using a natural join?
-
re the quotation in question from the bounty...he's not quoting Codd there, he's quoting from my answer that his comment appears under. – heisenberg Mar 05 '13 at 20:23
-
Theta-join is `relation JOIN {attribute operator attribute} relation`. Often people say "theta-join" but they actually mean [a generalization of theta-join](https://stackoverflow.com/q/50897216/3404097) `relation JOIN {predicate} relation`. (An analogue of SQL INNER JOIN ON.) – philipxy Mar 04 '21 at 07:08
7 Answers
A theta join allows for arbitrary comparison relationships (such as ≥).
An equijoin is a theta join using the equality operator.
A natural join is an equijoin on attributes that have the same name in each relationship.
Additionally, a natural join removes the duplicate columns involved in the equality comparison so only 1 of each compared column remains; in rough relational algebraic terms:
⋈ = πR,S-as ○ ⋈aR=aS
-
14the natural join will remove the columns with the same name – Bogdan Gavril MSFT Mar 05 '13 at 14:40
-
2
-
1Equijoin will also remove the equality column if they have same name in both tables. – Vishal R Nov 13 '14 at 03:42
-
2
-
@Pacerier: that's a new question rather than a request for clarification, thus not suitable for a comment. If you still want to know, try checking the appropriate SE site and, if you can't find an existing question, ask it (though make sure you phrase it unambiguously, which it currently isn't). – outis Aug 07 '15 at 00:34
-
3@Pacerier : Historically, the `theta` in theta join refers to an arbitrary condition used as the criterion for the join. (see Database Systems: The Complete Book by Garcia-Molina, Ullman, Widom, chapter 2, Theta Join) – Ram Rajamony Feb 04 '17 at 21:14
-
-
1@SebastianPalma I just rolled back your replacement of text by an image. Please [use text, not images/links, for text--including tables & ERDs.](https://meta.stackoverflow.com/q/285551/3404097) Use images only for what cannot be expressed as text or to augment text. Images cannot be searched for or cut & pasted. Include a legend/key & explanation with an image. – philipxy Oct 06 '19 at 04:37
-
A natural join is not an equijoin. So please don't say so--regardless of what you say later. It is the result of first doing an equijoin then removing one of column of each same-named column pair compared for equality. Also, please explain the idiosyncratic notation for subscripts & attributes in that equation. – philipxy Oct 06 '19 at 08:06
-
Given these definitions of the theta and natural join, I was wondering whether the natural join can be expressed in a theta join? What would it look like in relational algebraic terms? I am interested in how (if possible) to express a natural join as a theta join in algebraic terms. – Zeruno Feb 22 '20 at 18:09
-
I was a little confused by `so only 1 of each compared column remains`. So for future googlers, it means `in the final result` only one of the equal columns from each relation kept, which makes sense as why would you keep the same column twice in the final result! – piepi Jan 16 '21 at 14:17
While the answers explaining the exact differences are fine, I want to show how the relational algebra is transformed to SQL and what the actual value of the 3 concepts is.
The key concept in your question is the idea of a join. To understand a join you need to understand a Cartesian Product (the example is based on SQL where the equivalent is called a cross join as onedaywhen points out);
This isn't very useful in practice. Consider this example.
Product(PName, Price)
====================
Laptop, 1500
Car, 20000
Airplane, 3000000
Component(PName, CName, Cost)
=============================
Laptop, CPU, 500
Laptop, hdd, 300
Laptop, case, 700
Car, wheels, 1000
The Cartesian product Product x Component will be - bellow or sql fiddle. You can see there are 12 rows = 3 x 4. Obviously, rows like "Laptop" with "wheels" have no meaning, this is why in practice the Cartesian product is rarely used.
| PNAME | PRICE | CNAME | COST |
--------------------------------------
| Laptop | 1500 | CPU | 500 |
| Laptop | 1500 | hdd | 300 |
| Laptop | 1500 | case | 700 |
| Laptop | 1500 | wheels | 1000 |
| Car | 20000 | CPU | 500 |
| Car | 20000 | hdd | 300 |
| Car | 20000 | case | 700 |
| Car | 20000 | wheels | 1000 |
| Airplane | 3000000 | CPU | 500 |
| Airplane | 3000000 | hdd | 300 |
| Airplane | 3000000 | case | 700 |
| Airplane | 3000000 | wheels | 1000 |
JOINs are here to add more value to these products. What we really want is to "join" the product with its associated components, because each component belongs to a product. The way to do this is with a join:
Product JOIN Component ON Pname
The associated SQL query would be like this (you can play with all the examples here)
SELECT *
FROM Product
JOIN Component
ON Product.Pname = Component.Pname
and the result:
| PNAME | PRICE | CNAME | COST |
----------------------------------
| Laptop | 1500 | CPU | 500 |
| Laptop | 1500 | hdd | 300 |
| Laptop | 1500 | case | 700 |
| Car | 20000 | wheels | 1000 |
Notice that the result has only 4 rows, because the Laptop has 3 components, the Car has 1 and the Airplane none. This is much more useful.
Getting back to your questions, all the joins you ask about are variations of the JOIN I just showed:
Natural Join = the join (the ON clause) is made on all columns with the same name; it removes duplicate columns from the result, as opposed to all other joins; most DBMS (database systems created by various vendors such as Microsoft's SQL Server, Oracle's MySQL etc. ) don't even bother supporting this, it is just bad practice (or purposely chose not to implement it). Imagine that a developer comes and changes the name of the second column in Product from Price to Cost. Then all the natural joins would be done on PName AND on Cost, resulting in 0 rows since no numbers match.
Theta Join = this is the general join everybody uses because it allows you to specify the condition (the ON clause in SQL). You can join on pretty much any condition you like, for example on Products that have the first 2 letters similar, or that have a different price. In practice, this is rarely the case - in 95% of the cases you will join on an equality condition, which leads us to:
Equi Join = the most common one used in practice. The example above is an equi join. Databases are optimized for this type of joins! The oposite of an equi join is a non-equi join, i.e. when you join on a condition other than "=". Databases are not optimized for this! Both of them are subsets of the general theta join. The natural join is also a theta join but the condition (the theta) is implicit.
Source of information: university + certified SQL Server developer + recently completed the MOO "Introduction to databases" from Stanford so I dare say I have relational algebra fresh in mind.

- 20,615
- 10
- 53
- 74
-
1You use the term 'Cartesian product' somewhat loosely. The relational operator product results in a relation (in common with all relational operators!) A `CROSS JOIN` operation in SQL results in a table expression (rows of columns). The set operation Cartesian product results in a set of pairs. – onedaywhen Mar 06 '13 at 16:09
-
"Natural Join...is just bad practice" -- your opinion, which I don't agree. From a 'relational algebra' point of view, NATURAL JOIN makes a lot of sense (you start by saying that RM and concepts are you reasons for posting...?) Natural join is more general than product and therefore more useful. Consider that the truly relational language Tutorial D has but one join type being natural join. If my SQL product of choice had natural join I would use it all the time. Perhaps changing existing column names on production code is the bad practise here? – onedaywhen Mar 06 '13 at 16:15
-
3When you say "Databases" you actually mean "DBMSs", a crucial difference when addressing the 'concepts'. – onedaywhen Mar 06 '13 at 16:17
-
3onedaywhen - thank you for all the useful comments! feels like a code review :). I fixed the cartesian product and DBMS problems. I maintain my opinion that natural joins are only of academic interest and important DBMSs such as SQL Server do not implement this on purpose - adding a condition explicitly leads to better code understanding and maintenance. A related question: http://stackoverflow.com/questions/4826613/natural-join-in-sql-server – Bogdan Gavril MSFT Mar 06 '13 at 18:44
-
I have to agree about Natural joins, they are in practice simply dangerous. Databaseas have to be maintained over time and using something this fragile to changes is just a plain bad idea. There is a good reason why many of the major database players do not allow natural joins at all. I woudl also submit that while you can join on inequalities or left(somefield, 2) or something other than =, it is almost always a sign of a bad data model if you have to do so. – HLGEM Mar 06 '13 at 19:14
-
@BogdanGavril: glad you take comments in the spirit they were offered. FWIW the SQL Server team have not closed the door on natural join, indeed one can vote for it [here](http://connect.microsoft.com/SQLServer/feedback/details/153679/natural-join-and-using-clause-in-joins). I find it really hard to imagine the SQL Server team refusing to implement a feature on such philosophical grounds, to think they wouldn't trust users to use a feature safely! – onedaywhen Mar 07 '13 at 11:35
-
1@HLGEM: one could make similar arguments against `SELECT * FROM...` (and perhaps you do). But it is in the language, it is in every SQL implementation and I use it often (and I wager you do too!) Hint not all code is production code. – onedaywhen Mar 07 '13 at 11:39
-
-
2The real problem with "natural" joined column isn't about changing names but adding new ones that must not conflict between all possibly joined tables in the system. Take very common columns such as "name", "description", ... Using "natural join" will make them joined whereas it makes nonsense and more is against business logic and leads to error. So yes, "natural join" is dangerous. It forces you to have distinct names except for (primary/foreign) key colulmns and lose "name spacing". – LoganMzz Nov 08 '16 at 13:31
-
1However I don't agree that databases don't optimize non-equijoin. Range scan is a well-known feature. – LoganMzz Nov 08 '16 at 13:35
-
@LoganMzz - agreed. What I mean is that the algorithms used to perform a non-equi join have higher complexity than the ones used for other types of join. DB implementations try to optimize each operation, but no matter how smart your non-equi join algorithm is, it cannot compete with an equi join – Bogdan Gavril MSFT Dec 07 '16 at 16:40
-
@BogdanGavril-MSFT First I don't really compare both due to 1-1 *key-based* relationship VS 1-N one, it doesn't make any sense. Seconds, it depends on index definitions which depends on data usage (write VS read). Range scan is better than (index) full-scan + skip and "covering queries" may also be an important factor. Finally, data design is a very disregarded but powerful field in CS. – LoganMzz Dec 08 '16 at 11:41
@outis's answer is good: concise and correct as regards relations.
However, the situation is slightly more complicated as regards SQL.
Consider the usual suppliers and parts database but implemented in SQL:
SELECT * FROM S NATURAL JOIN SP;
would return a resultset** with columns
SNO, SNAME, STATUS, CITY, PNO, QTY
The join is performed on the column with the same name in both tables, SNO
. Note that the resultset has six columns and only contains one column for SNO
.
Now consider a theta eqijoin, where the column names for the join must be explicitly specified (plus range variables S
and SP
are required):
SELECT * FROM S JOIN SP ON S.SNO = SP.SNO;
The resultset will have seven columns, including two columns for SNO
. The names of the resultset are what the SQL Standard refers to as "implementation dependent" but could look like this:
SNO, SNAME, STATUS, CITY, SNO, PNO, QTY
or perhaps this
S.SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY
In other words, NATURAL JOIN
in SQL can be considered to remove columns with duplicated names from the resultset (but alas will not remove duplicate rows - you must remember to change SELECT
to SELECT DISTINCT
yourself).
** I don't quite know what the result of SELECT * FROM table_expression;
is. I know it is not a relation because, among other reasons, it can have columns with duplicate names or a column with no name. I know it is not a set because, among other reasons, the column order is significant. It's not even a SQL table or SQL table expression. I call it a resultset.

- 55,269
- 12
- 100
- 138
-
-
Why do you say *"I don't quite know what the result of `SELECT * FROM table_expression;` is"*? – Pacerier Apr 18 '15 at 06:36
-
@Pacerier: erm, because I don't know what it is! Last time I looked, the SQL Standard avoided defining what it is. I do know what it is not (not a relation, not a set, not a table, not a table expression). So for ease of reference I employed my own term, 'resultset'. Note that in the relational model the result of an operation involving two relations is a relation. The equivalent statement cannot be made for SQL AFAIK. – onedaywhen Apr 21 '15 at 15:50
Natural is a subset of Equi which is a subset of Theta.
If I use the = sign on a theta join is it exactly the same as just using a natural join???
Not necessarily, but it would be an Equi. Natural means you are matching on all similarly named columns, Equi just means you are using '=' exclusively (and not 'less than', like, etc)
This is pure academia though, you could work with relational databases for years and never hear anyone use these terms.

- 9,665
- 1
- 30
- 38
-
I suspect that when you say "relational databases" I suspect you mean something else e.g. "SQL". – onedaywhen Oct 24 '11 at 07:54
-
Work that isn't academia with relational databases that aren't SQL? So which products DO you mean? – onedaywhen Oct 24 '11 at 14:12
-
@onedaywhen I said they were relational, I never said they weren't sql. Love your caps usage, its really conveying how dumbfounded you are by your own strawman. – heisenberg Oct 24 '11 at 14:17
-
3In Codd's original algebra, natural join is the fundamental type of join whereas an equi- or theta- "join" is shorthand for a NJ (e.g. cross product) followed by a restriction. *"Natural is a subset of Equi which is a subset of Theta"* presumably what that means is that every NJ could also be expressed as an EJ or TJ. I suppose that's true if σ 1=1 (A x B) counts as a equijoin, in which case every operation of the relational algebra could be expressed as an equijoin in that form. The ambiguity here is that there is more than one possible set of fundamental operators for the RA. – nvogel Sep 21 '12 at 16:13
-
@sqlvogel, is the quote in your comment from a paper by Codd? I'm asking because I saw [this question](http://cs.stackexchange.com/questions/10158/origin-of-theta-joins) asking about the origin of theta-joins and Codd. – Eric Fail Mar 01 '13 at 22:23
-
2@EricFail: sqlvogel is just quoting kekekela's answer, rather than anything from Codd. If you want more on Codd's writings about joins (θ or otherwise), you might try "The Relational Model for Database Management", or work your way through his [bibliography](http://www.informatik.uni-trier.de/~ley/pers/hd/c/Codd:E=_F=.html). – outis Mar 05 '13 at 02:29
-
1... The question you link to has an answer that gets close to what you're looking for, probably as close as is possible. It links to [Relational Completeness of Data Base Sublanguages](http://classes.soe.ucsc.edu/cmps277/Fall08/Papers/Codd72a.pdf). P. 10 describes the connection between θ, = and natural joins (though natural are not strictly subsets of = in Codd's formulation, but rather the projection of =-joins). – outis Mar 05 '13 at 02:41
-
@outis, thanks. In regard to _what I am looking for_, I am looking for a the argument that technically explains how the _[n]atural is a subset of Equi which is a subset of Theta_. Technically you answered my question. If you add it in as an answer I'll award you the bounty. – Eric Fail Mar 05 '13 at 17:34
-
@outis, I actually have [a question](http://stackoverflow.com/questions/15010966/can-the-natural-join-be-seen-as-a-subset-of-the-equi-join-and-theta-join) where I am trying to get at the relationship between the the natural-join, the equi-join and theta-join, but none of the answers is really definitive. I was inspired by this thread as I thought that Codd had said something about the relationship and if the joins meaningfully could bee seen as a subsets of each other and is this position could be defended technically. I guess Codd haven't said anything about them being subset of each other. – Eric Fail Mar 05 '13 at 18:05
-
@heisenberg, And what would be the result of a table natural joining another table that has no identical column names? – Pacerier Apr 18 '15 at 06:38
Theta Join:
When you make a query for join using any operator,(e.g., =, <, >, >= etc.), then that join query comes under Theta join.
Equi Join:
When you make a query for join using equality operator only, then that join query comes under Equi join.
Example:
> SELECT * FROM Emp JOIN Dept ON Emp.DeptID = Dept.DeptID; > SELECT * FROM Emp INNER JOIN Dept USING(DeptID)
This will show: _________________________________________________ | Emp.Name | Emp.DeptID | Dept.Name | Dept.DeptID | | | | | |
Note: Equi join is also a theta join!
Natural Join:
a type of Equi Join which occurs implicitly by comparing all the same names columns in both tables.
Note: here, the join result has only one column for each pair of same named columns.
Example
SELECT * FROM Emp NATURAL JOIN Dept
This will show: _______________________________ | DeptID | Emp.Name | Dept.Name | | | | |

- 60
- 1
- 7

- 319
- 4
- 11
Cartesian product of two tables gives all the possible combinations of tuples like the example in mathematics the cross product of two sets . since many a times there are some junk values which occupy unnecessary space in the memory too so here joins comes to rescue which give the combination of only those attribute values which are required and are meaningful.
inner join gives the repeated field in the table twice whereas natural join here solves the problem by just filtering the repeated columns and displaying it only once.else, both works the same. natural join is more efficient since it preserves the memory .Also , redundancies are removed in natural join .
equi join of two tables are such that they display only those tuples which matches the value in other table . for example : let new1 and new2 be two tables . if sql query select * from new1 join new2 on new1.id = new.id (id is the same column in two tables) then start from new2 table and join which matches the id in second table . besides , non equi join do not have equality operator they have <,>,and between operator .
theta join consists of all the comparison operator including equality and others < , > comparison operator. when it uses equality(=) operator it is known as equi join .

- 9
- 2
Natural Join: Natural join can be possible when there is at least one common attribute in two relations.
Theta Join: Theta join can be possible when two act on particular condition.
Equi Join: Equi can be possible when two act on equity condition. It is one type of theta join.

- 1,049
- 10
- 23

- 3,588
- 2
- 39
- 40