Convert SQL Query to Relational Algebra

Question

I need some help converting an SQL query into relational algebra.

Here is the SQL query:

SELECT * FROM Customer, Appointment
WHERE Appointment.CustomerCode = Customer.CustomerCode
    AND Appointment.ServerCode IN
    (
        SELECT ServerCode FROM Appointment WHERE CustomerCode = '102'
    )
;

I'm stuck because of the IN subquery in the above example.

Can anyone demonstrate for me how to express this SQL query in relational algebra?

Many thanks.

EDIT: Here is my proposed solution in relational algebra. Is this correct? Does it reproduce the SQL query?

Scodes ← ΠServerCode(σCustomerCode='102'(Appointment))

Ccodes ← ΠCustomerCode(Appointment ⋉ Scodes)

Result ← (Customer ⋉ Ccodes)

It may help you to refactor IN sub-select to correlated sub-query using EXISTS operator: `AND EXISTS (SELECT 'found' FROM Appointment a2 WHERE a2.CustomerCode = '102' AND a2.ServerCode = Appointment.ServerCode )` — topchef, Nov 01 '11 at 00:53
This link might help you, http://stackoverflow.com/questions/3850816/sql-represent-a-subquery-in-relational-algebra — , Nov 01 '11 at 07:23
You have a projection too much, a semijoin too much and a natural join too little. Basic rule of thumb : each WHERE should get you a Sigma, each SELECT (except SELECT *) should give you a Pi, each comma between things in the FROM list should give you a cartesian product or join. (Don't take this as a law of the universe. Algebra expressions can often be rewritten into equivalent expressions that no longer line up with this.) — Erwin Smout, Nov 03 '11 at 15:42
Does this answer your question? [Represent a subquery in relational algebra](https://stackoverflow.com/questions/3850816/represent-a-subquery-in-relational-algebra) — philipxy, Sep 23 '21 at 16:46

onedaywhen · Accepted Answer · 2011-11-01T14:45:08.740

4

Your SQL code will result in duplicate columns for CustomerCode and the use of SELECT [ALL] is likely to result in duplicate rows. Because the result is not a relation, it cannot be expressed in relational algebra.

These problems are easily fixed in SQL:

SELECT DISTINCT * 
  FROM Customer NATURAL JOIN Appointment
 WHERE Appointment.ServerCode IN
    (
        SELECT ServerCode FROM Appointment WHERE CustomerCode = '102'
    )
;

You didn't specify which relational algebra you are intereted in. Date and Darwen proposed an algebra named A, specified an A language named D, and designed a D language named Tutorial D.

Tutorial D uses operators JOIN for natural join, WHERE for restriction and MATCHING for semijoin, The slight complication is the comparison in SQL:

CustomerCode = '102'

The comparison of a CustomerCode value to a CHAR value in SQL is possible because of implicit coercion. Tutorial D is stricter -- type safe, if you will -- requiring you to overload the equality operator or, more practically, define a selector operator for CHAR, which would typically have the same name as the type.

Therefore, the above (revised) SQL may be written in Tutorial D as:

( Customer JOIN Appointment ) 
   MATCHING ( ( Appointment WHERE CustomerCode = CustomerCode ( '102' ) ) { ServerCode } )

edited Nov 01 '11 at 14:45

answered Nov 01 '11 at 09:03

onedaywhen

55,269
12
100
138

(A) Be careful about operator precedences. Parenthesize if unsure. (B) Since Appointment also participates in the JOIN, there is simply no point in doing the MATCHING at all and it further reduces to "Customer JOIN Appointment WHERE CustomerCode = CustomerCode ( '102' ) ". – Erwin Smout Nov 01 '11 at 13:16
And you'd have to know all the attributes of both relvars before you can be sure the JOIN is indeed a NATURAL one. – Erwin Smout Nov 01 '11 at 13:19
@ErwinSmout: (A) I don't see that it makes a difference here but OK. (B) your reduction is correct but is not equivalent to the SQL, meaning my query was wrong. Now corrected. p.s. I realise I am making assumptions about natural join but they seem reasonable for the info we have. Using another join type would similarly involve making assumptions about columns to project away. Thanks for feedback :) – onedaywhen Nov 01 '11 at 14:52
Thanks for the replies. I didn't realise that there was more than one type of relational algebra. I am using the standard that employs Greek characters for select, project etc. This seems to be different to the convention used by Date. I am slightly confused now. How do I represent my query in this standard form of RA? – user1022788 Nov 01 '11 at 15:45
"How do I represent my query in this standard form of RA?" I'll try to say some things about this in a full-blown response. It's a bit hard to do that in the space allowed in a comment (which also doesn't allow for spacing and such stuff). – Erwin Smout Nov 01 '11 at 15:52
Note because of the projection over `ServerCode` , `MATCHING` may be substituted with `JOIN`. – onedaywhen Nov 02 '11 at 08:47
I have updated the original question with my proposed reproduction of the query in relational algebra. Is this correct? Any feedback would be most welcome. – user1022788 Nov 03 '11 at 14:21

Erwin Smout · Answer 2 · 2011-11-01T18:46:21.440

2

"How do I represent my query in this standard form of RA?"

It's not so much a question of "type of algebra" as it is of "type of notation".

Notation using greek symbols typically uses sigma, the restrict condition in subscript appended to the sigma character, and then the subject of the restriction (the relational expression that is subjected to the restrict condition).

Date avoid that notation, because typesetting and/or creating text using such notations is usually a lot harder than it is using just the western alphabet (a math teacher of mine once told us that math textbooks contain the most errors of all).

σ <cond> (<rel exp>) thus denotes the very same algebra expression as (Date's syntax) "<rel exp> WHERE <cond>".

Similarly, with greek symbols, projection is typically denoted using the letter Pi, with the list of retained attributes in subscript appended to the Pi, and the expression that is the subject of the projection following that.

Π <attr list> (<rel exp>) thus denotes the very same algebra expression as (Date's syntax) "<rel exp> { <attr list> }".

The join family of operators is usually denoted, in "greek" symbols, using (variations of) the Unicode BOWTIE character, or that character consisting of a lowercase letter 'x' surrounded by a full circle (usually used to denote full cartesian product, cross-product, ... whatever your algebra course happens to name it).

Some courses provide a "greek-symbol" notation for rename, using the greek letter Rho. Appended in subscript is the rename list, in the form a1->b1,a2->b2,... Appended after that comes the relational expression that is subjected to the rename. Likewise, Date has a non-greek-symbol equivalent syntax : <rel exp> RENAME a1 AS b1, a2 AS b2 , ...

The important thing is to see that these differences are merely differences in syntactical notation, not "different algebrae".

EDIT

One could imagine that the greek symbols notation would be the way to program relational algebra into an APL engine, Date's syntax would be the way to program relational algebra into a cobol-like or PL/1-like engine (there effectively exists such an engine called Rel), and the way to program relational algebra into an OO-like engine, could look something like relation.NaturalJoin(otherRelation).Matching(yetOtherRelation.Restrict(condition).project(attributesList)).

edited Nov 01 '11 at 18:46

answered Nov 01 '11 at 16:14

Erwin Smout

18,113
4
33
52

Ok, thanks for the answer - very informative. Could you show me how to rewrite the query that you gave in Tutorial D using the Greek characters? The crux of my problem is how to represent the MATCHING part of the query. Thanks for all the help. – user1022788 Nov 01 '11 at 18:57
MATCHING is just the SEMIJOIN operator. See e.g. http://en.wikipedia.org/wiki/Relational_algebra#Semijoin_.28.E2.8B.89.29.28.E2.8B.8A.29 . I do try to live up to a policy not to spoonfeed complete solutions. (BTW the Tutorial D query was given not by me, but by @onedaywhen.) – Erwin Smout Nov 01 '11 at 19:34
I admit this is an area I am not clear on. Consider this quote from Darwin: "Sometimes that term, relational algebra, is used with the definite article: *the* relational algebra, even though several minor variations exist in the literature. Indeed, the term relational completeness is sometimes defined with reference to “the” relational algebra -- a language deemed relationally complete if it supports, directly or indirectly, all of the operators of "that" algebra." – onedaywhen Nov 02 '11 at 08:24
...as I understand it, Codd's algebra includes a product operator, D&D's includes a natural join operator and both are mutually exclusive. On the other hand, both are relationally complete (Codd's omission of a rename operator aside). I'm left wondering if an answer including natural join would be an acceptable answer if the teacher had Codd's algebra in mind, regardless of notation. – onedaywhen Nov 02 '11 at 08:27
That could become a lengthy exchange ... Formally, an algebra is a set of operators. Often, the predicate "that is closed over some set of types" is also appended, but that doesn't make a material difference wrt your point. And since the set {PROJECT, PRODUCT} is not the same set as {PROJECT, NATURALJOIN}, the algebra's they define are also different/distinct/... Meaning that indeed there is no such thing as "THE" relational algebra. ... – Erwin Smout Nov 02 '11 at 13:06
A thesis also evidenced by their most recent book, Database Explorations : "He [Codd] defined a relational algebra (similar but not identical to the version of the algebra embraced in reference [16], consisting of the operators [enumeration here].". Note the definitely indefinite article in "A relational algebra" and the explicit admission of algebra's being "different" in "similar but not identical". Date even devotes a complete chapter in "Logic & Databases" to this question. The conclusion seems to be that formally, there may be different algebra's but they are all sufficiently alike ... – Erwin Smout Nov 02 '11 at 13:14
... to justify the term "THE" relational algebra. – Erwin Smout Nov 02 '11 at 13:15
(I don't agree with that conclusion for the full 100%. Very often, "the" relational algebra is thought of as an algebra that does not include any kind of transitive closure operation. The nice thing about such an algebra (Codd's was one such) is that it lines up with first-order logic. The ugly thing about such algebra is it cannot express reachability queries in graphs. Bringing in transitive closure takes you outside the realm of first-order logic, because you need recursion. That makes the difference a bit less "minor" than certain other differences imo.) – Erwin Smout Nov 02 '11 at 13:23

Convert SQL Query to Relational Algebra

2 Answers2