-1

I was given the following question:

Write a SQL statement to make a join on the tables salesman, customer and orders in such a form that the same column of each table will appear once and only the relational rows will come.

I executed the following query:

SELECT * FROM orders NATURAL JOIN customer NATURAL JOIN salesman;

However, I was not expecting the following result:

enter image description here

enter image description here

My doubt lies in step 2.

Why didn't I get the rows with salesman_id 5002, 5003 & 5007?

I know that natural join uses the common columns to finalize the rows.

Here all the Salesman_ids are present in the result from step 1.

Why isn't the final result equal to the table resulting from step 1 with non duplicate columns from salesman added to it?

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • Wow. Is this really the expected solution to this task? While technically correct, this is a good example on why not to use `natural join`. The reason for the "loss" of rows in the last step is that a natural join will also join using the city-column (that has the same name but a different meaning in both customer and salesman). Such a misleading image (as it is not showing the city column) to (not) explain what is going on in the last step. (Not sure if the one making it even understood it.) `using (salesman_id)` might do what you expect (but it's not the task). – Solarflare Jul 26 '17 at 20:54
  • Also there seems to be a problem with the customer table, as the `salesman_id` does not make sense there (at least not without explaining). It might be meant as something like a `default_salesman_id`. It might get you in trouble when you join (both with `using` or a `natural join`), if the values in `order` and `customer` are not the same. And if they are always the same, it is bad normalizitation, as you could remove one of them without losing any information. – Solarflare Jul 26 '17 at 20:57
  • I would use this as a reason on why not to use `natural join` (or `using` too, although less dangerous). Also, although it wasn't used here, never use `,` to join two tables (which you might learn somewhere too). The way to go is: `select from tablea join tableb on a.keycolumn = b.keycolumn`. While `natural join` or `using` will remove the duplicated column, you can get the same effect if you list all columns (and not list such a column twice). You lose a bit of comfort, but it makes your query A LOT safer/clearer/more understandable/ – Solarflare Jul 26 '17 at 20:58
  • Your quoted goal does not make sense. Please give a better translation. Also "natural join uses the common columns to finalise the rows" doesn't make sense. Neither does "the table resulting step 1 itself with non duplicate columns from salesman added to it". Please edit to make these clear. Don't try to cram words into one sentence. It would also help if you gave the output you were expecting. [mcve] That fuzzy statement about what it "uses" doesn't say what it *does*. What *definition* of natural join do you think you are using? What happens when you use it for the 2nd step? – philipxy Oct 22 '17 at 19:04

2 Answers2

1

... the same column of each table will appear once

Yes Natural Join does that.

... and only the relational rows will come.

I don't know what that means.

I disagree with those who are saying: do not use Natural Join. But it is certainly true that if you plan to use Natural Join for your queries, you must design the schema so that (loosely speaking) 'same column name means same thing'.

Then this exercise is teaching you the dangers of having same-named columns that do not mean the same thing. The danger is sometimes called the 'connection trap' or 'join trap'. (Not really a trap: you just need to learn ways to write queries over poorly-designed schemas.)

A more precise way to put that: if you have columns named the same in two different tables, the column must be a key of at least one of them. So:

  • city is not a key in any of those tables, so should not get 'captured' in a Natural Join.
  • salesman_id is not a key in table customer, so should not get 'captured' in the join from table orders.

The main way to fix up this query is by renaming some columns to avoid 'capture' (see below). It's also worth mentioning that some dialects of SQL allow:

SELECT *
FROM orders
NATURAL JOIN customer ON customer_id
...

The ON column(s) phrase means: validate that the only columns in common between the two tables are those named. Otherwise reject the query. So your query would be rejected.

Renaming means that you shouldn't use SELECT *. (Anyway, that's dangerous for 'production code' because your query might produce different columns each time there's a schema change.) The easiest way to tackle this might be to create three Views for your three base tables, with the 'accidental' same-named columns given some other name. For this one query:

SELECT ord_no, purch_amt, ord_date, customer_id,
       salesman_id AS order_salesman_id
FROM orders
NATURAL JOIN (SELECT customer_id, cust_name,
                     city AS cust_city, grade,
                     salesman_id AS cust_salesman_id
              FROM customer) AS customer_grr
NATURAL JOIN (SELECT salesman_id, name,
                     city AS salesman_city,
                     commission
              FROM salesman) AS salesman_grr

I'm using explicit AS to show renaming. Most dialects of SQL allow you to omit that keyword; just put city cust_city, ....

AntC
  • 2,623
  • 1
  • 13
  • 20
  • Which dialect allows `natural join on`? The usual syntax to write this would be `join using (column)` (which, same as the natural join, also includes a projection). Since it sounded like you refered to my comment: one could of course argue if natural join should (not) be used. But if you tackle the problems of natural joins by views and selecting and renaming columns, you can also just use a join (without views, renaming and operational limits). The only advantage of a natural join is the implied projection - but, as you mentioned, in any production code you have to list the columns anyway. – Solarflare Jul 27 '17 at 09:06
  • Thanks @Solarflare, this is a difficult issue, and there's no answer I'm happy about. (So this is going well beyond the original question.) `Natural Join` was an afterthought in SQL, and that really shows. `NATURAL LEFT/RIGHT/INNER/OUTER JOIN` is just garbage: there is only one `Natural Join`. `JOIN ... USING( ... )` is not `Natural Join`: there might be attributes in common not mentioned in the `USING`. The dialects supporting `Natural Join ... ON ...` are 'experimental'/not from mainstream vendors/not compliant to the SQL standard. Unlike USING, they reject 'bad' joins, as I said. – AntC Jul 27 '17 at 12:26
  • Re renaming columns: if you design the schema in advance to use `Natural Join`, you shouldn't need rename; and you shouldn't need to dot-prefix column names. Ever. I'm guessing the purpose of the exercise in the O.P. was to teach the dangers of `Natural Join` and the importance of column names. In real life the schema is already given (probably by the vendor of the package). And the designer didn't think about `Natural Join`. (It's only come into the SQL standard relatively recently; and many vendors were very slow to support it.) So yeah. SQL started as a pile of poo, and just got worse. – AntC Jul 27 '17 at 12:37
  • The asker's error is not the connection trap. That involves expecting *fewer* rows than the natural join. Codd 1970: "A conclusion is now drawn which is, in general, erroneous: namely that, if all possible paths are followed from a given supplier via the parts he supplies to the projects using those parts, one will obtain a valid set of all projects supplied by that supplier." – philipxy Oct 22 '17 at 20:21
  • Constraints have nothing to do with using natural join. They might have to do with the asker's wrong expectations. (Not that the question is clear about those.) Also, there are also no wrongly-named-column problems here, because (sloppily) salesman_id & customer_id *do* mean the same thing in the inputs & the output. If your writing re your points & justifications were clear then these problems would become evident. PS 1. Yes, there are natural outer joins. Left & right just add rows with null/whatever to the corresponding natural join. 2. What SQL allows expressing common-column expectations? – philipxy Oct 22 '17 at 20:34
  • The assignment is unintelligible. What do you claim it is asking for? If you think it is asking for the natural join of 3 tables, it certainly isn't trying teach about anything other what an example of one looks like, and in particular it doesn't contain anything about any misconceptions a solver might have about it. So again, either your thinking or your writing isn't clear. – philipxy Oct 22 '17 at 20:42
  • "Constraints have nothing to do with using natural join. " Quite. So nobody mentioned constraints. Why are you? "there are also no wrongly-named-column problems here" ?? Did you read the question (given schema); or my answer? There is attribute `city` in both table `customer` and `salesman`. Those `city`s do not "mean the same thing" as I put it, because neither is a key in their respective tables. – AntC Oct 23 '17 at 00:24
  • I see I missed that there are 2 city columns input, so it can reasonably be sloppily said that one city "means" customer city & another "means" salesman city. But it is misconceived to talk about "columns" meaning things--it is values & rows that mean things. To say that there's something in some way wrong with naming the two columns both city is also misconceived--that naming just means that the natural join returns row whose customer & salesman have the same city; if someone expects otherwise it just means they don't understand natural join, it doesn't mean the design is somehow flawed. – philipxy Oct 23 '17 at 02:08
  • (Just saw your last comment. I've already dealt with 'city'.) Re "So nobody mentioned constraints." No, somebody did--you: "the column must be a key of at least one of them". Re all my comments, and other issues with your answer, I've now posted an answer. But it would be nice if you edited your answer to clearly say things, including not putting things in scare quotes, as if that made them clear. Although you would find that you say a lot of unjustified things. I realize you are phrasing things the way people frequently phrase them, but they aren't making any more sense than you are. – philipxy Oct 23 '17 at 03:20
  • AFAIK yours isn't valid SQL syntax because SQL requires the presence range variables ('correlation names', 'aliases') after the derived tables, even though the point of natural join is to avoid duplicate column names (range variables were introduced into SQL to deal with the duplicate column name problem). – onedaywhen Jan 09 '18 at 09:22
  • Thanks @onedaywhen, yes you're right; I've corrected my code. This pointless alias to "avoid duplicates" where there's no duplicates is what Date & Darwen call teeth-gnashers. – AntC Jan 10 '18 at 04:52
  • Indeed, not many around here would get the Darwen reference, which I've used myself on SO in the past e.g. [here](https://stackoverflow.com/questions/9664776/sql-table-sub-query-alias-conventions/9666092#9666092). – onedaywhen Jan 11 '18 at 12:42
0

Why isn't the final result equal to the table resulting from step 1 with [...]?

Because natural join doesn't work how you expect--whatever that is, since you don't say.

In terms of relational algebra: Natural join returns the rows
• whose column set is the union of the input column sets and
• that have a subrow in both inputs.

In business terms: Every table & query result holds the rows that make some statement template--its (characteristic) predicate--its "meaning"--into a true statement. The designer gives the predicates for the base tables. Here, something like:

Orders = rows where
    order [ord_no] ... and was sold by salesman [salesman_id] to customer [customer_id] 
Customer = rows where
    customer [customer_id] has name [cust_name] and lives in city [city]
        and ... and is served by salesman [salesman_id]
Salesman = rows where
    salesman [salesman_id] has name [name] and works in city [city] ...

Natural join is defined so that if each input holds the rows that make its predicate into a true statement then their natural join holds the rows that make the AND/conjunction of those predicates into a true statement. So (your query):

Orders natural join Customer natural join Salesman = rows where
    order [ord_no] ... and was sold by salesman [salesman_id] to customer [customer_id] 
and customer [customer_id] has name [cust_name] and lives in city [city]
        and ... and is served by salesman [salesman_id]
and salesman [salesman_id] has name [name] and works in city [city] ...

So that natural join is asking for rows where, among other things, the customer lives in the city that the salesman works in. If that's not what you want, then you shouldn't use that expression.

Note how the meaning of a natural join of tables is a (simple) function of the meanings of its input tables. That's so for all the relational operators. So every query expression has a meaning, built from its base table meanings & relational operators.
Is there any rule of thumb to construct SQL query from a human-readable description?

Why didn't I get the rows with salesman_id 5002, 5003 & 5007?

Because those salesmen don't work a city in which one of their customers lives.

philipxy
  • 14,867
  • 6
  • 39
  • 83