8

The following code will work to select data from two tables:

SELECT t1.foo, t2.bar FROM TABLE1 t1 INNER JOIN TABLE2 t2 ON t1.foo=t2.foo

I could just as easily written

SELECT t2.foo, t2.bar FROM TABLE1 t1 INNER JOIN TABLE2 t2 ON t1.foo=t2.foo

t1.foo or t2.foo: six of one or half a dozen of the other. Why not just foo?

I've been wonder why doesn't the SQL server just automatically return the data without me specifying one table or the other since the choice is entirely arbitrary (as far as I can tell).

I can make up a scenario where you would need to specify the table, such as

SELECT t1.foo, t2.bar FROM TABLE1 t1 INNER JOIN TABLE t2 ON t1.foo+=t2.foo

However, such scenarios are far from the norm in my experience.

Can anyone enlighten me as to why the language is designed so that I have to make this seemingly arbitrary decision in my code?

Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
Vivian River
  • 31,198
  • 62
  • 198
  • 313
  • 1
    While you're at it, why don't we just have GROUP BY w/o columns? Hell, it should know what I want to group by... everything that gives me an error when I run it! (joking) – SQLMason May 11 '11 at 15:02
  • 3
    @Dan, then you will need a license for Oracle mindreader (TM) – Lukas Eder May 11 '11 at 15:12

7 Answers7

5

Because equality in MS SQL doesn't necessarily mean they are equal in the values you want. Consider the following 2 values for foo: "Bar", "baR". Sql will believe them to be equal with respect to the join because of the case insensitivity inherent in the comparison, but which one were you asking for? SQL Server doesn't know, and it can't guess. You must explicitly tell it.

Edit:As @Lukas Eder brought up, not all implementations of SQL use case insensitive comparisons. I know MS SQL uses case insensitivity, and my answer is geared with this concept in mind.

Joel Etherton
  • 37,325
  • 10
  • 89
  • 104
  • Which RDBMS implements a case-insensitive `=` comparison? – Lukas Eder May 11 '11 at 15:10
  • my vote is for this answer. you have to know (and tell) which table contains the actual data you want to find a match for. There are rare cases when you know that the data MUST be the same in both tables ie a well designed datawarehouse with proper masterdata management would be one. But apart from that use, you need to make the decision. – T9b May 11 '11 at 15:10
  • @Lukas Eder - it's common practice for datawarehouses to convert all text to uppercase prior to storing data into the database, to avoid exactly the issue raised in this answer. It's because the comparisons fail when case is used, that the conversion needs to be done. – T9b May 11 '11 at 15:15
  • Common practice, OK. I do that too. But still, it's not correct to say that in SQL `'Bar' = 'baR'` (in general). So I was wondering which RDBMS may perhaps implement it in this way – Lukas Eder May 11 '11 at 15:23
  • @Lukas Eder: You're right, it's not correct because it's not correct in all implementations of SQL. I will apply a software specific update. Thanks. – Joel Etherton May 11 '11 at 15:51
  • @Joel, sweet, I didn't know that about SQL Server... One always learns on stackoverflow... – Lukas Eder May 11 '11 at 15:57
  • The default collation in SQL Server is case-insensitive (as are other platforms like MySQL). But case is always preserved in the data in the column, so you can get different data in the two columns, but they will still be joined. Equality (and inequality) all depend upon the collation, and this may also include accent issues. It is also possible to set SQL Server to use many different collations at different levels - including column level and database object level. – Cade Roux May 11 '11 at 18:36
  • How is this relevant? Doesn't the `SELECT t2.foo, t2.bar` portion of the statement serve the purpose of unambiguously requesting a particular of the two foo columns? The column being used to perform the join doesn't even have to be selected. – Dan Bechard Nov 15 '12 at 19:48
  • 1
    @Dan: This case is over simplistic. When you get into a more complicated join the `foo` could come from any number of tables and mean any number of things. Also, just because you specify `t1.foo = t2.foo` doesn't mean that the query is necessarily accurate. In this case, SQL has given you enough rope to hang yourself with. It us up to the person writing the query not to do something silly, and yes, this query could be reduced to remove the join, but if you add selections of other columns between t1 and t2 it'll need to be restored. – Joel Etherton Nov 15 '12 at 19:54
4

Your reasoning is not quite true. While t1.foo = t2.foo may hold true, that doesn't mean they're the same. Some examples:

  • One could be VARCHAR(1) the other VARCHAR(2)
  • One could be VARCHAR(1) the other NUMBER(1)
  • t1 could be a simple table, whereas t2 is a view (or nested select) that makes hyper-complex calculations for the value of foo. The projection cost of either foo might not be the same in some RDBMS.

And there are dozens of other reasons, why it would be ambigous to just write foo

Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
3

If you're sure that the columns represent the same thing you could join with a USING clause.

SELECT foo, t2.bar FROM TABLE1 t1 INNER JOIN TABLE2 t2 USING (foo);

Otherwise there's no guarantee that t1.foo is the same thing as t2.foo

Parris Varney
  • 11,320
  • 12
  • 47
  • 76
1

In this case you have a INNER JOIN so it's clear that the decision is arbitrary. But there are many situations where even if you join on FOO the 2 are not the same.

EX: in case of a LEFT JOIN OR in case you have something like ON t1.foo = t2.foo+/-/whater

The engine needs your input to know where to take the data from.

Dumitrescu Bogdan
  • 7,127
  • 2
  • 23
  • 31
  • Or multiple INNER JOINs, in which case it can't know which other table you want to join on foo. I think this answer is more accurate than the accepted answer which seems to imply the importance lies in which column is selected in the result set (which is not even required). – Dan Bechard Nov 15 '12 at 19:51
0

The reason you need to make this decision is that it isn't arbitrary. The system does not know which table has the data you want. You need to specify it. When the system designs the execution plan, it does not figure out which columns contain the same data in both tables. As far as it is concerned, these two columns could have different data. It isn't going to extrapolate that since you are saying these columns are equal that it could display either column when one isn't specified.

IAmTimCorey
  • 16,412
  • 5
  • 39
  • 75
0

In that particular case, t1.foo and t2.foo are the same thing, but the engine isn't optimized for that (and would be confusing if it was). What if your join did something where they may not be the same, like this?

SELECT t2.foo, t2.bar FROM TABLE1 t1 INNER JOIN TABLE2 t2 ON t1.foo<t2.foo

Since we are using <, foo on t1 and t2 could be very different things. The engine can't "guess" in this case.

Just because those experiences are "far from the norm" in your experience, the engine has to allow for it, otherwise it would make some types of queries extremely difficult to write.

vcsjones
  • 138,677
  • 31
  • 291
  • 286
0

SQL doesn't do it because it simply doesn't resolve ambiguities. (But as you note they are equivalent.)

For application lifecycle it's ultimately better to resolve them yourself, because if a column changes name or the join type changes, your code is less likely to be broken and it's more obvious what your intentions were. But those benefits weren't intentional, I'm sure.

Cade Roux
  • 88,164
  • 40
  • 182
  • 265