3

Are there are analytical consequences on JOINing a table to itself ON a column that does not hold unique values?

Most of the self join queries I have seen are done on a PK column. I have run into problems getting my numbers right so I am wondering if what I am doing makes sense.

E.g.

SELECT * 
FROM Employee a
JOIN Employee b
ON a.Birthdate = b.Birthdate
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
disasterkid
  • 6,948
  • 25
  • 94
  • 179
  • 1
    Define _analytical consequences_? You mention you have run into problems, what are they? – Jonathon Ogden Jul 26 '16 at 07:28
  • @JonathonOgden this is a more elaborate version of my question: http://stackoverflow.com/questions/38377403/self-join-produces-inconsistent-value-for-last-month – disasterkid Jul 26 '16 at 07:29
  • Broadly speaking, since it's a non-unique column, you'll obviously be joining records to themselves. So an employee record in A will join to itself in B without an additional unique key column such as an employee ID to filter it. What _analysis_ will you be doing on this data? That will help give you a more specific answer. – Jonathon Ogden Jul 26 '16 at 07:38

2 Answers2

1

As your query is written now, it will return you all pairs of employees that:

  1. Different employee records that have the same Birthdate
  2. All employee records will be returned linked to themselves

I do not think this would ever make much sense. Can you provide more information of what you want to achieve and what is the structure of the Employee table?

In case Birthdate is nullable, then you will have to modify the ON clause to join NULL values as well, for example:

SELECT * 
FROM Employee a
JOIN Employee b
ON COALESCE(a.Birthdate,'19000101') = COALESCE(b.Birthdate,'19000101')
H. Pauwelyn
  • 13,575
  • 26
  • 81
  • 144
tomislav_t
  • 527
  • 2
  • 9
0

SQL INNER JOIN, CROSS JOIN & implicit JOIN (comma) all result in what we can call an SQL "cross product" of every possible concatenation of a left argument row with a right argument row (with columns renamed per source table alias). In the case of INNER JOIN, ON then removes all of those result rows that don't satisfy its condition. (So INNER JOIN ON 1=1 is the same as CROSS & implicit JOIN. This is also the result for dialects allowing INNER JOIN without ON.) This is so regardless of what NULLs may be in a table.

See this (dialect-independent) answer explaining INNER/CROSS/implicit JOINs. Which also explains that every JOIN "makes sense". (Although SQL meanings for non-relational SQL tables (those with NULLS or duplicate rows) are complex and obscure.) Of course, using the wrong query doesn't "make sense".

PS You probably only want to SELECT UNIQUE rows. You probably don't want the rows WHERE a.employee = a.employee. And you may not want, for columns (a.employee, ..., b.employee, ...), both the row (x, ..., y, ...) and the row (y, ..., x, ...), for which the usual solution is something like WHERE a. employee <= b.employee. But you need to know your input table and result table meanings before you can write an appropriate query.

Community
  • 1
  • 1
philipxy
  • 14,867
  • 6
  • 39
  • 83