4

I am trying to better understand how the different types of SQL JOINs interact with each other. For INNER, LEFT/RIGHT OUTER and FULL OUTER JOINs, what is the minimum and maximum no of rows that the result set can produce? What is the normal expectation of these types of JOINs?

Allan Wind
  • 23,068
  • 5
  • 28
  • 38
RiR_
  • 59
  • 1
  • 1
  • 2
  • 4
    And what did you answer? If you can't answer those I would say you are not ready for a sql server position. – Sean Lange Sep 11 '17 at 19:32
  • 2
    0 and a limit based on the maximum size of available constraints (memory, disk). – Gordon Linoff Sep 11 '17 at 19:33
  • 3
    Was this based on any specific known datasets you are joining or in general? You could give limits based on the size of the datasets. E.g. If full outer joining x rows to y rows the minimum size would be greatest(x,y) – Martin Smith Sep 11 '17 at 19:36
  • @MartinSmith wouldn't the number of rows of a full outer join of x and y be (x + y) in the case of no matching rows. – Conrad Frix Sep 11 '17 at 19:43
  • Even I said the same. The interviewer was not satisfied. She expected specific answer. And no sample data was provided. – RiR_ Sep 11 '17 at 19:44
  • 2
    @GordonLinoff Excellent answer. I've run an accidental Cartesian JOIN on a 10M row table to a 40M row table that discovered the memory limit of the server. Apparently our server couldn't handle 400T records. Oops. However, it _technically_ vapor-locked before a result set was generated, so I guess that means there were 0 records, right. :-) – Shawn Sep 11 '17 at 19:45
  • @ConradFrix yes but that's not the minimum. The minimum would be that every row in the smaller input matches one in the larger then only those remaining get null extended and preserved by the outer join. – Martin Smith Sep 11 '17 at 19:45
  • Sounds like a job I wouldn't want to have anyway to be honest. If these are the kinds of questions you get at an interview, I could only imagine the amount of mundane and trivial work the actual job would entail. – Jacob H Sep 11 '17 at 20:04
  • 2
    @RiR_ I'd have to agree with Jacob H. This doesn't sound like a healthy place to work. If the interviewer didn't like your description of a FULL OUTER JOIN, they may not have fully understood what those JOINs are. Bullet dodged. – Shawn Sep 11 '17 at 20:28
  • 2
    I personally like the question. it truly shows your understanding of joins or your ability to work though a problem. – xQbert Sep 12 '17 at 18:13
  • I will agree with @xQbert here. After reading this question again, I think it does have a lot of potential to provide good information. I have edited the question, and hope that it can be reopened. – Shawn Sep 14 '17 at 21:18
  • Please ask 1 specific researched non-duplicate question. PS "the normal expectation of these types of JOINs" is not clear. But please don't ask a 2nd question. PS What does your research & attempt to answer this show? [ask] [help] – philipxy May 27 '20 at 04:00

1 Answers1

3

table1 INNER JOIN table2

min = 0 <<< If no rows in table1 match a row in table2

max = smallest of table1 or table2 <<< If all rows in the smaller table match a row in the larger table

Common: An INNER JOIN includes only the rows that match between the two tables.

enter image description here

table1 LEFT OUTER JOIN table2

min = table1 <<< If no rows in table2 match a row in table1

max = table1+table2 - 1 row <<< If all rows in table2 match one row in table1

Common: A LEFT OUTER JOIN includes all rows from the first table, correlated to the matching rows in the second table.

enter image description here

table1 FULL OUTER JOIN table2

min = largest of table1 or table2 <<< If all rows in table2 match any row in table1

max = table1+table2 <<< If no rows in the smaller table match a row in larger table

Common: A FULL OUTER JOIN includes all rows from both tables, correlating the matching rows between the two tables.

enter image description here

For further clarity, none of these descriptions take into account a JOIN with an ON 1=1 definition, or a Cartesian product. In my opinion, this goes beyond the definition of the JOIN and becomes a CROSS JOIN instead of whatever JOIN it originally was. (Note: a CROSS JOIN will result in table1 * table2 rows.)

==================================================================

BASIC JOIN DEMO >> http://sqlfiddle.com/#!6/5489b/3

MIN/MAX JOIN DEMO >> http://sqlfiddle.com/#!6/22fca/1

Shawn
  • 4,758
  • 1
  • 20
  • 29
  • 3
    Your max are all wrong. For all of them a join condition of 1=1 would give a Cartesian product and t1 * t2 rows. – Martin Smith Sep 11 '17 at 20:10
  • A Cartesian product is not a JOIN by those definitions. Except for the FULL JOIN. – Shawn Sep 11 '17 at 20:11
  • 2
    By whose definitions? How is `SELECT * FROM T1 INNER JOIN T2 ON 1=1` not a join? – Martin Smith Sep 11 '17 at 20:12
  • max for left join could be table1 + all related records from table 2. which may not be All of table 2. – xQbert Sep 11 '17 at 20:13
  • @MartinSmith a Cartesian is A(10)*B(2)=(20) so table1 cross join table2. but join is only RELATED records in both tables. A(10) Inner join B (2) may only be between 0 and 10. – xQbert Sep 11 '17 at 20:15
  • `SELECT * FROM t1 INNER JOIN t2 ON 1=1` isn't really an `INNER JOIN` by the definition of an `INNER JOIN`. It's a `FULL JOIN` pretending to be an `INNER JOIN`. – Shawn Sep 11 '17 at 20:15
  • 2
    I think you mean `CROSS JOIN` not `FULL JOIN` but that is a perfectly syntactically valid inner join and I would expect someone answering the question maximum number of rows from an inner join to know this. – Martin Smith Sep 11 '17 at 20:18
  • @xQbert Look at my parentheses after the `LEFT OUTER JOIN` definition. In a `LEFT JOIN`, if all rows of table2 matched to table1, then the minimum that would be would be the rows of table2, but the max it can be would be all of table2 minus the one table1 record that matched. – Shawn Sep 11 '17 at 20:18
  • 1
    This comment chain just goes to show how useless questions like this are at an interview. Unless you are trying to feel out how the candidate deals with arguments. – Jacob H Sep 11 '17 at 20:19
  • I'm an absolute amateur, but `JOIN`s follow the steps of `READ`, `CROSSJOIN` and `AGGREGATE/FILTER`, correct? Meaning without `WHERE` all `JOIN`s are simply Table A x Table B. Meaning that should always be MAXIMUM (excluding hardware limits)? – Simon Sep 11 '17 at 20:20
  • @Simon the max isn't always going to be A * B. If one of the tables is empty then that will be 0 then for outer joins the max will be the size of the outer table. (Or `A + B` for full) – Martin Smith Sep 11 '17 at 20:22
  • @MartinSmith Calling a Cartesian JOIN valid syntax is getting a little pedantic. You still aren't JOINing table1 with table2 on a key. You're JOINing all rows of both tables on everything. The OP is clearly new to SQL and is interested in the differences between JOIN types. An INNER JOIN ON 1=1 Cartesian product isn't really an INNER JOIN, nor does it behave like an INNER JOIN. – Shawn Sep 11 '17 at 20:25
  • 1
    It clearly doesn't behave according to your preconceptions of an inner join but that doesn't make it not an inner join. An inner join is just all rows from the Cartesian product where the on clause evaluates to true. See http://sqlmag.com/sql-server/logical-query-processing-clause-and-joins – Martin Smith Sep 11 '17 at 20:28
  • An INNER JOIN is a JOIN of two tables where t1.key = t2.key. When the JOIN is on 1=1, it becomes a CROSS JOIN, even though it says INNER JOIN. – Shawn Sep 11 '17 at 20:33
  • 1
    An equi join on key is one type of inner join. It is by no means the only one. – Martin Smith Sep 11 '17 at 20:37
  • To go back to the original question, I think it is very important to understand why a 1=1 Cartesian product in any JOIN type is a bad thing, but I don't think it adequately addresses an understanding of the different basic JOIN types to say that all JOINs can be t1*t2 rows. INNER, LEFT and FULL are all JOINs with an expectation that t1.key = t2.key. Otherwise they're simply always just a filtered CROSS JOIN. While technically correct, that doesn't really speak to a comprehension of the types of JOINs that the OP was asking about. – Shawn Sep 11 '17 at 20:48
  • @MartinSmith So does this mean... taking `LEFT/RIGHT JOIN` for example - if one of the tables are empty, `LEFT JOIN` MINIMUM or MAXIMUM is the size of Table A, and if it is `RIGHT JOIN`, it is size of Table B. But if one of the tables are not empty, then `LEFT/RIGHT JOIN` MINIMUM is size of respective outer table. MAXIMUM is (if not empty) A x B - ROWS that do not match `ON` or `WHERE`? – Simon Sep 11 '17 at 20:50
  • @Simon In the case of a JOIN, the WHERE clause doesn't really come in to play. It affects the results of the query, but not the JOIN. I'm also of the opinion that a RIGHT JOIN is simply a LEFT JOIN written backwards. I don't really like them and usually see them as a case to rethink my query. They can make things easier, but my English-thinking brain just likes seeing left-to-right organization. To your question, http://sqlfiddle.com/#!6/c6a10/1. – Shawn Sep 11 '17 at 21:47
  • Given the labels, the circles do not represent table 1 & table 2. Suggest you find out what they do represent. Right now they are nonsense. The rest of your content is also unclear & unjustified. – philipxy Sep 12 '17 at 05:01
  • @philipxy I'm not sure how those circles don't represent the types of JOINs they're listed under. Those are pretty much identical to every image representation of those JOIN types that you'll find on the internet. They are fairly generic SQL. Can you please explain how they don't make sense or how any of what I posted is unclear? I will happily join a chat and fix my answer if what I answered is incorrect. We're all here to learn, and I'm plenty open to being corrected. I've looked over my answer multiple times, and I don't see the clarity problem. Maybe I simply expressed myself poorly. – Shawn Sep 12 '17 at 13:27
  • What are the circle partions enclosing? Not rows of those tables. Oh yes, the Venn diagrams appear a lot but they are mostly misused and/or mislabelled. They do illustrate interection vs union vs except/minus and they illustrate inner vs outer joins. But in the latter case, they do not enclose rows of the input tables. Do you know what outer joins do? See my many comments at https://stackoverflow.com/q/38549/3404097 including on the many poor answers blindly upvoted. Re your answer, it does not explain itself and its also wrong, which, if you justified your answer (correctly), you would see. – philipxy Sep 13 '17 at 01:24
  • @MartinSmith Defining max & min in terms of input row counts, there are some interesting special cases for full join. – philipxy Sep 13 '17 at 01:36
  • By the way, you seem think that `inner join on` *isn't* just a filtered `cross join`; [but it is](https://stackoverflow.com/a/25957600/3404097). Indeed the standard *defines* `inner join on` as `cross join where`. Anyway, your notion of what a join is (a) should be explained in your answer and (b) is not the only reasonable notion & (c) is rather naively limited to a special case. [Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097) – philipxy Sep 13 '17 at 02:15
  • @philipxy I'm not sure using your own arguments to support your argument is the most effective method of proving your point, especially when others disagree. Regardless, as others pointed out, Pinal Dave even made a similar description of basic JOINs (https://blog.sqlauthority.com/2009/04/13/sql-server-introduction-to-joins-basic-of-joins/). I do find Pinal to be a fair authority on SQL to appeal to, so I kinda agree with his points. I think you make fair points, but you dig way too far into the inner workings of the optimizer for a beginner to grok well. I do understand that ... – Shawn Sep 13 '17 at 14:15
  • ... INNER JOIN ON 1=1 is essentially a CROSS JOIN (I do apologize for misspeaking earlier and mixing a FULL and CROSS JOIN), but I think that is getting _WAY_ nit-picky in the definition of JOINs, and again, well beyond the OP's question. While technicalities can change JOIN results (and should be understood), I do think that _intent_ plays a lot into the common definition of JOINs. And I think those diagrams do a fair job of describing those common definitions. If you have a different way of viewing how JOINs work, I'd be happy to join a chat to hear it. – Shawn Sep 13 '17 at 14:15
  • I have added additional clarification to my definitions of the JOIN types, and I've also added 2 examples of the results of the different JOINs. – Shawn Sep 13 '17 at 15:43
  • I am not arguing, I am explaining. You have only to work things through instead of assuming things. Think for yourself. Eg work out what the sets in the Venn diagrams are sets of; since they are not input rows, your use of them here says nothing about the number of output rows in terms of input rows. Eg since joins (whatever you want to define them as) do not in general output input rows, your statement "FULL OUTER JOIN includes all rows from both tables" is meaningless until you explain "includes" and what it has to do with your answer. You're just parroting others' parroting. Good luck. – philipxy Sep 14 '17 at 01:26
  • 1
    PS My comments have nothing to do with optimization, they are about the defintion of what sql outputs as a function of inputs. It's clear from your writing about join & about optimization that you have many misconceptions. (And if you read my comments you'll see Pinal writes that he didn't understand cross join was a special case of inner join on.) – philipxy Sep 14 '17 at 01:30
  • @philipxy You and I will just have to agree to disagree. You have read more into my comments than what was there. And apparently Pinal's too. Him saying he never saw an `INNER JOIN ON 1=1` as the same as a `CROSS JOIN` doesn't invalidate the other things he's said. And again, he's kind of a SQL expert. I have thought through my explanation, and written it out more. If you see things differently, I told you, I'd be happy to share a chat with you to hear your explanation. But if you want to be insulting and say I don't think for myself, this isn't really the appropriate forum. Thanks. – Shawn Sep 14 '17 at 14:33
  • @Shawn Thanks for the nice answer, this is just what I was looking for to double-check my own logic. Seems solid to me, at least I couldn't spot any errors. Of course this is for joins on column values, not on something like a join on `True`. But that is the standard case. – doublefelix Nov 18 '21 at 18:06