0

The 1st query returns 27384 rows. The 2nd query returns 142899 rows. Can someone please explain what is happening with the RIGHT JOIN and LEFT JOIN that is causing the output difference?

1st query :

SELECT u.id AS id,
MIN(q.creation_date) AS q_creation_date, 
MIN(a.creation_date) AS a_creation_date 
FROM `bigquery-public-data.stackoverflow.posts_questions`AS q 
FULL JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a 
ON q.owner_user_id = a.owner_user_id 
LEFT JOIN `bigquery-public-data.stackoverflow.users` AS u 
ON q.owner_user_id = u.id
WHERE u.creation_date >= '2019-01-01' 
and u.creation_date < '2019-02-01'
GROUP BY id

2nd query :

SELECT u.id AS id,
MIN(q.creation_date) AS q_creation_date,
MIN(a.creation_date) AS a_creation_date
FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
FULL JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
ON q.owner_user_id = a.owner_user_id 
RIGHT JOIN `bigquery-public-data.stackoverflow.users` AS u
ON q.owner_user_id = u.id
WHERE u.creation_date >= '2019-01-01' and u.creation_date < '2019-02-01'
GROUP BY id

I expected the result from the 1st query to be 142899 rows but I don't know why the LEFT JOIN returns a massively different result.

MI MA
  • 171
  • 5
  • 1
    Possible duplicate of [What is the difference between "INNER JOIN" and "OUTER JOIN"?](https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join) – Ed Bangga Aug 29 '19 at 00:47
  • X left join Y on c is Y right join X on c. What is your reasoning that the queries would be the same? How is this not just asking what the 2 operators do? What did small representative data show you? Please in code questions give a [mre]--cut & paste & runnable code; example input with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. That includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) – philipxy Aug 29 '19 at 01:00
  • Learn what LEFT/RIGHT JOIN ON returns: INNER JOIN ON rows UNION ALL unmatched left/right table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN. Why wouldn't your 2 FROMs produce different number of null-extended rows? Then: A WHERE or INNER JOIN ON that requires a right/left [sic] table column to be not NULL after a LEFT/RIGHT JOIN ON removes any rows extended by NULLs, ie leaves only INNER JOIN ON rows, ie "turns OUTER JOIN into INNER JOIN". You have that. – philipxy Aug 29 '19 at 01:03

2 Answers2

0

When you use RIGHT JOIN the table with priority is always the one on the right. Similarly LEFT JOIN prioritizes the table to the left of JOIN. Thus the number of rows is different in that the priority table has the search data and the non-priority table does not have the required combination. More details here.

Ramon
  • 101
  • 1
  • 1
    This uses the unexplained term "priority". Even if it were explained, you still haven't said why it makes a difference in this case, so all this says now is "because of how left & right join work". That is no explanation. – philipxy Aug 29 '19 at 01:11
0

The recordset produced by the 1st query includes ALL records of 'q' AND ALL records of 'a' (and where either table doesn't have data to match, the database will fill those empty cells with nulls) BUT LIMITED TO records where both 'q' and 'u' have a match.

So, in the 1st query, the recordset is basically limited by the rows in 'u'. The query will never return more than the maximum number of rows in 'u'.

The recordset produced by the 2nd query includes ALL records of 'q' AND ALL records of 'a' (and where either table doesn't have data to match, the database will fill those empty cells with nulls) AND ALSO ALL records of 'u' (and where either table doesn't have data to match, the database will fill those empty cells with nulls).

So, the 2nd query may produce a recordset with as many rows as the largest table.

dns
  • 89
  • 8
  • What you wrote was not & is not clear, so please don't criticize me for pointing that out. Now you are using "includes" without explaining what you mean by it--how rows are included. (Which has to do with input rows being subrows of output rows.) You are also still unjustifiably assuming a FK that would limit the result to one output row for each input row. I'm done. – philipxy Aug 31 '19 at 01:39
  • FK = foreign key. – philipxy Aug 31 '19 at 01:44
  • The word "include" and "return" are perfectly acceptable. Please find those keyword in these resources, used exactly as I have. https://www.diffen.com/difference/Inner_Join_vs_Outer_Join ("include") https://support.office.com/en-us/article/LEFT-JOIN-RIGHT-JOIN-Operations-EBB18B36-7976-4C6E-9EA1-C701E9F7F5FB ("include") https://www.techonthenet.com/oracle/joins.php https://stackoverflow.com/questions/10590234/left-outer-join-in-postgres-not-returning-values-for-null ("include" and "return") https://www.w3schools.com/sql/sql_join_left.asp ("return") – dns Aug 31 '19 at 01:49
  • I'm done but I will repeat the very first thing I commented here. "This is not clear. (In a way typical of writing about joins.)" [PS Re abuse of Venn diagrams in unclear writing about joins.](https://stackoverflow.com/a/55642928/3404097) – philipxy Aug 31 '19 at 02:14
  • I'll await @MI-MA's response to learn whether my answer is 'clear' or not. Ultimately, if I've answered their question and help them understand joins better, then I'll know the responses was clear. Thank you for a rousing conversation. – dns Aug 31 '19 at 11:57
  • Thanks for both of your responses, really appreciate all the information given here even if there's a slight disagreement over what is what. – MI MA Sep 01 '19 at 00:12
  • Thank you @Mi-MA, I'm happy to help. – dns Sep 01 '19 at 15:42