2

I have two tables. Global fishing data (capture & aquaculture). I want to combine by emulating FULL OUTER JOIN in MySQL.

Tbl_A - capture

  • capture_id
  • year_c
  • species_c
  • iso_code_c
  • area_code_c
  • environ_code_c
  • qty_taken
  • value_c
  • symbol_c

Tbl_B - aquaculture

  • aqua_id
  • year_a
  • species_a
  • iso_code_a
  • area_code_a
  • environ_code_a
  • qty_prod
  • value_aqua
  • symbol_a

This is complicated by:

  • Tbl_A and Tbl_B are not related
  • Tbl_B may not have a corresponding row

Tbl_A - capture

cap_id| yr_c| sp_c| iso_c| area_c| qty_c
     3| 2015|  TRR|    54|      8|   120
   678| 2015|  BOM|    62|     27|   0.0
    20| 2015|  TRR|    54|     27|   0.0
    45| 2015|  FRC|     7|     15| 86800

Tbl_B - aquaculture

cap_id| yr_a| sp_a| iso_a| area_a| qty_a
    78| 2015|  OTS|    32|     27|  6868
   333| 2015|  FRC|     7|     15|   550
   789| 2015|  TRR|    54|     27| 45000
   987| 2015|  TRR|    32|     27|    40

For a selected year (2015) I am trying to capture:

  1. Tbl_A & Tbl_B records where (species, iso_code, area) are the same;
  2. Tbl_A records (species, iso_code, area) that don't have a Tbl_B match; and
  3. Tbl_B records (species, iso_code, area) that don't have a Tbl_A match.

Tbl_C - Desired Final Tbl

id| cap_id| aqua_id| yr_c| yr_a| sp_c| sp_a| iso_c| iso_a|area_c|area_a| qty_c| qty_a 

1 |     20|     789| 2015| 2015|  TRR|  TRR|    54|    54|    27|    27|   0.0| 45000
2 |     45|     333| 2015| 2015|  FRC|  FRC|     7|     7|    15|    15| 86800| 550
3 |    678|    NULL| 2015| NULL|  BOM| NULL|    62|  NULL|    27|  NULL|   0.0| NULL
4 |      3|    NULL| 2015| NULL|  TRR| NULL|    54|  NULL|     8|  NULL|   120| NULL
5 |   NULL|      78| NULL| 2015| NULL|  OTS|  NULL|    32|  NULL|    27|  NULL| 6868    
6 |   NULL|     987| NULL| 2015| NULL|  TRR|  NULL|    32|  NULL|    27|  NULL| 40

I have a query that uses a UNION to JOIN two LEFT JOINS:

(SELECT 
c.capture_id,
a.aqua_id,
c.year_c,
a.year_a,
c.species_c,
a.species_a,
c.iso_code_c,
a.iso_code_a,
c.area_c,
a.area_a,
c.environ_code_c,
a.environ_code_a,
c.qty_taken,
a.qty_prod
FROM capture AS c 
LEFT JOIN aquaculture AS a 
ON c.year_c = a.year_a AND c.iso_code_c = a.iso_code_a AND c.area_c = a.area_a AND c.species_c = 
a.species_a
WHERE c.year_c = 2015 AND a.year_a = 2015)
UNION
(SELECT 
c.capture_id,
a.aqua_id,
c.year_c,
a.year_a,
c.species_c,
a.species_a,
c.iso_code_c,
a.iso_code_a,
c.area_c,
a.area_a,
c.qty_taken,
a.qty_prod
FROM  aquaculture AS a
LEFT JOIN capture AS c 
ON c.year_c = a.year_a AND c.iso_code_c = a.iso_code_a AND c.area_c = a.area_a AND c.species_c = a.species_a
WHERE a.year_a = 2015 AND c.year_c = 2015);

But the query above is only returning a small sub-set of matched records

id| cap_id| aqua_id| yr_c| yr_a| sp_c| sp_a| iso_c| iso_a|area_c|area_a| qty_c| qty_a 

1 |     20|     789| 2015| 2015|  TRR|  TRR|    54|    54|    27|    27|   0.0| 45000
2 |     45|     333| 2015| 2015|  FRC|  FRC|     7|     7|    15|    15| 86800|   550

I do not understand how I am cancelling out the effect of the LEFT JOINS

  • It's hard to tell, but are you trying to do a `FULL OUTER JOIN`, where the result contains a row if either of the input tables contains that row? – Barmar Sep 06 '17 at 22:26
  • If that's what you're trying to do, see https://stackoverflow.com/questions/4796872/full-outer-join-in-mysql – Barmar Sep 06 '17 at 22:27
  • I find that this tutorial here is of great help: https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/ – JeanPaul98 Sep 06 '17 at 22:58
  • Ug, I hate it when my OCD kicks in. Those squiggly tables! – Strawberry Sep 06 '17 at 23:02
  • `join_id = 10` is wrong, it's missing a column. Should it be `a_id = NULL`? – Barmar Sep 07 '17 at 00:18
  • You are not clear. You have not explained when a row goes in the result you want. Please read & act on [mcve]. Also please format those tables correctly. – philipxy Sep 07 '17 at 09:13
  • What do you mean, prevent? If you don't want to do something, don't do it. Left join returns inner join rows plus left table rows extended by nulls. Now--what rows do you want? You don't need to know constraints to query; they do not complicate querying. You just need to konw when a row goes in each base table, and when it goes in a query result in terms of when it goes in base tables. Read [this](https://stackoverflow.com/a/33952141/3404097) & start by giving base table predicates. It is *good* to have example of inputs & desired output (not too complex)--why did you remove it? Was it ok? – philipxy Sep 08 '17 at 05:27
  • What do you mean, "I have a working query" but "I know the ON and/or the WHERE clause are not correct"?? Please use enough words to say what you mean & describe everything correctly. – philipxy Sep 08 '17 at 07:09
  • Re "cancelling out the effect of the LEFT JOINS" I already told you, look at what left join returns & decide what rows you want vs what rows you are getting from left join & then filtering. Read my comment about what left join does. Then look at your wheres. They require that there are no nulls for c.X. So you are removing null-extended rows. Solve your problem for inner join, then for outer join. This is part of finding a [mcve], which you have still not done. – philipxy Sep 10 '17 at 07:31
  • Possible duplicate of [left join turns into inner join](https://stackoverflow.com/questions/3256304/left-join-turns-into-inner-join) – philipxy Mar 14 '18 at 04:04

2 Answers2

3

As MySQL supports only left, right and inner joins - you need to write two queries: 1) table A left join table B 2) table B left join table A. And then combine them into one result with help of UNION. Union will also remove duplicated rows in the result, so you will only have 3 types of rows in result set:

  1. Row with data from both tables, i e A inner join B
  2. Row with data from table A, where there are no matches from B. i e A left join B
  3. Row with data from table B, where there are no matches in A, i e B left join A

Try:

SELECT ...
FROM c
LEFT JOIN a ON ...
UNION
SELECT ...
FROM a
LEFT JOIN c ON ...
Maksym Moskvychev
  • 1,471
  • 8
  • 11
  • This won't create line 10 in the result, which has both `c_id` and `a_id`. He really wants a full outer join. – Barmar Sep 07 '17 at 00:15
  • Oops, that line is just wrong, it's missing a column. – Barmar Sep 07 '17 at 00:17
  • not sure if a full outer join is it? Yes typo on line 10 (doh!) – sq_peg_rnd_hole Sep 07 '17 at 01:05
  • not sure if a full outer join is it? When I tried full join initially query runs but stopped when no row match in T2. That is why I tried left outer join. Because need to load T1 data even if T2 done? c_id and a_id can be in same row. One table is captured fish, the other is aquaculture (production). Where the year, iso_code, area and species are the same I need them in the same row so I can sum the total global quantity. But where only caught or only aquaculture production for a given combination of iso_code, area and species for that year I need that to load as well. – sq_peg_rnd_hole Sep 07 '17 at 01:14
  • @maksym thanks but union all stacks the data vertically. I need it to be horizontal. I think Barmar is right in that it is a join I need. Just can't seem to get the structure of the query right... – sq_peg_rnd_hole Sep 07 '17 at 01:37
  • @Possbil, yes, union stacks all vertically. In your example "I need to combine the data into one table like this", you expect exactly this, isn't it? So in every row you have either c_id either a_id, but not both together. If it's not what you want - please update original question :) – Maksym Moskvychev Sep 07 '17 at 07:06
  • I've updated my answer according to your needs, please tell me if you still have a question! – Maksym Moskvychev Sep 07 '17 at 07:17
  • @maksym thanks! your suggestion gets me close. Have realised issue is with ON statement. – sq_peg_rnd_hole Sep 07 '17 at 17:28
  • Right! Just in your case I would use "union" , not "union all", to avoid duplicated rows in result – Maksym Moskvychev Sep 07 '17 at 17:37
  • I ran: (SELECT c.cap_id, a.aqua_id, c.yr_c, a.yr_a, c.sp_c, a.sp_a, c.iso_c, a.iso_a, c.area_c, a.area_a, c.qty_taken, a.qty_prod FROM fao_cap AS c LEFT JOIN fao_aqua AS a ON c.iso_c = a.iso_a AND c.area_c = a.area_a AND c.sp_c = a.sp_a WHERE c.yr_c = 2015 AND a.yr_a = 2015) UNION ALL (SELECT c.cape_id, a.aqua_id, c.yr_c, a.yr_a, c.sp_c, a.sp_a, c.iso_c, a.iso_a, c.area_c, a.area_a, c.qty_taken, a.qty_prod FROM fao_aqua AS a LEFT JOIN fao_cap AS c ON c.iso_c = a.iso_a AND c.area_c = a.area_a AND c.sp_c = a.sp_a WHERE c.yr_c = 2015 AND a.yr_a = 2015); – sq_peg_rnd_hole Sep 07 '17 at 17:43
  • what I don't know how to do is set the ON or WHERE statement to allow for records in either table to be listed where there is no match with the constrants – sq_peg_rnd_hole Sep 07 '17 at 17:45
  • so always need 2015 . always need where exact match for 2015 in both tables But also need where 2015 record in capture table only AND where 2015 record in aquaculture table only – sq_peg_rnd_hole Sep 07 '17 at 17:47
  • I also looked at https://stackoverflow.com/questions/4796872/full-outer-join-in-mysql. @shA.t suggested query and expected result is what I'm looking for. But can't seem to find a SO example where also using multiple ON constraints. The WHERE clause is because data tables large (join times out unless I limit query to a single year) – sq_peg_rnd_hole Sep 07 '17 at 17:57
1

Your problem is in the WHERE clauses. You are excluding the records where the 'RIGHT' table record is null.

SELECT *
FROM c LEFT JOIN a ON ...
WHERE c.year = 2015 /*remove AND a.year = 2015*/
UNION SELECT *
FROM a LEFT JOIN c ON ...
WHERE a.year = 2015 /*remove AND c.year = 2015*/
zambonee
  • 1,599
  • 11
  • 17
  • This answer does not explain why they are excluding & does not tie that to the code. It is not clear. – philipxy Sep 10 '17 at 07:34
  • @zambonee and (@philpxy) many thanks. The WHERE clause changes worked. I have to remove 42 duplicates from the capture table, but given 900,000 rows 42 is pretty clean. – sq_peg_rnd_hole Sep 12 '17 at 04:36
  • I now better understand how 2nd year = xxxx was generating same effect as INNER JOIN. Not yet entirely clear on the 2nd LEFT JOIN logic. Will work on tomorrow. – sq_peg_rnd_hole Sep 12 '17 at 04:39