1

I want to replace the subquery with a join, if possible.

SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person` 
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`) 
LEFT OUTER JOIN `fftenant_surveyanswer` 
ON fftenant_surveyanswer.surveyquestion_id = 1
AND fftenant_surveyanswer.`surveyresult_id` IN (SELECT y.`surveyresult_id` FROM `fftenant_farmer_surveyresults` y WHERE y.farmer_id = `fftenant_farmer`.`person_ptr_id`)

I tried:

SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person` 
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`) 
LEFT OUTER JOIN `fftenant_farmer_surveyresults` 
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`) 
LEFT OUTER JOIN `fftenant_surveyanswer` 
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`) 
AND fftenant_surveyanswer.surveyquestion_id = 1 

But that gave me one record per farmer per survey result for that farmer. I only want one record per farmer as returned by the first query.

A join may be faster on most RDBMs, but the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible.

Eloff
  • 20,828
  • 17
  • 83
  • 112

3 Answers3

3

You could use DISTINCT or GROUP BY, as mvds and Brilliand suggest, but I think it's closer to the query's design intent if you change the last join to an inner-join, but elevating its precedence:

SELECT farmer.person_ptr_id, surveyanswer.text_value
  FROM fftenant_farmer AS farmer
 INNER
  JOIN fftenant_person AS person
    ON person.id = farmer.person_ptr_id
  LEFT
 OUTER
  JOIN
(      fftenant_farmer_surveyresults AS farmer_surveyresults
 INNER
  JOIN fftenant_surveyanswer AS surveyanswer
    ON surveyanswer.surveyresult_id = farmer_surveyresults.surveyresult_id
   AND surveyanswer.surveyquestion_id = 1
)
    ON farmer_surveyresults.farmer_id = farmer.person_ptr_id

Broadly speaking, this will end up giving the same results as the DISTINCT or GROUP BY approach, but in a more principled, less ad hoc way, IMHO.

ruakh
  • 175,680
  • 26
  • 273
  • 307
  • This seems to do the job perfectly, and it's clean. I had no idea you could do that in SQL! – Eloff Jan 17 '12 at 05:18
  • Are you sure? This looks to me like it's equivalent to the original poster's first attempt; that is, it will give one result per farmer per survey result. I believe that it's not actually possible to do what the original poster wanted, without knowing which columns are the key columns; and that using an "in" clause is actually a really good way to write this query. – Dawood ibn Kareem Jan 17 '12 at 05:23
  • @DavidWallace: I really wouldn't say that I'm sure, no; but the proof of the pudding is in the eating, and the OP seems to have eaten it! :-P . . . (Of course, a query can seem fine at first, until it's tested more thoroughly, so who knows?) As for the IN clause: I'm not sure. I still don't have a great grasp on the relationships between the various tables, but it seems like `fftenant_farmer_surveyresults` is the bridge between `fftenant_surveyanswer` and `fftenant_farmer`, so logically, any queries that join the latter two probably *should* join the first as well. – ruakh Jan 17 '12 at 05:44
  • @ruakh - my apologies. I've re-read your answer and Eloff's first attempt. I now understand why they're not quite the same. Thanks for your patience. – Dawood ibn Kareem Jan 17 '12 at 06:23
  • @DavidWallace: There's absolutely no need to apologize, it's quite all right. :-) – ruakh Jan 17 '12 at 13:02
2

Use SELECT DISTINCT or GROUP BY to remove the duplicate entries.

Changing your attempt as little as possible:

SELECT DISTINCT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person` 
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`) 
LEFT OUTER JOIN `fftenant_farmer_surveyresults` 
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`) 
LEFT OUTER JOIN `fftenant_surveyanswer` 
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`) 
AND fftenant_surveyanswer.surveyquestion_id = 1
Brilliand
  • 13,404
  • 6
  • 46
  • 58
  • Actually that doesn't quite work, because it doesn't necessarily select the duplicated row with the joined value from the survey answer table. – Eloff Jan 17 '12 at 05:16
  • If some duplicate rows are needed, you can always GROUP BY the primary keys of all tables from which duplicates should be retained (instead of using SELECT DISTINCT). Most databases (but not MySQL) will also require you to include all of the columns in the SELECT clause in the GROUP BY. – Brilliand Jan 18 '12 at 07:31
  • Whoops, I didn't pay enough attention to those LEFT JOINs. Getting rid of only the specific duplicates that you want gone would require a WHERE clause to pluck out the null surveyanswers that aren't paired with null surveyresults. But ruakh's answer was better. – Brilliand Jan 18 '12 at 08:20
1

the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible

Then consider a much simpler example to begin with e.g.

SELECT * 
  FROM T1
 WHERE id IN (SELECT id FROM T2);

This is known as a semi join and if desired may be re-written using (among other possibilities) a JOIN with a SELECT clause to a) project only from the 'outer' table, and b) return only DISTINCT rows:

SELECT DISTINCT T1.* 
  FROM T1
       JOIN T2 USING (id);
Community
  • 1
  • 1
onedaywhen
  • 55,269
  • 12
  • 100
  • 138