1

First of all thank you for taking time to read this. I'm trying to answer a few questions about mysql and I'm stuck on a fews and that's why I resolve to ask for somebody help if possible. Or at least to point me in the right direction. Any help will be very appreciated.

City table contains population data for 4000 cities across 230 countries. Explain why the following query performs poorly in MySQL 5.6, and demonstrate solutions.

SELECT `ID`, `CountryCode`, `Name`, `District`, `Population`
FROM `City`
WHERE (`CountryCode`, `Population`) IN (
    SELECT `CountryCode`, MAX(`Population`) AS `Population`
    FROM `City`
    GROUP BY `CountryCode`
);
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828

1 Answers1

1

You would think this query only executes the subquery once, keeps the result, and then compares it to rows in the outer query. But that's not the case with MySQL. MySQL has a gap in the intelligence of its optimizer, so it treats the subquery as a dependent subquery and re-executes it for each distinct value of the outer query.

To fix this, move the subquery into the FROM clause as a derived table. It will execute the subquery once, and keep the result as an internal temporary table. Then join to the other instance of the table.

SELECT `ID`, `CountryCode`, `Name`, `District`, `Population`
FROM `City`
JOIN (
    SELECT `CountryCode`, MAX(`Population`) AS `Population`
    FROM `City`
    GROUP BY `CountryCode`
) AS _max USING (`CountryCode`, `Population`);

Also you should have an index on City over the two columns (CountryCode,Population) in that order, so the GROUP BY query can run efficiently.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828