Quicker way to get this data out of mySQL

Question

I currently am doing this to get some data from our table:

SELECT DISTINCT(CategoryID),Distance FROM glinks_DistancesForTowns WHERE LinkID = $linkID ORDER BY Distance LIMIT 20

I'm iterating over that for every link id we have (50k odd). Them I'm processing them in Perl with:

my @cats;
while (my ($catid,$distance) = $sth->fetchrow) {
    push @cats, $cat;
}

I'm trying to see if there is a better way to do this in a sub-query with MySQL, vs doing 50k smaller queries (i.e one per link)

The basic structure of the table is:

glinks_Links
    ID
glinks_DistancesForTowns 
    LinkID
    CategoryID
    Distance

I'm sure there must be a simple way to do it - but I'm just not seeing it.

As requested - here is a dump of the table structure. Its actually more complex than that, but the other fields just hold values so I've taken those bits out to give a cleaner over-view of the structure:

CREATE TABLE `glinks_DistancesForTowns` (
 `LinkID` int(11) DEFAULT NULL,
 `CategoryID` int(11) DEFAULT NULL,
 `Distance` float DEFAULT NULL,
 `isPaid` int(11) DEFAULT NULL,
 KEY `LinkID` (`LinkID`),
 KEY `CategoryID` (`CategoryID`,`isPaid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1


CREATE TABLE `glinks_Links` (
 `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `Title` varchar(100) NOT NULL DEFAULT '',
 `URL` varchar(255) NOT NULL DEFAULT 'http://',
 PRIMARY KEY (`ID`),
 KEY `booking_hotel_id_fk` (`booking_hotel_id_fk`)
) ENGINE=MyISAM AUTO_INCREMENT=617547 DEFAULT CHARSET=latin1

This is the kind of thing I'm hoping for:

SELECT glinks_Links.ID FROM glinks_Links as links, glinks_DistancesForTowns as distance (
        SELECT DISTINCT(CategoryID),Distance FROM distance WHERE distance.LinkID = links.ID ORDER BY Distance LIMIT 20
    )

But obviously that doesn't work;)

Yes. But it performs not the action you think about. `xxx` is calculated. `(xxx)` is an expression which' value is `xxx`, Then an output row `ROW(xxx, yyy)` is formed. When all rows are formed then DISTINCT is applied to the rowset (the whole rows which are full duplicates are collapsed). — Akina, Nov 23 '20 at 16:41
`DISTINCT(CategoryID)` - CategoryID is the column name, so all its selling it to do is grab the distinct category ids for that match. This part of the query works fine. My main problem is how I can join it into a one liner :) — Andrew Newby, Nov 23 '20 at 16:44
*so all its selling it to do is grab the distinct category ids for that match* Read my comments one more time. Think about them. Previously - assume that your current knowledge about DISTINCT is incorrect. — Akina, Nov 23 '20 at 16:46
If it were a function, then you could also do `SELECT Distance, DISTINCT(CategoryID)...` but that doesn't work, it returns an error. — Bill Karwin, Nov 23 '20 at 16:52
When asking for query-optimization help, please post the output of `SHOW CREATE TABLE ` so we don't have to guess at what indexes and data types you currently have. Also post the result of `EXPLAIN SELECT ...` for your query so we see how the optimizer is currently choosing indexes. Help us to help you! — Bill Karwin, Nov 23 '20 at 16:54
@BillKarwin I've updated the opening question with the table structure. I think the query itself is fine: `Showing rows 0 - 19 (20 total, Query took 0.0023 seconds.)` , the issue is with it iterating over all the links. What would be faster is just a sub-query that somehow puts the 2 together :) — Andrew Newby, Nov 23 '20 at 17:03
This thread about use of `DISTINCT` is off-topic for your actual question, but here's an old answer of mine that explains it: https://stackoverflow.com/a/7250654/20860 — Bill Karwin, Nov 23 '20 at 17:36
`SELECT DISTINCT(CategoryID),Distance FROM ...` means the same as `SELECT DISTINCT CategoryID,Distance FROM ...` _`DISTINCT` is not a function!_ That is, both de-dup on the pair of columns. — Rick James, Nov 28 '20 at 00:51
@RickJames so how come you can do : `SELECT distinct(field_name) FROM tables name where foo = "bar"`, and it will return only single values instead of duplicated? — Andrew Newby, Nov 28 '20 at 06:25
@RickJames - simple example here: http://sqlfiddle.com/#!9/b75172/1 — Andrew Newby, Nov 28 '20 at 06:53
@AndrewNewby - Add these to your sqlfiddle: `SELECT distinct(foo), bar FROM `test`; SELECT distinct foo, bar FROM `test`;` — Rick James, Nov 30 '20 at 02:16
@RickJames I'm not sure what that is supposed to show me that is different? Both of those give the same result, but the 2nd example with `distinct foo` is 1ms quicker? — Andrew Newby, Nov 30 '20 at 06:55

score 1 · Accepted Answer · answered Nov 23 '20 at 17:35

It sounds like you want the top 20 towns by distance for each link, right?

MySQL 8.0 supports window functions, and this would be the way to write the query:

WITH cte AS (
  SELECT l.ID, ROW_NUMBER() OVER(PARTITION BY l.ID ORDER BY d.Distance) AS rownum
  FROM glinks_Links as l
  JOIN glinks_DistancesForTowns AS d ON d.LinkID = l.ID
) SELECT ID FROM cte WHERE rownum <= 20;

Versions older than 8.0 do not support these features of SQL, so you have to get creative with user-defined variables or self-joins. See for example my answer to How to SELECT the newest four items per category?

Quicker way to get this data out of mySQL

1 Answers1