0

I have a very complex query where I need to select the top N items per businessType. There are a variable number of businessTypes so UNION won't work in this situation. How can I edit this to select the top 10 items for each businessType ordered by rank?

SELECT data_ID,title,distance,age_rank,age_rank * 10 AS rank,age,businessType
FROM 
(SELECT sd.data_ID,sd.title,p.radius,distance_scale,distance_decay,
    p.distance_unit*DEGREES(ACOS(COS(RADIANS(p.latpoint))*COS(RADIANS(sd.storeLat))*
    COS(RADIANS(p.longpoint - sd.storeLon))+SIN(RADIANS(p.latpoint))*
    SIN(RADIANS(sd.storeLat)))) AS distance,
    EXP(-POWER(GREATEST(DATEDIFF(CURDATE(),d.createDate)-scale,0),2)/
    ((-2*POWER(scale,2))/(2*LOG(decay)))) AS age_rank,sd.businessType,sd.store_ID,
    DATEDIFF(CURDATE(),d.createDate) AS age
    FROM search_data sd
    LEFT JOIN data d
    ON sd.data_ID = d.data_ID
    JOIN (
        SELECT 40.6812509 AS latpoint,-73.9809685 AS longpoint,50.0 AS radius,69.0
        AS distance_unit,5 AS scale,0.5 AS decay, 1 as distance_scale, 0.001 
        AS distance_decay
    ) AS p ON 1=1
    WHERE sd.storeLat
    BETWEEN p.latpoint  - (p.radius / p.distance_unit)
    AND p.latpoint  + (p.radius / p.distance_unit)
    AND sd.storeLon
    BETWEEN p.longpoint - (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
    AND p.longpoint + (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
    AND DATE(d.createDate) >= DATE_ADD(CURDATE(),INTERVAL -90 DAY)
) AS d
WHERE distance <= radius
AND businessType = 1
GROUP BY store_ID
ORDER BY rank DESC;

EDIT: I've tried the suggested method, but it seems to not be working. Here is what I've tried on a simplified method:

SELECT sd.title,sd.businessType
FROM search_data sd INNER JOIN (
    SELECT
    businessType,
    GROUP_CONCAT(data_ID ORDER BY createDate DESC) grouped_deal_ID
    FROM
    search_deals
    GROUP BY store_ID) businessType_max
ON sd.businessType = businessType_max.businessType
AND FIND_IN_SET(sd.data_ID,grouped_deal_ID) <= 10
ORDER BY sd.businessType ASC;

With further testing I was able to get a simple version working with (http://www.erikhaselhofer.com/?p=1793):

SELECT RowNum,businessType,createDate
FROM(
SELECT IF(@businessType=businessType,@ctr:=@ctr+1,@ctr:=1) as RowNum, @businessType:=businessType as businessType,title,createDate
FROM search_data
JOIN (SELECT @ctr:=1) as a
ORDER BY businessType, createDate DESC) as b
WHERE RowNum in (1,2,3) AND businessType IN (1,2,3,4,5,6,7);
user2694306
  • 3,832
  • 10
  • 47
  • 95
  • 4
    This seems very similar to a lot of questions under the tag [greatest-n-per-group] -- have you tried searching the other answers on this site? – Ian Clelland Dec 22 '14 at 20:23
  • possible duplicate of [Using LIMIT within GROUP BY to get N results per group?](http://stackoverflow.com/questions/2129693/using-limit-within-group-by-to-get-n-results-per-group) – Patrick Q Dec 22 '14 at 20:26
  • @IanClelland. Extremely well worded comment constructive, non-confrontational and helpful at the same time. Also educational to the user and others who read this question/comment on site use. I applaud every aspect of it. – xQbert Dec 22 '14 at 20:28
  • The link seems like a nice solution, I just can't quite grasp how to apply it to this particular instance. – user2694306 Dec 22 '14 at 20:41
  • I have searched the other ones, but they tend to pertain to straight-forward queries. The complexity of this query is throwing me off track. – user2694306 Dec 22 '14 at 20:42
  • 1
    Start by getting it working for a simplified version of your query, and then start adding in the "complexity". – Patrick Q Dec 22 '14 at 20:56
  • Does the query you have return rows for each business type already? All you need is to limit it to 10 rows in each group? It's hard to tell how close this is without sample data. – AdamMc331 Dec 22 '14 at 21:06
  • @PatrickQ: I've tried your suggestion, but it doesn't seem to return the correct results. I have updated my question accordingly – user2694306 Dec 23 '14 at 07:58

1 Answers1

0

I have answered a few questions on this topic, and a lot of what I learned I references from this article.

Typically, I try to implement something like the following:

SELECT columns
FROM myTable m
WHERE (
   SELECT COUNT(*)
   FROM myTable mc
   WHERE mc.columnToGroupBy = m.ColumnToGroupBy AND mc.orderColumn >= m.orderColumn
   ) <= numberIWantToGroup

While this is much easier to demonstrate with smaller tables without subqueries, as you can see by this question, it may look something like this in your case:

SELECT myColumns
FROM myTableWithAllItsJoins m1
WHERE(
   SELECT COUNT(*)
   FROM myTableWithAllItsJoins m2
   WHERE m1.businessType = m2.businessType AND m1.rank >= m2.rank
   ) <= 10

This will select the top 10 ranks for each businessType.

Community
  • 1
  • 1
AdamMc331
  • 16,492
  • 10
  • 71
  • 133