2

I have a problem with this SQL-Query

SELECT * 
FROM page p 
LEFT JOIN categorylinks c ON p.page_id = c.cl_from 
WHERE c.cl_to IN
(
  'Art-Rock-Band',
  'Echo-Pop-Preisträger',
  'Englische_Band',
  'Genesis_(Band)',
  'Grammy-Preisträger',
  'Peter_Gabriel',
  'Phil_Collins',
  'Popband',
  'Progressive-Rock-Band',
  'Rock_and_Roll_Hall_of_Fame'
)

It works and I get a very big result of every page where p.page_id = c.cl_from

Now I want to set a limit for every single category because the query takes too long.

I want just 5 results for 'Art-Rock-Band', just 5 results for 'Echo-Pop-Preisträger' etc...

Elias Nicolas
  • 775
  • 13
  • 26
Name
  • 139
  • 2
  • 9
  • This may be a duplicate of http://stackoverflow.com/questions/176964/select-top-10-records-for-each-category – Greg Viers Sep 24 '15 at 20:19
  • I'm not familiar with mysql analytic functions, but if there is RANK like orace, you can use it to rank the columns in C table partitioned by the cl column and then take only those whos lower then 5 – sagi Sep 24 '15 at 20:25
  • DO you want 'Art-Rock-Band' and 'Echo-Pop-Preisrager' in one query? – mfredy Sep 24 '15 at 20:53
  • @sagi - no windowing functions in mysql – Hogan Sep 24 '15 at 20:57
  • @sagi I think the purpose was to get the top 5 from each category, rather than only showing results whose sets number 5 or less. – CSS Sep 24 '15 at 21:02

4 Answers4

1

The solution to this is somewhat lengthy (unless someone else has a better idea) but you can use UNION ALL to display the top 5 results from a series of smaller queries following this pattern:

SELECT * FROM page p
LEFT JOIN categorylinks c ON p.page_id = c.cl_from 
WHERE c.cl_to = 'Art-Rock-Band'
LIMIT 5
UNION ALL
SELECT * FROM page p
LEFT JOIN categorylinks c ON p.page_id = c.cl_from 
WHERE c.cl_to = 'Echo-Pop-Preisträger'
LIMIT 5
...

You could also get fancy and replace the 5 with a variable so that you can control how many results you get from everything with a simple change:

DECLARE @num INT DEFAULT 5;

SELECT * FROM page p
LEFT JOIN categorylinks c ON p.page_id = c.cl_from 
WHERE c.cl_to = 'Art-Rock-Band'
LIMIT @num
UNION ALL
SELECT * FROM page p
LEFT JOIN categorylinks c ON p.page_id = c.cl_from 
WHERE c.cl_to = 'Echo-Pop-Preisträger'
LIMIT @num
...

As an added value, I put in a second declaration (commented out, of course) of the same variable in the instance you might want to recall by percent rather than a set number.

I hope this helps point you in the right direction at least.

-C§

EDIT: For SQL Server, replace LIMIT @num with TOP @num before the UNION ALL in each query and replace the DEFAULT with =. You can also have a second line to declare the @num as a string and use the PERCENT keyword, but only in SQL Server as neither MySQL nor Oracle supports it.

For Oracle, you can replace it similarly with an addition to the WHERE clause: AND ROWNUM <= @num. You also want to update the DECLARE statement to prepend a colon to the equals so = becomes :=.

This should account for the primary differences from the above MySQL examples for any coming behind that have a similar question in the other two formats. More explanation can be found here: http://www.w3schools.com/sql/sql_top.asp.

CSS
  • 412
  • 5
  • 17
  • Did you noticed that the question is tagged with `MySQL`? Cause there is no `TOP` in `MySQL`. – Rahul Sep 24 '15 at 20:54
  • But isn't it in the UNION ALL case, that i just can search for two categorylinks not for this all in my question? or can i put the other categorys under last where clausle where you put the three points in? – Name Sep 24 '15 at 21:04
  • UNION ALL can give duplicate lines... not sure the user wants that. – Hogan Sep 24 '15 at 21:05
  • @hogan duplicate lines when each query is searching a different category? Surely you do jest, sir. – CSS Sep 24 '15 at 21:07
  • @CSS I don't jest -- category could (and I expect does) have more than one entry per page. – Hogan Sep 24 '15 at 21:08
  • @Hogan I guess we would have to ask the OP for a set of example data and whether he would care about multiple page_id values per category generating separate results. Either way, I don't think this would be likely to generate duplicate rows. At worst, he can add the `DISTINCT` keyword, and likely would need to for both of our responses, in order to filter it down. – CSS Sep 24 '15 at 21:16
  • @CSS - all of this is true... `DISTINCT` always leaves a bad taste in my mouth -- probably just a troubled childhood. – Hogan Sep 24 '15 at 21:20
  • It returns that i have to use the right syntax in 'DECLARE INT @num = 5' – Name Sep 24 '15 at 22:44
  • nethertheless it returns the error for this `DECLARE @num INT = 5;`.. sry :/ – Name Sep 24 '15 at 23:06
  • using SET it works but then he sad use the right syntax for `'@num * FROM page p LEFT JOIN categorylinks c ON p.page_id = c.cl_from WHERE c.c' ` – Name Sep 24 '15 at 23:08
  • @Name I have updated to showcase MySQL for you and have correctly annotated that only SQL Server supports the `PERCENT` keyword. If this finally works for you, please mark it as the correct answer. Thanks. – CSS Sep 24 '15 at 23:23
  • I think it does not work my query looks like this: `DECLARE @num INT = 5; SELECT * FROM page p LEFT JOIN categorylinks c ON p.page_id = c.cl_from WHERE c.cl_to = 'Art-Rock-Band' TOP @num UNION ALL SELECT * FROM page p LEFT JOIN categorylinks c ON p.page_id = c.cl_from WHERE c.cl_to = 'Echo-Pop-Preisträger' TOP @num ` nethertheless the same error with not right syntax for '`DECLARE @num INT = 5`' – Name Sep 25 '15 at 12:29
  • @Name `DECLARE @num DEFAULT 5;` before the query, `LIMIT @num;` after the query. That is the final answer, as depicted in my final edit above. I think you understand to use `UNION ALL` between each pair of queries. I'm sorry it was confusing you before. – CSS Sep 25 '15 at 14:15
0
SELECT  * 
FROM page p
LEFT JOIN categorylinks c ON p.page_id = c.cl_from 
WHERE c.cl_to IN
(
  'Art-Rock-Band'
)
LIMIT 5
UNION ALL
SELECT  * 
FROM page p
LEFT JOIN categorylinks c ON p.page_id = c.cl_from 
WHERE c.cl_to IN
(

  'Echo-Pop-Preisträger'
)
LIMIT 5

Edit: I found this link Get top n records for each group of grouped results and the solution is similiar to @CSS

Community
  • 1
  • 1
mfredy
  • 597
  • 1
  • 8
  • 16
  • This only gives 5 results. User wants 10 -- top 5 of each type. – Hogan Sep 24 '15 at 20:58
  • Also notice it is tagged as MySQL, hence my most recent edit. – CSS Sep 24 '15 at 21:00
  • I noticed the mysql tage and added the LIMIT function @CSS – mfredy Sep 24 '15 at 21:00
  • I added this article and I dont know if you had a look at it @CSS – mfredy Sep 24 '15 at 21:09
  • 1
    I looked at it, but it doesn't support your original answer to *this* question. It supports an answer to a question that asks how to limit the query results total to a certain amount. Still not the 5 results per category that the OP wanted in the first place. I see you've changed it to mimic my answer. I must be right or something. =D – CSS Sep 24 '15 at 21:28
0

This is the fastest way, do a top 5 on each category with a union and use that to select your data. This method will work faster if you have good indexes. You want them on page_id, cl_from, and cl_to.

Also note, I select only the page_id first before doing the union -- this allows a sql server to optimize performance.

SELECT * 
FROM page p 
join 
(
   select p.page_id 
   from page p 
   left join categorylinks c ON p.page_id = c.cl_from 
   where c.cl_to = 'Art-Rock-Band'
   limit 5

   union

   select p.page_id 
   from page p 
   left join categorylinks c ON p.page_id = c.cl_from 
   where c.cl_to = 'Echo-Pop-Preisträger'
   limit 5

   union 

   select p.page_id 
   from page p 
   left join categorylinks c ON p.page_id = c.cl_from 
   where c.cl_to = 'Englische_Band'
   limit 5

   union 

   ...

   select p.page_id 
   from page p 
   left join categorylinks c ON p.page_id = c.cl_from 
   where c.cl_to = 'Rock_and_Roll_Hall_of_Fame'
   limit 5
) sub on sub.page_id = p.page_id
Hogan
  • 69,564
  • 10
  • 76
  • 117
0

Does this work for you:

SELECT *
FROM page p
WHERE p.page_id IN (
    SELECT c.cl_from
    FROM categorylinks
    WHERE c.cl_from = p.page_id
    AND c.cl_to IN (
        'Art-Rock-Band',
        'Echo-Pop-Preisträger',
        'Englische_Band',
        'Genesis_(Band)',
        'Grammy-Preisträger',
        'Peter_Gabriel',
        'Phil_Collins',
        'Popband',
        'Progressive-Rock-Band',
        'Rock_and_Roll_Hall_of_Fame'
    )
    LIMIT 5
);
Adrian Lynch
  • 8,237
  • 2
  • 32
  • 40
  • returns 'This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'' – Name Sep 24 '15 at 21:24
  • Also, if this one were to work, it would only return 5 rows, not the 5 rows per category the OP wanted in the first place. – CSS Sep 24 '15 at 21:30