4

I have the following tables:

TABLE product
id int(11)
title varchar(400)

TABLE tag
id int(11)
text varchar(100)

TABLE product_tag_map
product_id int(11)
tag_id int(11)

PRODUCT_TAG_MAP maps tags to product. The distribution of tags in the system isn't normal, i.e., some tags have much more products than others.

I'm trying to write an SQL that will fetch 25 random products: 5 products per tag, for 5 tags (so that's 5x5 = 25).

Found an answer here: How can I get an even distribution using WHERE id IN(1,2,3,4)

But this doesn't yield random products - it always fetches the same products per tag.

Here is the SQL I have:

SET @last_tag = 0;
SET @count_tag = 0;

SELECT DISTINCT id FROM (
SELECT
  product.*,
  @count_tag := IF(@last_tag = product_tag_map.tag_id, @count_tag, 0) + 1 AS tag_row_number,
  @last_tag := product_tag_map.tag_id
FROM product
  LEFT JOIN product_tag_map ON (product_tag_map.product_id=product.id)
WHERE
  product_tag_map.tag_id IN (245,255,259,281,296)
) AS subquery WHERE tag_row_number <= 5;

How do I make it return random products per tag?

Any help would be much appreciated! Thanks.

Community
  • 1
  • 1
Moti Radomski
  • 275
  • 2
  • 8

2 Answers2

2

There is a lot of tricks in this query :

  1. Add a level of nesting for use a LIMIT in a subquery : mySQL subquery limit
  2. Add a row_number functionality for MySQL : How to select the first/least/max row per group in SQL

The final result is a lot of subquery:

SELECT tag.Name, t0.Id as MapId
FROM
(
    SELECT * 
         , @num := if(@type = tag_id, @num + 1, 1) as row_number
         , @type := tag_id as dummy
    FROM (
        SELECT *
        FROM map m
        WHERE tag_id in
        (
            SELECT *
            FROM
            (
                SELECT id
                FROM tag
                ORDER BY RAND() LIMIT 5
            ) t
        )
     ORDER BY tag_id, RAND()  
  ) mainTable
  , (SELECT @num:=0) foo
  , (SELECT @type:=0) foo2
) t0 
    INNER JOIN tag
        ON t0.tag_id = tag.id
WHERE row_number <= 5

SQL Fiddle

The idea is to select first 5 random tags. This is not difficult, just a simple ORDER BY RAND() LIMIT 5.

Then the tricky part is too simulate a ROW_NUMBER() OVER(PARTITION BY tag_id, RAND()), because ranking each item randomly, but partition by tag is exactly what you need. So you declare variable and do as the query show.

Finally, filter the row_number, and you have your 25 random items!

Community
  • 1
  • 1
Cyril Gandon
  • 16,830
  • 14
  • 78
  • 122
  • Thank you!! That did the trick. I added an INNER JOIN to content_item so I can return the content_item.id fields (that's what I'm after). PERFECT! – Moti Radomski Apr 25 '13 at 15:56
0

I also want to offer the "brute" force approach. This will work in most databases (although the rand() function may be named something else).

select content_item_id from content_item where tag_id = 245 order by RAND() limit 5
union all
select content_item_id from content_item where tag_id = 255 order by RAND() limit 5
union all
select content_item_id from content_item where tag_id = 259 order by RAND() limit 5
union all
select content_item_id from content_item where tag_id = 281 order by RAND() limit 5
union all
select content_item_id from content_item where tag_id = 206 order by RAND() limit 5

The performance for this might be ok, if you have an index on content_item(tag_id).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks. My original solution included unions, but that was very heavy performance-wise. – Moti Radomski Apr 25 '13 at 15:57
  • @MotiRadomski . . . The performance might be ok if you have the right indexes. Otherwise, you need to do a separate table scan for each subquery, killing performance. – Gordon Linoff Apr 25 '13 at 16:03