2

Here is the case I have two tables tags and customers as the following structure

Tags Table
ID Name   
1  Tag1
2  Tag2

Customers Table
ID Tag_ID Name
1  1      C1
2  2      C2
3  1      C3

I want a SQL statement to get the first 10 customers (alphabetically) for each tag? is it possible to be done in one query.

P.S the data in the tables are sample data not the actual data

Strawberry
  • 33,750
  • 13
  • 40
  • 57
Mohammad Abu Musa
  • 1,117
  • 2
  • 10
  • 32

3 Answers3

3

Consider the following:

DROP TABLE IF EXISTS tags;

CREATE TABLE tags 
(tag_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY 
,name VARCHAR(12) NOT NULL
);

INSERT INTO tags VALUES
(1,'One'),
(2,'Two'),
(3,'Three'),
(4,'Four'),
(5,'Five'),
(6,'Six');

DROP TABLE IF EXISTS customers;

CREATE TABLE customers  
(customer_id INT NOT NULL
,customer VARCHAR(12)
);

INSERT INTO customers VALUES
(1,'Dave'),
(2,'Ben'),
(3,'Charlie'),
(4,'Michael'),
(5,'Steve'),
(6,'Clive'),
(7,'Alice'),
(8,'Ken'),
(9,'Petra');

DROP TABLE IF EXISTS customer_tag;

CREATE TABLE customer_tag
(customer_id INT NOT NULL
,tag_ID INT NOT NULL
,PRIMARY KEY(customer_id,tag_id)
);

INSERT INTO customer_tag VALUES
(1,1),
(1,2),
(1,4),
(2,3),
(2,2),
(3,1),
(4,4),
(4,2),
(5,2),
(5,5),
(5,6),
(6,6);

The following query returns all customers associated with each tag, and their respective 'rank' when sorted alphabetically...

SELECT t.*, c1.*, COUNT(ct2.tag_id) rank
  FROM tags t
  JOIN customer_tag ct1 
    ON ct1.tag_id = t.tag_id
  JOIN customers c1 
    ON c1.customer_id = ct1.customer_id 
  JOIN customer_tag ct2 
    ON ct2.tag_id = ct1.tag_id 
  JOIN customers c2 
    ON c2.customer_id = ct2.customer_id 
   AND c2.customer <= c1.customer 
 GROUP 
    BY t.tag_id, c1.customer_id
 ORDER 
    BY t.tag_id,rank;
+--------+-------+-------------+----------+------+
| tag_id | name  | customer_id | customer | rank |
+--------+-------+-------------+----------+------+
|      1 | One   |           3 | Charlie  |    1 |
|      1 | One   |           1 | Dave     |    2 |
|      2 | Two   |           2 | Ben      |    1 |
|      2 | Two   |           1 | Dave     |    2 |
|      2 | Two   |           4 | Michael  |    3 |
|      2 | Two   |           5 | Steve    |    4 |
|      3 | Three |           2 | Ben      |    1 |
|      4 | Four  |           1 | Dave     |    1 |
|      4 | Four  |           4 | Michael  |    2 |
|      5 | Five  |           5 | Steve    |    1 |
|      6 | Six   |           6 | Clive    |    1 |
|      6 | Six   |           5 | Steve    |    2 |
+--------+-------+-------------+----------+------+

If we just want the top 2, say, for each tag, we can rewrite that as follows...

SELECT t.*  
     , c1.*
  FROM tags t
  JOIN customer_tag ct1 
    ON ct1.tag_id = t.tag_id
  JOIN customers c1 
    ON c1.customer_id = ct1.customer_id 
  JOIN customer_tag ct2 
    ON ct2.tag_id = ct1.tag_id 
  JOIN customers c2 
    ON c2.customer_id = ct2.customer_id 
   AND c2.customer <= c1.customer 
 GROUP 
    BY t.tag_id, c1.customer_id
HAVING COUNT(ct2.tag_id) <=2
 ORDER 
   BY t.tag_id, c1.customer;
+--------+-------+-------------+----------+
| tag_id | name  | customer_id | customer |
+--------+-------+-------------+----------+
|      1 | One   |           3 | Charlie  |
|      1 | One   |           1 | Dave     |
|      2 | Two   |           2 | Ben      |
|      2 | Two   |           1 | Dave     |
|      3 | Three |           2 | Ben      |
|      4 | Four  |           1 | Dave     |
|      4 | Four  |           4 | Michael  |
|      5 | Five  |           5 | Steve    |
|      6 | Six   |           6 | Clive    |
|      6 | Six   |           5 | Steve    |
+--------+-------+-------------+----------+

This is fine, but where performance is an issue, a solution like the following will be faster - although you may need to run SET NAMES utf8; prior to constructing the tables (as I had to) in order for it to work properly:

SELECT tag_id, name, customer_id,customer 
  FROM
     (
       SELECT t.*
            , c.*
            , CASE WHEN @prev=t.tag_id THEN @i:=@i+1 ELSE @i:=1 END rank
            , @prev := t.tag_id
         FROM tags t
         JOIN customer_tag ct
           ON ct.tag_id = t.tag_id
         JOIN customers c
           ON c.customer_id = ct.customer_id
         JOIN ( SELECT @i:=1, @prev:=0) vars
        ORDER
           BY t.tag_id
            , c.customer
     ) x
 WHERE rank <=2
 ORDER 
    BY tag_id,customer;
+--------+-------+-------------+----------+
| tag_id | name  | customer_id | customer |
+--------+-------+-------------+----------+
|      1 | One   |           3 | Charlie  |
|      1 | One   |           1 | Dave     |
|      2 | Two   |           2 | Ben      |
|      2 | Two   |           1 | Dave     |
|      3 | Three |           2 | Ben      |
|      4 | Four  |           1 | Dave     |
|      4 | Four  |           4 | Michael  |
|      5 | Five  |           5 | Steve    |
|      6 | Six   |           6 | Clive    |
|      6 | Six   |           5 | Steve    |
+--------+-------+-------------+----------+
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • Thanks for the answer, this is exactly what I am looking for. performance is not an issue at the moment as I am using this structure for exploring a method – Mohammad Abu Musa Nov 21 '15 at 12:46
1

To achieve this, we have to use two session variables, one for the row number and the other for storing the old customer ID to compare it with the current one as the following query:

select c.name, @row_number:=CASE
    WHEN @cid = c.id THEN @row_number + 1
    ELSE 1
END AS rows,
@id:=c.id as CustomerId from tags t, customers c where t.id=c.id group by c.name where Rows<=10

We used CASE statement in the query. If the customer number remains the same, we increase the row_number variable

Reference

Muhammad Muazzam
  • 2,810
  • 6
  • 33
  • 62
  • I really appreciate the answer it is very close to what I am looking for @Strawberry, managed to solve this issue. I would love if you can explain your solution as I think it will be beneficial in the future. Strawberry said that his solution has performance issue maybe we can incorporate your solution to have one perfect solution – Mohammad Abu Musa Nov 21 '15 at 12:48
  • @Strawberry solution is not easy so I can mark him a credit. – Muhammad Muazzam Nov 21 '15 at 12:55
0

Your question reminds me of this one (see especially the top-voted answer), so I came up with this query:

SELECT Tags.ID,
       Tags.Name,
       SUBSTRING_INDEX(GROUP_CONCAT(Customers.Name
                                    ORDER BY Customers.Name),
                       ',', 10) AS Customers
FROM Customers
INNER JOIN Tags
ON Tags.ID = Customers.Tag_ID
GROUP BY Tags.ID
ORDER BY Tags.Id;

It works, but this is clearly a hacky way to do this, because MySQL does not offer yet tools to do this more naturally.

Community
  • 1
  • 1
julienc
  • 19,087
  • 17
  • 82
  • 82