Select from MySQL table, ordered by the matching number of categories

Question

I have a table like this (example rows shown):

| id (int) | name (string) | categories (string) |
| --- | --- | --- |
| 1 | Lorem1 | A, B, C |
| 2 | Lorem2 | A, B |
| 3 | Lorem3 | A, C |
| 4 | Lorem4 | B |

I also have a form where users can check the categories they are interested in. This selection should determine the order of the SELECT results.

Rows that have all of the selected categories should come first, followed by rows with fewer matches. (Rows that have none of the selected categories should not show up at all.)

For example, if someone checks:

A and B, they should get back the rows in this order: Lorem1, Lorem2, Lorem3, Lorem4
A and C, they should get back the rows in this order: Lorem1, Lorem3, Lorem2
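To make the required ordering concrete, here is a minimal Python sketch of the expected behaviour (not a database query), assuming the example rows are simply held in memory. The helper name rank_by_matches is hypothetical: it counts how many of the selected categories each row contains, drops rows with zero matches and sorts the rest by that count, highest first.

```python
# Example rows from the question: (id, name, comma-separated categories)
ROWS = [
    (1, "Lorem1", "A, B, C"),
    (2, "Lorem2", "A, B"),
    (3, "Lorem3", "A, C"),
    (4, "Lorem4", "B"),
]


def rank_by_matches(rows, selected):
    """Hypothetical helper: names of rows ordered by number of matching categories.

    Rows without any selected category are dropped. Ties keep the input
    order because Python's sort is stable (also with reverse=True).
    """
    wanted = set(selected)
    scored = []
    for _id, name, categories in rows:
        # Split on commas and strip surrounding spaces from each category
        cats = {c.strip() for c in categories.split(",")}
        matches = len(wanted & cats)
        if matches > 0:
            scored.append((matches, name))
    # Highest match count first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


print(rank_by_matches(ROWS, ["A", "B"]))  # ['Lorem1', 'Lorem2', 'Lorem3', 'Lorem4']
print(rank_by_matches(ROWS, ["A", "C"]))  # ['Lorem1', 'Lorem3', 'Lorem2']
```

The output reproduces both orderings listed in the question.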

This is what I'm trying to build. I'm quite new to programming and got stuck on this problem.

I also know that I should maybe make a separate table for the connections between the categories and the objects.

Consider either normalising your schema or not bothering with a relational database — Strawberry, Apr 02 '19 at 19:20

sticky bit · Answer 1 · 2019-04-02T19:31:32.307

You can use find_in_set() to check whether a string is in a comma-separated list. But you have to replace() the spaces first: the list is split on commas only, so in "A, B, C" the second item is " B", not "B". Do this for each category the user selected. find_in_set() returns 0 if it didn't find anything and otherwise the position in the list, which is larger than 0, so check whether the result is larger than 0. Then add up the results of these comparisons. Since a true comparison is 1 in numeric context and a false one is 0, the sum is the number of matches, and you can order by that sum descending. That is, the more matches a row has, the earlier it appears in the result.
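The behaviour described above can be checked with a small Python emulation. The find_in_set function below is a hypothetical, simplified stand-in for MySQL's FIND_IN_SET: it compares plainly and case-sensitively and ignores collations, which the real function does not. It shows why the spaces have to be removed first and how adding the comparisons counts the matches.

```python
def find_in_set(needle, haystack):
    """Hypothetical, simplified emulation of MySQL FIND_IN_SET(str, strlist).

    Returns the 1-based position of needle in the comma-separated haystack,
    0 if it is absent, and None (SQL NULL) if either argument is None.
    Unlike MySQL, the comparison here is plain and case-sensitive.
    """
    if needle is None or haystack is None:
        return None
    if haystack == "":
        return 0
    items = haystack.split(",")
    return items.index(needle) + 1 if needle in items else 0


raw = "A, B, C"
print(find_in_set("B", raw))                   # 0, the second item is " B"
print(find_in_set("B", raw.replace(" ", "")))  # 2

# A true comparison counts as 1, so the sum is the number of matches
clean = raw.replace(" ", "")
score = (find_in_set("A", clean) > 0) + (find_in_set("C", clean) > 0)
print(score)  # 2
```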

Example for categories 'A' and 'C':

SELECT *
       FROM elbat
       ORDER BY (find_in_set('A', replace(categories, ' ', '')) > 0)
                +
                (find_in_set('C', replace(categories, ' ', '')) > 0)
                DESC;

You can also use the same sum to exclude rows without any match, because for those rows it is 0.

SELECT *
       FROM elbat
       WHERE (find_in_set('A', replace(categories, ' ', '')) > 0)
             +
             (find_in_set('C', replace(categories, ' ', '')) > 0)
             > 0
       ORDER BY (find_in_set('A', replace(categories, ' ', '')) > 0)
                +
                (find_in_set('C', replace(categories, ' ', '')) > 0)
                DESC;
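The second query can be tried without a MySQL server, assuming SQLite as a stand-in: Python's sqlite3 module can register a Python function under the name find_in_set, and SQLite already provides replace(). This is a sketch under those assumptions, not the answerer's setup. Two changes are deliberate: the sum is computed once in a derived table (MySQL does not allow a WHERE clause to refer to a SELECT alias, so this form works there as well), and name is added as a tie-breaker, since rows with equal sums otherwise come back in no guaranteed order.

```python
import sqlite3


def find_in_set(needle, haystack):
    # Simplified stand-in for MySQL FIND_IN_SET: 1-based position, 0 if absent
    if needle is None or haystack is None:
        return None
    items = haystack.split(",") if haystack else []
    return items.index(needle) + 1 if needle in items else 0


conn = sqlite3.connect(":memory:")
# Make find_in_set(str, strlist) callable from SQL (2 arguments)
conn.create_function("find_in_set", 2, find_in_set)

conn.execute("CREATE TABLE elbat (id INTEGER PRIMARY KEY, name TEXT, categories TEXT)")
conn.executemany(
    "INSERT INTO elbat (id, name, categories) VALUES (?, ?, ?)",
    [(1, "Lorem1", "A, B, C"), (2, "Lorem2", "A, B"),
     (3, "Lorem3", "A, C"), (4, "Lorem4", "B")],
)

# Same scoring as the answer, for categories 'A' and 'C'; the derived table
# computes the sum once so WHERE and ORDER BY can both use it
query = """
SELECT name, matches
  FROM (SELECT name,
               (find_in_set('A', replace(categories, ' ', '')) > 0)
             + (find_in_set('C', replace(categories, ' ', '')) > 0) AS matches
          FROM elbat) AS scored
 WHERE matches > 0
 ORDER BY matches DESC, name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Lorem1', 2), ('Lorem3', 2), ('Lorem2', 1)]
```

Lorem4 ("B") scores 0 and is filtered out, as the question requires.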

But comma-separated lists are a pain. You should consider revising the schema and adding another table that links the items to their categories.

Thank you! This works perfectly! I never heard of these functions in mysql. — Patrik Vörös, Apr 02 '19 at 19:54

score 0 · Answer 2 · answered Apr 02 '19 at 19:36


Instead of storing your categories as a string, you should define a ManyToManyField in your user table. That way, a user can belong to one or many categories and vice versa. The categories table can store the different categories with their respective IDs.

jaimish11

Yes, that's a cleaner way, but I'm not sure how that would bring me closer to the solution. – Patrik Vörös Apr 02 '19 at 19:51

Paul Spiegel · Accepted Answer · 2019-08-21T17:58:19.287

A normalized version of your data could be:

create table items (
  id int,
  name varchar(50),
  primary key (id),
  index (name)
);

create table categories (
  id int,
  name varchar(50),
  primary key (id),
  index (name)
);

create table items_categories (
  item_id int,
  category_id int,
  primary key (item_id, category_id),
  index (category_id, item_id),
  foreign key (item_id) references items(id),
  foreign key (category_id) references categories(id)
);

insert into items (id, name) values
  (1, 'Lorem1'),
  (2, 'Lorem2'),
  (3, 'Lorem3'),
  (4, 'Lorem4');

insert into categories (id, name) values
  (1, 'A'),
  (2, 'B'),
  (3, 'C'),
  (4, 'D');

insert into items_categories (item_id, category_id) values
  (1, 1),
  (1, 2),
  (1, 3),
  (2, 1),
  (2, 2),
  (3, 1),
  (3, 3),
  (4, 2);

Now, when you search for items in categories 'A' and 'B', the SELECT query would be:

select i.*, count(*) as matches
from items i
join items_categories ic on ic.item_id = i.id
join categories c on c.id = ic.category_id
where c.name in ('A', 'B')
group by i.id
order by matches desc, i.name;

Result:

| id  | name   | matches |
| --- | ------ | ------- |
| 1   | Lorem1 | 2       |
| 2   | Lorem2 | 2       |
| 3   | Lorem3 | 1       |
| 4   | Lorem4 | 1       |

If you want to search in categories 'A' and 'C', change the WHERE clause to:

where c.name in ('A', 'C')

The result would be:

| id  | name   | matches |
| --- | ------ | ------- |
| 1   | Lorem1 | 2       |
| 3   | Lorem3 | 2       |
| 2   | Lorem2 | 1       |


You can even "emulate" your original schema with:

select i.*, group_concat(c.name order by c.name separator ', ') as categories
from items i
join items_categories ic on ic.item_id = i.id
join categories c on c.id = ic.category_id
group by i.id
order by i.id;

Result:

| id  | name   | categories |
| --- | ------ | ---------- |
| 1   | Lorem1 | A, B, C    |
| 2   | Lorem2 | A, B       |
| 3   | Lorem3 | A, C       |
| 4   | Lorem4 | B          |
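If the application needs the comma-separated string itself, it can also be rebuilt outside SQL. This sketch again assumes SQLite as a stand-in and uses a hypothetical rebuild() helper: it fetches one row per item/category pair, sorted by item and category name, and joins the names with itertools.groupby. It produces the same strings as the table above.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items_categories (item_id INTEGER, category_id INTEGER,
                               PRIMARY KEY (item_id, category_id));
INSERT INTO items VALUES (1, 'Lorem1'), (2, 'Lorem2'), (3, 'Lorem3'), (4, 'Lorem4');
INSERT INTO categories VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO items_categories VALUES
  (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 3), (4, 2);
""")


def rebuild(conn):
    """Hypothetical helper: (id, name, "A, B, C") tuples from the link table."""
    pairs = conn.execute("""
        SELECT i.id, i.name, c.name
          FROM items i
          JOIN items_categories ic ON ic.item_id = i.id
          JOIN categories c ON c.id = ic.category_id
         ORDER BY i.id, c.name
    """).fetchall()
    result = []
    # Rows are sorted by item, so groupby sees each item's pairs together
    for (item_id, item_name), group in groupby(pairs, key=lambda p: (p[0], p[1])):
        result.append((item_id, item_name, ", ".join(cat for _, _, cat in group)))
    return result


for row in rebuild(conn):
    print(row)
```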

Going the other way round, from the comma-separated column to separate rows, would be much harder. For me, that is a major reason to use a normalized schema.
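To move existing data from the comma-separated column into the normalized tables, a one-time migration script is usually simpler than doing the split in SQL. The following Python sketch uses a hypothetical normalize() helper that assigns category IDs in order of first appearance. For the example rows it yields the same category IDs (A=1, B=2, C=3) and the same link rows as the INSERT statements in the accepted answer; category D is not used by any row, so it is not created.

```python
# Rows as stored in the original table: (id, name, comma-separated categories)
ROWS = [
    (1, "Lorem1", "A, B, C"),
    (2, "Lorem2", "A, B"),
    (3, "Lorem3", "A, C"),
    (4, "Lorem4", "B"),
]


def normalize(rows):
    """Hypothetical helper: split rows into items, categories and link rows."""
    items, category_ids, links, seen = [], {}, [], set()
    for item_id, name, categories in rows:
        items.append((item_id, name))
        for raw in categories.split(","):
            cat = raw.strip()
            if not cat:
                continue  # ignore empty entries, e.g. from a trailing comma
            # Assign IDs in order of first appearance: A=1, B=2, ...
            if cat not in category_ids:
                category_ids[cat] = len(category_ids) + 1
            link = (item_id, category_ids[cat])
            if link not in seen:  # "A, A" must not violate the primary key
                seen.add(link)
                links.append(link)
    categories = [(cid, cname) for cname, cid in category_ids.items()]
    return items, categories, links


items, categories, links = normalize(ROWS)
print(categories)  # [(1, 'A'), (2, 'B'), (3, 'C')]
print(links)       # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 3), (4, 2)]
```

The three lists can then be passed to executemany() with the corresponding INSERT statements of whatever driver you use.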

A good read: Is storing a delimited list in a database column really that bad?

You are a wizard. Thank you very much! – Patrik Vörös Aug 22 '19 at 07:21
