13

I have two tables, categories and movies.

In movies table I have a column categories. That column consists of the categories that movie fits in. The categories are IDs separated by a comma.

Here's an example:

Table categories {
  -id-       -name-
  1          Action
  2          Comedy
  4          Drama
  5          Dance
}

Table movies {
  -id-       -categories-  (and some more columns ofc)
  1          2,4
  2          1,4
  4          3,5
}

Now to the actual question: Is it possible to perform a query that excludes the categories column from the movies table, and instead selects the matching categories from the categories table and returns them in an array? Like a join, but the problem is there are multiple categories separated by comma, is it possible to do some kind of regex?

Johan
  • 74,508
  • 24
  • 191
  • 319
Katie
  • 173
  • 1
  • 1
  • 6
  • 5
    Why aren't you normalizing to three tables; Movies, Categories, MoviesCategories? – Brad Christie Jul 28 '11 at 15:11
  • 1
    What do you mean by saying return them in an array? – Karolis Jul 28 '11 at 15:12
  • 1
    Can you improve on the question title? A list of technologies does not describe the problem. That's what tags are. – Lightness Races in Orbit Jul 28 '11 at 15:13
  • @Brad, that is a really good idea. I didn't think of that when creating the database. It would have made it a lot easier for me. How about the speed? Would it be a lot more processor heavy if i seperate them into a third table? – Katie Jul 28 '11 at 15:34
  • @Johan: I was hoping the OP could gain the experience of doing it herself! – Lightness Races in Orbit Jul 28 '11 at 15:36
  • 1
    @Katie: It will be vastly _more_ efficient (in general), because that is how relational databases are _designed to work_. – Lightness Races in Orbit Jul 28 '11 at 15:37
  • @Tomale: Great! I've separated the tables so now i have "movies", "categories" and "movies_categories". I'm having some trouble writing the query though. It almost works as i want it now, the only problem is that i want to get all categories, not just the first one. Is it possible to concatenate the categories by a comma? Here's my query: `SELECT movies.*, categories.name FROM movies LEFT JOIN movies_categories ON (movies.id = movies_categories.movie_id) LEFT JOIN categories ON (movies_categories.category_id = categories.id)` – Katie Jul 28 '11 at 16:38
  • And also, for some reason i'm not able to add a WHERE clause which i definately need. Any ideas why? – Katie Jul 28 '11 at 16:40

5 Answers5

21
select
    m.id,
    group_concat(c.name)
from
    movies m
    join categories c on find_in_set(c.id, m.categories)
group by
    m.id

The output should be something like this:

Table movies {
  -id-       -categories-
  1          Comedy,Drama
  2          Action,Drama
  4          Other,Dance
}
Karolis
  • 9,396
  • 29
  • 38
  • There is at least one syntax error here. I think if you add the comma after `movies m` and change the `ON` to a `WHERE` (you're not using written-out `JOIN` syntax) then you'll get that output. But I can't see where this `"Other"` has come from? – Lightness Races in Orbit Jul 28 '11 at 15:32
  • @Tomalak It was a typo. I forgot to add `join`. I don't like to write join conditions in WHERE clause. **Other** is just an example, it depends on `categories` table. – Karolis Jul 28 '11 at 15:40
13

Using comma separated lists in a database field is an anti-pattern and should be avoided at all costs.
Because it is a PITA to extract those comma separated values out agian in SQL.

Instead you should add a separate link table to represent the relationship between categories and movies, like so:

Table categories
  id integer auto_increment primary key
  name varchar(255)

Table movies
  id integer auto_increment primary key
  name varchar(255)

Table movie_cat
  movie_id integer foreign key references movies.id
  cat_id integer foreign key references categories.id
  primary key (movie_id, cat_id)

Now you can do

SELECT m.name as movie_title, GROUP_CONCAT(c.name) AS categories FROM movies m
INNER JOIN movie_cat mc ON (mc.movie_id = m.id)
INNER JOIN categories c ON (c.id = mc.cat_id)
GROUP BY m.id

Back to your question
Alternativly using your data you can do

SELECT m.name as movie_title
  , CONCAT(c1.name, if(c2.name IS NULL,'',', '), ifnull(c2.name,'')) as categories 
FROM movies m
LEFT JOIN categories c2 ON 
 (replace(substring(substring_index(m.categories, ',', 2),
  length(substring_index(m.categories, ',', 2 - 1)) + 1), ',', '') = c2.id)
INNER JOIN categories c1 ON 
 (replace(substring(substring_index(m.categories, ',', 1), 
  length(substring_index(m.categories, ',', 1 - 1)) + 1), ',', '') = c1.id)

Note that the last query only works if there are 2 or fewer categories per movie.

Johan
  • 74,508
  • 24
  • 191
  • 319
4

Brad is right; normalisation is the solution. Normalisation exists to solve this problem. It should be covered pretty well in your MySQL book if it's worth its salt.


If you really insist, though, you can fake the direct join by cross-matching with FIND_IN_SET (which conveniently expects a comma-delimited string of items).

Now, MySQL can't return "an array" — that's what sets of results are for — but it can give you the category names separated by, say, a pipe (|):

SELECT
       `m`.`id`,
       `m`.`name`,
       GROUP_CONCAT(`c`.`name` SEPARATOR "|") AS `cats`
  FROM
       `movies`     AS `m`,
       `categories` AS `c`
 WHERE
       FIND_IN_SET(`c`.`id`, `m`.`categories`) != 0
 GROUP BY
       `m`.`id`;

Result:

id  "name"     "cats"
---------------------------------------------------
1   "Movie 1"  "Comedy|Drama"
2   "Movie 2"  "Action|Drama"
4   "Movie 4"  "Dance"
Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
0

Try This

SELECT m.*, c.* FROM movies m 
RIGHT JOIN categories c on find_in_set(c.id, m.categories) 
GROUP BY m.id
Merrin K
  • 1,602
  • 1
  • 16
  • 27
-1

This isn't directly answering your question but what you have in the movies table is really bad.

Instead of combining categories using comma, what you should be doing is to have each category on separate rows, eg:

Table movies {
  -id-       -categories-
  1          2
  1          4
  2          1
  2          4
  4          3
  4          5
}
ryanprayogo
  • 11,587
  • 11
  • 51
  • 66