0

I have the following 2 tables

table_article:

id  subject     tags
---------------------
1   subject-1   2,4,5
2   subject-2   3,5
3   subject-3   1,2
4   subject-4   2,3,4
5   subject-5   3

table_tags:

id  tag_name
---------------------
1   php
2   jQuery
3   css
4   mysql
5   java

and I'm trying to get results like

id => 1, subject => subject-1, tag_names => jQuery,mysql,java
id => 2, subject => subject-2, tag_names => css,java
id => 3, subject => subject-3, tag_names => php,jQuery

Here is my current attempt, which returns ONLY the first tag (e.g. 2 instead of 2,4,5 for row 1)

 1    SELECT
 2        table_article.id,
 3        table_article.subject,
 4        GROUP_CONCAT(table_tags.tag_name) AS tag_names
 5    FROM
 6        table_article
 7    LEFT JOIN
 8        table_tags
 9    ON
10        (table_tags.tag_id IN (table_article.tags))
11    GROUP BY
12        table_article.id
13    LIMIT
14        3

and the results are

id => 1, subject => subject-1, tag_names => jquery
id => 2, subject => subject-2, tag_names => css
id => 3, subject => subject-3, tag_names => php

The problem occurs on line 10 -> IN (table_article.tags).

I just can't figure out how could I solve this problem, can anyone help please?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
user1643156
  • 4,407
  • 10
  • 36
  • 59
  • When you say "not relational", do you mean "not normalized"? Because it looks like your tables *are* relational, in that they have a primary key. No? – ruakh Nov 02 '12 at 20:28

4 Answers4

3

You can't use a string that happens to contain commas as a list of discrete values.

In other words this:

ON table_tags.tag_id IN (2,4,5)

Is not the same as this:

ON table_tags.tag_id IN ('2,4,5')

The numeric value of a string like '2,4,5' is the initial numeric portion, and the remainder after the first non-numeric character is ignored. So the string '2,4,5' has a numeric value of 2. It won't be an error, but it won't get you what you intended, which is a match against any of the values in the comma-separated list.

MySQL has a built-in function FIND_IN_SET() which does understand strings that contain comma-separated values. The function returns the position of the matching value, or 0 if no match was found.

ON FIND_IN_SET(table_tags.tag_id, '2,4,5') > 0

But this cannot use an index and it forces you to run a table-scan which is going to kill your performance. To be clear, I don't recommend using this function in a join condition.

The answer is: Don't store tags in a comma-separated list. See my answer for Is storing a comma separated list in a database column really that bad?

Store one tag per row in a separate table, as @Martin Lyne suggests. That way you can look for the right tag with = and you can even index the column for much better performance.

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Looks like you had the same suggestion. – wesside Nov 02 '12 at 20:39
  • I don't recommend using FIND_IN_SET() in a join condition. – Bill Karwin Nov 02 '12 at 20:41
  • It seems that I'm going the wrong way. Yes, I DID have another table for linking table_article and table_tags, with each entry on its own row. I'd better keep it there now. Thanks for the info Bill. As for my specific question, I'll take bigman's answer as accepted. – user1643156 Nov 02 '12 at 20:45
  • @BillKarwin You are right about the `> 0`, updated my answer. Thanks for the heads up. – wesside Nov 02 '12 at 20:50
1

I've not seen an IN in a ON before (not saying it's not valid) but I would do ON table_tags.tag_id = table_article.tags)

So you end up with multiple rows

subject-1, query
subject-1, css
subject 2, query

then the GROUP BY would compress the table and the GROUP_CONCAT gets all the missing tags.

Martin Lyne
  • 3,157
  • 2
  • 22
  • 28
0

Well, I would use IN in this situation, it won't work, replace it with FIND_IN_SET(table_tags.tag_id, table_article.tags) > 0 and you'll be fine. Though you really should normalize this.

wesside
  • 5,622
  • 5
  • 30
  • 35
  • Just FYI, Bill was right that you need to `>0` at the end, if it's 0, that mean's its not in the set. +1 to @BillKarwin – wesside Nov 02 '12 at 20:49
0

As other said, this is not a good design.

Instead, you could change your table design this way:

table_article
   id
   subject

article_tag
   article_id
   tag_id

table_tags
   id
   tag_name

Life would be much easier this way :-)

Roozbeh Zabihollahi
  • 7,207
  • 45
  • 39
  • As I said to Bill, I had exactly the same design as you suggested. I was just being stupid whishing to remove a table :) – user1643156 Nov 02 '12 at 20:50