Exclude similar rows on a SELECT using a many-to-many same-type relationship table

Question

I have two MySQL tables: Articles and SimilarArticles.

CREATE TABLE `Articles` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `text` text,
  `priority` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
)

CREATE TABLE `SimilarArticles` (
  `article_id` int(11) unsigned NOT NULL,
  `similar_article_id` unsigned NOT NULL,
  PRIMARY KEY (`article_id`,`similar_article_id`)
)

SimilarArticles is a simple relationship table to track similar article records.

User totymedli in this post has a good explanation how to SELECT similar records in this scenario which works fine.

But now I have to create a SELECT statement which makes sure that there are no similar articles in the result. So for all groups of similar articles in the table only the record with the highest value in field priority should get into the result set.

So far I haven't figured out an approach for this. Has anyone an idea how to do this efficiently in MySQL?

EXAMPLE:

Table Articles

id     text      priority

1      FooA      1
2      Bar       1
3      FooB      3
4      FooC      2
5      Baz       9

Table SimilarArticles

article_id  similar_article_id

1           3
4           1
3           4

The SELECT statement i'm trying to create should return:

id     text      priority

2      Bar       1
3      FooB      3
5      Baz       9

Because FooA, FooB und FooC are similar according to the SimilarArticles table and FooB has the highest priority of these three records, only FooB should be in the result set (together with Bar and Baz, which have no similar records).

You kind of contradict yourself. You say that you need there to be **no similar articles** then go on and list a use case of what to do when there are similar articles? — Lock, Jan 04 '16 at 03:12
Can you provide sample data and desired results to clarify what you are trying to do? — Gordon Linoff, Jan 04 '16 at 03:15
@Lock: Sorry, probably I was not clear enough. The case is about having similar article rows (=connected by the SimilarArticles table) in the Articles table, not in the result set. In the result set I want to exclude similar articles. — Henrik, Jan 04 '16 at 03:28
@Gordon Linoff: Thank you very much for your answer! I added some sample and hope to make it clearer that way. — Henrik, Jan 04 '16 at 03:49

score 0 · Answer 1 · edited Jan 05 '16 at 19:02

0

You need to do a left join and filter out any similar articles. See this great explanation of excluding using a LEFT JOIN by SO founder Jeff Atwood.

   SELECT Articles.*
     FROM Articles
LEFT JOIN SimilarArticles ON Articles.id = SimilarArticles.article_id
    WHERE SimilarArticles.article_id IS NULL
 ORDER BY priority DESC

If you also want to get the highest priority article that is not related to similar article, then you'll have to union this query with that one:

    SELECT *
      FROM Articles
 LEFT JOIN SimilarArticles ON Articles.id = SimilarArticles.article_id
     WHERE SimilarArticles.article_id IS NULL

 UNION ALL

    SELECT Articles.*
      FROM Articles
INNER JOIN SimilarArticles ON Articles.id = SimilarArticles.article_id
INNER JOIN (SELECT max(priority) AS priority FROM SimilarArticles) maxp 
           ON maxp.priority = Articles.priority
     WHERE SimilarArticles.article_id IS NOT NULL

Note: The second query after the union will return all records that are the highest priority, not just the first. If you really only want to return the first, then you'll have to add LIMIT 0,1 to the end of that query.

edited Jan 05 '16 at 19:02

Henrik

3
1

answered Jan 04 '16 at 03:43

Harper Maddox

540
4
8

Thanks for the very good link. But I think this filters out all similar articles with no exceptions. What I try to accomplish is to keep the article with the highest priority in a group of similar articles in the result set. – Henrik Jan 04 '16 at 06:12
Thanks for you answer Harper. I also experimented with a `UNION` and `INNER JOIN` but I can't think of a working solution here. If I understand correctly `SELECT max(priority) FROM Articles` gets the maximum priority of the whole table here, not of a group of similar articles. I now found a partly solution skipping similar articles when iterating over the result set (see my answer). – Henrik Jan 04 '16 at 21:06
Did running my full query work for you? Just copy and paste the whole block with the UNION ALL in it. It should return the two rows you want (id 2 and 3). It doesn't just return a priority (thats just the nested part of the query). – Harper Maddox Jan 04 '16 at 23:05
Hey, I ran the query with small modification because else I got errors: I changed `SimilarArticles.id` to `SimilarArticles.article_id` in the WHERE parts and changed `SELECT max(priority)` to `SELECT max(priority) AS priority`. It indeed worked for the sample data. But if you just add a single row in the Article table with a higher priority (like I just did in the sample data) it won't work anymore. I guess the subquery is returning the max value of the whole table and not only relates to the group of similar questions like it should? – Henrik Jan 05 '16 at 02:38

score 0 · Answer 2 · answered Jan 04 '16 at 21:05

A working solution I found is to create a query that gives me all similar article ids of each article comma-separated in an extra column.

While iterating over the result in my server side language I explode the similar article ids and build up an array containing all similar ids. This gives me the possibility to skip all rows whose IDs have already stored in that array.

While I would prefer a solution in pure SQL I think this is working quite well in my situation.

The Query:

SELECT
  id,
  GROUP_CONCAT(
    IF (sa.article_id = a.id, similar_article_id, article_id)
    SEPARATOR ','
  ) AS similar_ids,
  text,
  priority
FROM Articles a

LEFT JOIN SimilarArticles sa
  ON a.id = sa.article_id OR a.id = sa.similar_article_id

GROUP BY id
ORDER BY priority DESC

The result (using the given sample data):

id  similar_ids    text   priority
1   3,4            FooA   3
4   3,1            FooC   2         <- Will be skipped
2   NULL           Bar    1         
3   1,4            FooB   1         <- Will be skipped

Exclude similar rows on a SELECT using a many-to-many same-type relationship table

EXAMPLE:

2 Answers2

The Query:

The result (using the given sample data):