3

I have the following SQL query that selects any row matching ANY of the values in the list (9, 10):

SELECT
     r.id, r.title, 
     u.name as 'Created By',
     c.name as 'Category',
     c.value,
     cr.category_id
FROM
     category_resource cr
          INNER JOIN resource r
          ON cr.resource_id = r.id
               INNER JOIN user u
               ON r.created_by = u.id
               INNER JOIN category c
               ON cr.category_id = c.id
WHERE cr.category_id IN ('10', '9');

I have tried finding out how to do the opposite, which I also need, i.e select rows that match ALL values.

I have read about using a statement sort of like this:

SELECT
    r.id, r.title
FROM
    resource r
WHERE
    id IN (
        SELECT
            resource_id
        FROM
            category_resource
        WHERE
            category_id IN (9, 10)
        GROUP BY
            resource_id
        HAVING
            COUNT(DISTINCT category_id) = 2
    );

This is my attempt at adapting this answer to my needs: SQL Server - select rows that match all items in a list

But that doesn't get me as much information back in the results as the first statement does. So how could I do something that is more equivalent? I've tried to put it together, but I'm too new to SQL to figure it out right, I just get errors...

LONG UPDATE:

Gordon Linoff pointed out that this is a strange request. And I know, I feel that it’s strange too, to have a query that returns multiple rows for the same resource. But I don’t know how to do this any other way, maybe I’m on the wrong track altogether, because it just hit me that the original request (the one getting all rows for resources with categories matching ANY in the list) does quite not fulfill my requirement either…

Here’s my requirement as a whole:

First of all, I think the model of this part of the db might be helpful.

enter image description here

(BTW category has a relation with itself also because it is stored as a hierarchy, using the adjacency model, where each category stores its parent id, if anyone was wondering about that arrow...)

1: Find all resources where a category matches ANY of the values in the list. But (this is where that was insufficient) for each of these resources, I need to know the resource along with all of the categories it has…

Let me explain the point of this with a simple example:

This is a many to many relationship as you can see. A resource (e.g let’s say with the title ”Introduction to carpentry in 18th century New England”) can be associated with many categories (e.g category.name = ”subject” value = ”Carpentry”, category.name=”subject” value = ”Wood”, category.name=”subject” value = ”New England”, category.name=”subject” value = ”History”). Note that this example is simplified, but you see the basic idea.

Now, if a user searches for a resource matching ANY of the categories ”Carpentry” and ”Painting”, the resource ”Introduction to carpentry in 18th century New England” should show up in the results, since one of its categories matched. But, here’s the rub, and why Gordon felt my request was strange: in the search result I want to present to the user, I want to list the title ”Introduction to carpentry in 18th century New England” along with a column showing all the categories that title is classified with, even though the user didn’t search for them - in order to give a better overview of the complete subject matter of this resource.

So how can I do that? The only way I could think of was the first statement in my question, but as I said it just hit me that it doesnt’ give me all categories a resource might have, only the ones actually searched for…

Of course I could do a query for the results first, only getting one row for each. And then do a second query, finding all categories for each resource in the results. But if the first query gives me 1000 results (which will be common), then to get the categories for all of these I would have to do 1000 queries to get the categories for each… Sounds like that would give me performance issues…

Am I thinking about this the wrong way? Is there another way to accomplish what I want to do? I.e give me the resources the query selects, along with all of that resource’s associated categories...

2: Well, after that long explanation, the second requirement is simpler to explain: again the same thing about getting all categories back for a resource selected, but this time the selection in the query should only get those resources that match ALL of the provided values. However, just because I provide all of the values in the query again doesn’ mean that I already have all the categories, since each resource in the results may actually have more (and other) categories and I need those too when presenting the results as mentioned in the first (ANY) requirement.

Community
  • 1
  • 1
Anders
  • 12,556
  • 24
  • 104
  • 151
  • It is normal to get back less results, since you are getting the overlap only – Gabriele Petrioli Jan 28 '14 at 00:52
  • Yes, but I didn't mean that I didn't expect to get less results, I just meant that I needed to get more columns back in the results, like the first query did. However, I discovered that the question was a bit more complicated, please see my edit just now. – Anders Jan 28 '14 at 08:12

2 Answers2

3

Update 2 speed issues

A speed improvement (to avoid executing the subquery for each row) is to create a temporary table with the resource id that match the subquery and use that in the main query by joining against it.

/*Create a temporary table with the ids we want (the subquery)*/
CREATE TEMPORARY TABLE Matching_Resources (INDEX(resource_id))
AS (
  SELECT
    resource_id
  FROM
    category_resource
  WHERE
    category_id IN (4,1)
  GROUP BY
    resource_id
  HAVING
    COUNT(DISTINCT category_id) = 2
);

SELECT
  r.id, r.title,
  u.name AS 'Created By',
  GROUP_CONCAT( CONCAT('[',c.name,',',c.value,',',CAST(c.id as CHAR),']') separator ' // ') AS 'Categories'
FROM
  resource r
  INNER JOIN Matching_Resources mr
    ON r.id = mr.resource_id
  INNER JOIN category_resource cr
    ON r.id = cr.resource_id
  INNER JOIN category c
    ON cr.category_id = c.id
  INNER JOIN user u
    ON r.created_by = u.id
GROUP BY r.id

Update 1 some comments

In both cases you want the category filtering to act just as a filter for the matching resource ids only. So you need to make it a subquery in order to avoid affecting the main query which needs to only restrict resources but return all matching categories.

So WHERE r.id IN (..) part must exist in both solutions. You already know how to do the filtering in there (as i am only using the same code you provided)


For the requirement of matching ANY provided category

SELECT 
     r.id, r.title, 
     u.name as 'Created By',
     c.name as 'Category',
     c.value,
     cr.category_id
FROM 
  resource r
  INNER JOIN category_resource cr
    ON r.id = cr.resource_id
  INNER JOIN category c
    ON cr.category_id = c.id
  INNER JOIN user u
    ON r.created_by = u.id
WHERE 
  r.id IN 
    (
      SELECT
        resource_id
      FROM
        category_resource
      WHERE
        category_id IN (6,1)
    )

Demo at http://sqlfiddle.com/#!3/d9486/8/0


For the requirement of matching ALL provided categories

SELECT 
     r.id, r.title, 
     u.name as 'Created By',
     c.name as 'Category',
     c.value,
     cr.category_id
FROM 
  resource r
  INNER JOIN category_resource cr
    ON r.id = cr.resource_id
  INNER JOIN category c
    ON cr.category_id = c.id
  INNER JOIN user u
    ON r.created_by = u.id
WHERE 
  r.id IN 
    (
      SELECT
        resource_id
      FROM
        category_resource
      WHERE
        category_id IN (1,4)
      GROUP BY
        resource_id
      HAVING
        COUNT(DISTINCT category_id) = 2
    )

Demo at http://sqlfiddle.com/#!3/d9486/10/0

Gabriele Petrioli
  • 191,379
  • 34
  • 261
  • 317
  • Beautiful! Works like a charm (except I had to remove the square brackets, maybe a different SQL syntax, I'm using MySQL). If you could only provide some little explanation of what it does, I would be very grateful... Also, please tell me, is this the right way to do what I want to accomplish? I know I get multiple results, and in my Java code I will then have to "weed out" the duplicate resources, just concatenating the Category values for each in a loop. This is fine though, and it's fast. But just want to make sure I'm not missing a better solution! – Anders Jan 28 '14 at 11:44
  • BTW, I'll mark this as answered now, great answer! Thanks! But if you could complete it with some comments on my questions above that would be great. – Anders Jan 28 '14 at 11:45
  • @AndersSvensson you could use [GROUP_CONCAT](http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat) to combine multiple rows to one ***demo at** http://sqlfiddle.com/#!8/4bab6/9/0* You could format it anyway you like to ease the parsing you will do from java.. – Gabriele Petrioli Jan 28 '14 at 12:08
  • @AndersSvensson also added a small paragraph explaining the concept. (*it was really just a combination of your two solutions*). Your second one returned few result because you had not joined it with the categories.. – Gabriele Petrioli Jan 28 '14 at 12:16
  • Ok thanks. This is helpful for understanding how it works. Discovered a problem though, even though your answer is correct and it works: the performance goes down to unusable unfortunately. Whereas the original statement took 10 ms for 5000 rows of resources, this statement (ALL) took 66 seconds... No way of getting around this I guess? – Anders Jan 28 '14 at 12:43
  • I'm sorry, I'm getting an error when trying the statement you suggested in your last update... MySql error messages don't seem very informative unfortunately, that's all it says. I thought it might have something to do with the square brackets again, but it seems not, because removing that part didn't matter. Any ideas what might be wrong? Did you test this yourself on a "fiddle" or something? – Anders Jan 28 '14 at 14:49
  • @AndersSvensson sorry about that.. a `;` was missing after the whole temporary table part, to separate the two commands.. ***updated answer*** – Gabriele Petrioli Jan 28 '14 at 15:00
  • No problem, thanks so much for following this up... However, the query now worked after that last change, but I'm not getting any results at all, even though I know there are matching rows for the Category id:s provided... – Anders Jan 28 '14 at 15:16
  • @AndersSvensson can you put a backup of your DB somewhere for download ? I am asking because i have made tests with the above code and it works for me.. – Gabriele Petrioli Jan 28 '14 at 16:16
  • @AndersSvensson do not forget to adjust the `COUNT(..) = ?` check when changing the ids.. – Gabriele Petrioli Jan 28 '14 at 16:29
  • @AndersSvensson and finally another bad copy/paste from me .. i had forgotten the `GROUP BY r.id` in the query.. – Gabriele Petrioli Jan 28 '14 at 16:41
  • I have no problem sharing the sample db with you. I just don't want to share a dropbox space or anything here on SO, but I emailed you an SQL dump on the email registered here. I am now getting a result after your last update, but still not the expected result. For the query you posted I should only get hits if a resource matches all in the list, right? But I get a hit even if there's a category in there that shouldn't match. Please try it out for yourself on the sample db! Thanks! – Anders Jan 28 '14 at 17:00
  • @AndersSvensson ok, remove the `Matching_Resources` from the schema. It was supposed to be a temporary table.. and add the `TEMPORARY` keyword at the table (*have updated code*). After that it works with the tests I did on your DB.. – Gabriele Petrioli Jan 28 '14 at 19:05
  • Ok, thanks, but that's strange... and you are getting matches? It works for me in the sense that I'm not getting errors, but I am not getting any matches... Did you test with MySql? – Anders Jan 28 '14 at 19:47
  • It's weird, but now it's working. Thanks! For some reason I had to close the console I was working in and reload it, and then it worked. – Anders Jan 28 '14 at 21:23
1

You can join the results back in:

SELECT u.name as "Created By", c.name as 'Category', c.value, cr.category_id
FROM resource r join
     user u
     on r.created_by = u.id join
     (SELECT resource_id
      FROM category_resource
      WHERE category_id IN (9, 10)
      GROUP BY resource_id
      HAVING COUNT(DISTINCT category_id) = 2
     ) crr
     on r.id = crr.resource_id join
     category_resource cr
     on cr.resource_id = r.id join
     category c
     on cr.category_id = c.id;

This seems like a strange request, because you are going to get (at least) two rows per resource, one for each category.

Also, don't use single quotes for column aliases. These should be used only for string (and date) constants.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you, that was helpful. I just dicovered though that I had not explained the problem thoroughly enough, please see my edit if you can help... – Anders Jan 28 '14 at 08:14