0

I have a few tables with some (example) values:

people
----------------------------------
| id    | name                   |
----------------------------------
| 1     | Steve Jobs             |
| 2     | Bill Gates             |
| 3     | Linus Torvalds         |
| 4     | Nikola Tesla           |
| 5     | Henry Ford             |
----------------------------------

.

groups
----------------------------------
| id    | name                   |
----------------------------------
| 1     | Country                |
| 2     | Work                   |
----------------------------------

.

categories
------------------------------------------
| id    | gid   | name                   |
------------------------------------------
| 1     | 1     | USA                    |
| 2     | 1     | Finland                |
| 3     | 1     | Croatia                |
| 4     | 2     | Mac                    |
| 5     | 2     | iPhone                 |
| 6     | 2     | iPad                   |
| 7     | 2     | Windows                |
| 8     | 1     | Office                 |
| 9     | 1     | Linux                  |
| 10    | 1     | Coil                   |
| 11    | 1     | El. stuff              |
| 12    | 1     | Cars                   |
------------------------------------------

.

people_categories_map
--------------------------
| id    | pid   | cid    |
--------------------------
| 1     | 1     | 1      |
| 2     | 1     | 4      |
| 3     | 1     | 5      |
| 4     | 1     | 6      |
| 5     | 1     | 11     |
| 6     | 2     | 1      |
| 7     | 2     | 7      |
| 8     | 2     | 8      |
| 9     | 2     | 11     |
| 10    | 3     | 2      |
| 11    | 3     | 9      |
| 12    | 3     | 11     |
| 13    | 4     | 1      |
| 14    | 4     | 3      |
| 15    | 4     | 10     |
| 16    | 4     | 11     |
| 17    | 5     | 1      |
| 18    | 5     | 12     |
--------------------------

The challenge

What I'm trying to achieve, is to list people based on matches from each group. For example, I want to list people from (country) USA (categories.id:1) OR Finland (categories.id:2) AND Work Linux (categories.id:9) OR Cars (categories.id:12). This query would match only Linus Torvalds (people.id:3) based on relations from the people_categories_map table.

However, so far I have only managed to make queries that either list all category matches or any category matches, i.e.:

SELECT people.id, people.name FROM people JOIN people_categories_map ON people_categories_map.pid = people.id WHERE people_categories_map.cid IN (1,2);

SELECT people.id, people.name FROM people JOIN people_categories_map ON people_categories_map.pid = people.id WHERE people_categories_map.cid IN (1,9) HAVING COUNT(DISTINCT(people_categories_map.cid)) = 2;

Is it possible to combine these somehow, to make a query that returns a list of people that would be connected to "(category_x OR category_y) AND (category_z OR category_w OR category_m) AND (category_b OR category_c)" and so on... in other words, to return matches from all groups?

tshepang
  • 12,111
  • 21
  • 91
  • 136
  • I'm not sure I understand what you want to do... Can you explain it with other words, or give another exemple ? – Alexandre FILLATRE May 11 '14 at 21:31
  • Thanks for your reply @AlexandreFILLATRE. Here's a practical example: I want a list of people that work with either "cars" or "el. stuff", and they have to be from either "Finland" or "Croatia". This would return only Linus T. since this is the only record in the people table that has matches in both groups (Country and Work, match on Work:El.stuff and Country:Finland)... hope the example clears things up a little... – user3626415 May 11 '14 at 21:39
  • This is still hard to understand, but I think it comes from the data-model. Why don't you have a Country and a Work table instead ? That should make things way easier, and that would have more sense IMO – Alexandre FILLATRE May 11 '14 at 21:44
  • Sure, that would be much easier. But in the web app front end the database is suppose to be used, one will be able to add both groups and categories (each category is a sub of a group). So groups (or it may be easier to call it main categories and sub categories) may as well be "company", "age range", "gender", "city", "hobby", "iq" or whatever... in other words, it would be hard to have that flexibility if the groups/main categories had its own tables... but off course, much easier to query... – user3626415 May 11 '14 at 22:09
  • OK. So following your example, I don't see Linus T. maching both groups. He's linked to categories 2, 9, and 11, which are only part of group 1. So is your example or data wrong, or am I wrong myself ? – Alexandre FILLATRE May 11 '14 at 22:21
  • As always, consider providing proper DDLs (and/or an sqlfiddle) TOGETHER WITH THE DESIRED RESULT SET – Strawberry May 11 '14 at 22:56

1 Answers1

1

As Alexandre in his comment said, your design is more than problematic.

Your example is faulty too: T

  1. The categories 8 - 12 should have the gid 2 not 1.
  2. The query should return "Henry Ford" too, because he lives in the USA and works on Cars.

Following not beautiful query with a lot of joins do it (I'm sure it could be optimized, but better to optimize the database design):

SELECT *
FROM people_categories_map pmc1
INNER JOIN people_categories_map pmc2
ON  pmc1.pid = pmc2.pid AND pmc1.cid <> pmc2.cid
INNER JOIN people p
ON pmc1.pid = p.id
INNER JOIN categories c1
ON pmc1.cid = c1.id
INNER JOIN categories c2
ON pmc2.cid = c2.id
INNER JOIN groups g1
ON c1.gid = g1.id
INNER JOIN groups g2
ON c2.gid = g2.id
WHERE
    c1.name IN ('USA', 'FINLAND')
AND
    c2.name IN ('Linux', 'Cars');

Explanation

First we do a self join of the mapping table, so we have access to both country and work of a person. It's not a real INNER JOIN, because our join condition contains an unequal-condition.

Then we join our people once, but categories and groups twice: first for country, secondly for work and filter so. You could swap those two, without losing or changing data. It's only a matter of definition.

Remark: This is not efficient, but I don't believe it is worth to optimize.


See a fiddle for the example

If you want a person only once, use

SELECT DISTINCT p.id, p.name
FROM ...
VMai
  • 10,156
  • 9
  • 25
  • 34
  • I voted up for your answer because you've gone farther than I would, but I'm not sure this is the best way to do it, mainly because you have to indicate C1 and C2 names. Anyway that seems pretty close to the best we can do with this design. – Alexandre FILLATRE May 11 '14 at 22:31
  • Thanks for your replies. As you both pointed out, the categories table has errors in the gid col, as id 8-12 should be gid:2...sorry about that. I'll try your example @VMai . However, the design is not unchangeable (the project is in early design stage), so any design change suggestions are greatly appreciated, thanks :) What's important is that main categories (like Country, Work etc.) and their sub categories (like USA, Finland etc.) are not static, in the sense that main categories (groups) and sub categories (categories) can be added, altered and deleted without altering the db structure.. – user3626415 May 11 '14 at 22:32
  • @user3626415 You can play with the where clause of the [fiddle](http://sqlfiddle.com/#!2/f1c42/1). – VMai May 11 '14 at 22:37
  • Thanks VMai for taking time to make a fiddle, I'll fiddle more with it tomorrow (00.40 local time ;) )... – user3626415 May 11 '14 at 22:37
  • I disagree that it's inefficient – Strawberry May 11 '14 at 22:48
  • @Strawberry It seems wrong to me. Think of the result of the self join with the unequality. But such categories and subcategories shoudn't tend to have a great lot of rows ... – VMai May 11 '14 at 22:55
  • @user3626415: Have a look at [Bill Karwins](http://stackoverflow.com/users/20860/bill-karwin) [slides to SQL Antipatterns](http://de.slideshare.net/billkarwin/sql-antipatterns-strike-back), especially slide 17 ff.). Even if you have only two levels, you could use [closure tables](http://karwin.blogspot.de/2010/03/rendering-trees-with-closure-tables.html) or nested sets. – VMai May 11 '14 at 22:57
  • Thanks a lot for good tips on antipatterns. Btw, I altered and stripped down the query a little since I don't need to include the categories and groups tables anymore. I'll performance test the query a little before I decide to use it, but it actually looks promising. – user3626415 May 12 '14 at 13:11
  • Btw, is there any advantages by using <> instead of !=, or are they equal? – user3626415 May 12 '14 at 13:14
  • @user3626415 `pmc1.cid <> pmc2.cid` does an minimal filtering because it eliminates those rows with equal categories. The where clause would filter those anyway. – VMai May 12 '14 at 13:15
  • @user3626415 We're not synchronized. <> and != are just the same in MySQL, see http://stackoverflow.com/questions/23227723/what-does-mean-on-sql/23227776#23227776 :) – VMai May 12 '14 at 13:18