6

I need to create a database of guys, guys can have one or more attributes, and each guy's attribute has a specific value, sounds easy eh? well, keep reading as the problem kinda gets impossible (5 days dealing with it :s).

So I create this 3 tables:

CREATE TABLE guy (
  id int(11),
  name varchar(255)
);

CREATE TABLE attribute (
  id int(11),
  name varchar(255)
);

-- each value references one guy and one attribute
CREATE TABLE _value (
  id int(11),
  guy_id int(11),
  attribute_id int(11),
  _value varchar(255)
);

with this example data:

INSERT INTO attribute VALUES (1, 'age'), (2, 'dollars'), (3, 'candies');
INSERT INTO guy VALUES (1, 'John'), (2, 'Bob');
INSERT INTO _value VALUES (1, 1, 1, 12), (2, 1, 2, 15), (3, 1, 3, 3);
INSERT INTO _value VALUES (4, 2, 1, 15), (5, 2, 2, 20), (6, 2, 3, 6);

and create this query:

SELECT g.name 'guy', a.name 'attribute', v._value 'value' 
FROM guy g 
JOIN _value v ON g.id = v.guy_id 
JOIN attribute a ON a.id = v.attribute_id;

which gives me this result:

+------+-----------+-------+
| guy  | attribute | value |
+------+-----------+-------+
| John | age       | 12    |
| John | dollars   | 15    |
| John | candies   | 3     |
| Bob  | age       | 15    |
| Bob  | dollars   | 20    |
| Bob  | candies   | 6     |
+------+-----------+-------+

THIS IS THE REAL PROBLEM:

Later on, my boss tells me he wants to filter data using as many conditions as he wants an be able to groups those conditions with "ands" and "ors", for example, he may want to do this crazy condition:

Get guys which age is greater than 10, have less than 18 dollars, have more than 2 candies and less than 10 candies, but no matter what, also include guys which age is exactly 15. This would translate to this filter:

-- should return both John and Bob
(age > 10 and dollars < 18 and candies > 2 and candies < 10) or (age = 15)

I have no problem creating the filter (I use jqgrid for that), the problem is that attributes aren't columns, but rows instead, and because of that I don't know how to mix the query with the filter, I tried with something like this:

SELECT g.name 'guy', a.name 'attribute', v._value 'value' 
FROM guy g 
JOIN _value v ON g.id = v.guy_id 
JOIN attribute a ON a.id = v.attribute_id
GROUP BY guy
HAVING (
    (attribute = 'age' and value > 10) AND
    (attribute = 'dollars' and value < 18) AND
    (attribute = 'candies' and value > 2) AND
    (attribute = 'candies' and value < 10)
       )
OR
       (
     (attribute = 'age' and value = 15)
       )

but only Bob is returned :( and I should get both John and Bob.

SO, HOW SHOULD I MIX THE FILTER AND THE QUERY?

Have in mind that the number of attributes each guy has is the same for all guys, but more attributes and more guys can be added anytime, for example, if I want to add the guy 'Mario' I would do:

-- we insert the guy Mario
INSERT INTO guy VALUES (3, 'Mario');
-- with age = 5, dollars = 100 and candies = 1
INSERT INTO _value VALUES (7, 3, 1, 5), (8, 3, 2, 100), (9, 3, 3, 1);

And if I want to create the attribute 'apples' I would do:

-- we insert the attribute apples
INSERT INTO attribute VALUES (4, 'apples');
-- we create a value for each guy's new attribute, John as 7 apples, Bob has 3 and Mario has 8
INSERT INTO _value VALUES (10, 1, 4, 7), (11, 2, 4, 2), (12, 3, 4, 8);

and now I should be able to include conditions about apples in my query.

I hope I made myself understandable, thank you for all your time :)

Note: Maybe If there were a way to put all of each guys attributes in one row?, something like this:

+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+
| guy  | attribute | value | guy  | attribute  | value  | guy  | attribute  | value  | guy  | attribute  | value  |
+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+
| John | age       |    12 | John | dollars    |     15 | John | candies    |      3 | John | apples     |      7 |
| Bob  | age       |    15 | Bob  | dollars    |     20 | Bob  | candies    |      6 | Bob  | apples     |      2 |
| Mario| age       |    5  | Mario| dollars    |     100| Mario| candies    |      1 | Mario| apples     |      8 |
+------+-----------+-------+------+------------+--------+------+------------+--------+------+------------+--------+

Note 2: @iim suggested (In this question: How to search in grouped columns in MySQL? (also in Hibernate if possible)) that I could do a self-join for each attribute, and yes that may solve the problem, but there may be performance problems when guys have tons of attributes (like 30 or more).

Note 3: I cannot change the database schema :(

Community
  • 1
  • 1
ilovelamp
  • 739
  • 3
  • 9
  • 20

5 Answers5

2

what about something like this?

SELECT g.name 'guy', a.name 'attribute', v._value 'value' 
FROM guy g 
JOIN _value v1 ON g.id = v1.guy_id 
  JOIN attribute a1 ON a1.id = v1.attribute_id
JOIN _value v2 ON g.id = v2.guy_id 
  JOIN attribute a2 ON a2.id = v2.attribute_id
JOIN _value v3 ON g.id = v3.guy_id 
  JOIN attribute a3 ON a3.id = v3.attribute_id
JOIN _value v4 ON g.id = v4.guy_id 
  JOIN attribute a4 ON a4.id = v4.attribute_id
JOIN _value v5 ON g.id = v5.guy_id 
  JOIN attribute a5 ON a5.id = v5.attribute_id
WHERE (
    (a1 = 'age' and v1 > 10) AND
    (a2 = 'dollars' and v2 < 18) AND
    (a3 = 'candies' and v3 > 2) AND
    (a4 = 'candies' and v4 < 10)
  ) OR (a5 = 'age' and v5 = 15)

edit fixing a few dumb errors:

SELECT DISTINCT g.id, g.name 'guy'
FROM guy g 
JOIN _value v1 ON g.id = v1.guy_id 
  JOIN attribute a1 ON a1.id = v1.attribute_id
JOIN _value v2 ON g.id = v2.guy_id 
  JOIN attribute a2 ON a2.id = v2.attribute_id
JOIN _value v3 ON g.id = v3.guy_id 
  JOIN attribute a3 ON a3.id = v3.attribute_id
JOIN _value v4 ON g.id = v4.guy_id 
  JOIN attribute a4 ON a4.id = v4.attribute_id
JOIN _value v5 ON g.id = v5.guy_id 
  JOIN attribute a5 ON a5.id = v5.attribute_id
WHERE (
    (a1.name = 'age' and v1._value > 10) AND
    (a2.name = 'dollars' and v2._value < 18) AND
    (a3.name = 'candies' and v3._value > 2) AND
    (a4.name = 'candies' and v4._value < 10)
  ) OR (a5.name = 'age' and v5._value = 15)

specifically, i forgot about the fieldnames in the WHERE clause, select only the 'guy' fields, and added DISTINCT to get only one row for each guy.

Javier
  • 60,510
  • 8
  • 78
  • 126
  • had to change your query to make it run, please let me know if I did the correct changes: changed: `code`SELECT g.name 'guy', a.name 'attribute', v._value 'value'`code` for: `code`*`code` changed each condition from: `code`a1 = 'age' and v1 > 10 ...`code` to: `code`a1.name = 'age' and v1._value > 10 ...`code` after running the query it gave me a result of 260 rows, should I be including a group by clause?, without the where clause it gave me a result of 3072 rows, Would there be no performance problems?, How should I interpret this results? Thank you for the fast reply :) – ilovelamp Oct 19 '11 at 01:46
  • @rhinojosa no need to do my answer, it's already here. there are no performance issues. WHERE statement is just like if statement, it only gets things that has fits those conditions. This is not going to have any performance problems, and the answer here was only joined depending on what the user wants to search by. that's it. – Nathan Oct 19 '11 at 02:07
  • @Nathan, Thx for the comment Nathan, I'm still a little confused by why so many rows where returned in the result, do you think I could modify Javier's query to make the result shorter? maybe a group by so that it shows the guys names only once – ilovelamp Oct 19 '11 at 02:50
  • @rhinojosa yes, definitely, or you can change every join into INNER JOIN to create just one row, and everything is already there. try this. Or you can create other variables that can hold the other queries, just like one variable for the value of apple, one for age, so that it would only show one row for each guy – Nathan Oct 19 '11 at 02:55
  • @rhinojosa: see edits, now it only shows the guys identity, so there should be only one row per guy. to get all the data for the guy, i'd do a second query, so you can optimize one for searching and the other for displaying. i guess it could be added without breaking the structure; but it would be really hard to debug. – Javier Oct 19 '11 at 03:00
  • also note that it would be better if you take out the `attribute` table, since it's there only to give the attributes a name; but your compiler could read the `id` and use `v1.attribute_id=3` instead of `a1.name='apple'` – Javier Oct 19 '11 at 03:06
  • @Javier, thanks Javier :), yes I did some tests changing the condition values and is working great :), I really like the performance part which is like priority one, thank you for the advice, It's getting really late and maybe I should get home (10:30pm and still in the office :s), I promise to do extensive testing tomorrow morning :), I think both your answer and muistooshort answer are very promising, do you think we could mix them and maybe improve the query?, again, thanks a lot for your time :) have an excellent day – ilovelamp Oct 19 '11 at 03:21
  • @Javie, Yes, you are totally right about excluding the attribute table, I'll work with jqgrid to get attributes by their ids instead of their names. – ilovelamp Oct 19 '11 at 03:25
1

Something like this might be an option:

select g.name as guy
from guy g
join _value v on g.id = v.guy_id
join attribute a on a.id = v.attribute_id
where (a.name = 'age'     and v._value > 10)
   or (a.name = 'dollars' and v._value < 18)
   or (a.name = 'candies' and v._value > 2)
group by g.name
having count(*) = 3

union

select g.name as guy
from guy g
join _value v on g.id = v.guy_id
join attribute a on a.id = v.attribute_id
 where (a.name = 'age' and v._value = 15)
group by g.name       -- These two clauses are not necessary,
having count(*) = 1   -- they're just her for symmetry

You turn your outer "or" conditions into UNIONs and your "and" conditions can be handled in the usual "having count(*) matches the number of conditions".

I don't know if this approach will work for everything that your boss wants you to do but maybe it will help.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
  • this looks promising :), do you think there would be any performance problems if guys start having tons of attributes (like 30 or more) and I end up having 100,000 guys or more? – ilovelamp Oct 19 '11 at 01:28
  • @rhinojosa: I'd guess that the limiting problem would be how many conditions there are more than the numbers of guys and attributes (assuming you have proper indexes in place of course). – mu is too short Oct 19 '11 at 01:37
  • I think both your answer and Javier answer are very promising, do you think we could mix them and maybe improve the query?, I may not be able to answer till tomorrow, It's getting late and I need to sleep, sorry about that :s, have an excellent day :) – ilovelamp Oct 19 '11 at 03:20
1

If the problem is "the problem is that attributes aren't columns, but rows instead", how about a view. You can't change the database schema but you might consider a view that is:

CREATE VIEW the_attributes as 
  select a.id, a.name as attribute_name, v._value
  from attribute a JOIN value v
  ON v.attribute_id = a.id

Starting with this might work better.

Then I think you should be able to do:

select guy.id from guy JOIN the_attributes ON the_attributes.guy_id = guy.id
where 
the_attributes.name = 'age' and _value > 10 and
the_attributes.name = 'dollar' and _value < 18 and
the_attributes.name = 'candies' and _value > 2 and
the_attributes.name = 'candies' and _value <10 ) or
the_attributes.name = 'age' and _value = 15 ) 

Whether all this ultimately helps you will have to judge but this is what sprang to mind for me reading the problem initially. Certainly looks readable ;(

Michael Durrant
  • 93,410
  • 97
  • 333
  • 497
  • thx for the reply :), I cannot run your SQL Statement, did you mean to write this?: `code`CREATE VIEW the_attributes as select a.id, v._value from attribute a JOIN _value v ON v.attribute_id = a.id`code`, Could you give me a hint of where to go from here? I still have values and attributes as rows, Do I need to self-join the view or something like that? – ilovelamp Oct 19 '11 at 02:08
1

The following will let you make your conditions more or less straightforward, though I can't promise it will be really efficient with 100,000+ guys with 30+ attributes. That you should see for yourself.

SELECT g.name guy, a.name attribute, v._value value
FROM guy g 
JOIN _value v ON g.id = v.guy_id 
JOIN attribute a ON a.id = v.attribute_id
GROUP BY guy
HAVING (
    SUM(a.name = 'age'     and v._value > 10) = 1 AND
    SUM(a.name = 'dollars' and v._value < 18) = 1 AND
    SUM(a.name = 'candies' and v._value > 2 ) = 1 AND
    SUM(a.name = 'candies' and v._value < 10) = 1
       )
OR
       (
    SUM(a.name = 'age'     and v._value = 15) = 1
       )

(I'm assuming here that a guy cannot have duplicate attributes.)

Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • This works perfectly! Very neat and simple answer, I tested it with all kind of crazy queries an it seems to work in every case, I think I will go with your answer :) – ilovelamp Oct 20 '11 at 03:22
0

try this, maybe this will help.

SELECT g.name 'guy', a.name 'attribute', v._value 'value' 
FROM guy g 
JOIN _value v ON g.id = v.guy_id 
JOIN attribute a ON a.id = v.attribute_id
WHERE a.ID = v.attribute_ID
      AND v._value = 'values you want'
      AND  NOT v._value = 'values you don''t want'

let me know if you need anything else.

Nathan
  • 1,520
  • 1
  • 12
  • 21
  • Thx for the reply :), what difference does this make?: `code`WHERE a.ID = v.attribute_ID;`code`, Would you do me the favor of including this conditions in your query? `code`(age > 10 and dollars < 18 and candies > 2 and candies < 10) or (age = 15)`code`, sorry I don't do it myself, I'm not sure how :s – ilovelamp Oct 19 '11 at 01:57