In MySQL, how can I query efficiently on two or more attributes at the same time without adding them as columns?

Question

Let's say I have a table of people and a table of attributes related to the people. (No, I'm not building a dating site, it's just a good example.) The table of people looks like this:

id integer
name varchar(100)

The table of attributes looks like this:

person_id integer
name varchar(100)
value varchar(100)

I can fetch all the attributes of a person very easily:

SELECT * FROM attribute WHERE person_id = 5;

If I want to find all the people who are more than 60 inches tall, that is easy too:

SELECT person_id FROM attribute WHERE name = 'height' AND value > 60;

And of course I can add a join to these queries to get the person's name.

But if I need to find the people more than 60 inches tall who are ALSO over 40 (note the need for an AND here), the simplest solution that comes to mind is (indented for readability):

SELECT p.id FROM person p 
  INNER JOIN attribute a1 ON a1.person_id = p.id AND a1.name = 'height' AND a1.value > 60 
  INNER JOIN attribute a2 ON a2.person_id = p.id AND a2.name = 'age' AND a2.value > 40;

Note that I'm joining to the same table twice.

Am I missing a straightforward way to do this without joining repeatedly, or is that pretty much the most efficient way to go about it? Can I repeat the join numerous times without getting into trouble?

I am aware of one hypothetical alternative: adding the individual attributes (height, age, etc.) as columns in the person table. Unfortunately in my application people are constantly adding new attributes on the fly (think machine tags in flickr). It is not feasible to add them all as columns, even if I had privileges to alter the database schema on the fly.

Thanks!

That looks like the most straightforward way to me, and it should also run very fast if your indexes are right. — Ben Lee, Jan 31 '12 at 00:43
Using joins is usually the most efficient way in your type of design (EAV). — ypercubeᵀᴹ, Jan 31 '12 at 00:44
And this, class, is why EAV (entity-attribute-value) is annoying. Well, this and the near-total inability to set constraints. Oh, and indexing is a nightmare. Eh...ok, i think i've diluted my point. — cHao, Jan 31 '12 at 00:44
See this question that benhchmarks various different ways to answer a similar query: [How to filter SQL results in a has-many-through relation](http://stackoverflow.com/questions/7364969/how-to-filter-sql-results-in-a-has-many-through-relation) — ypercubeᵀᴹ, Jan 31 '12 at 00:48
Thanks to ypercube for pointing to that fantastic roundup of tests. Sounds like I'm on the right path. — Tom Boutell, Jan 31 '12 at 04:00
There is a subtle difference though. Those queries have only equality conditions. Your query includes inequality (range) conditions. — ypercubeᵀᴹ, Jan 31 '12 at 21:36
ypercube: true, but since those can be indexed the inner join shouldn't stop winning just on account of that. — Tom Boutell, Feb 02 '12 at 01:10

score 1 · Answer 1 · answered Jun 15 '12 at 15:23

The way you are doing it is not the only simple way to do this - event if it may be the most efficient. Simpler ways to do this would include:

Use union:

 SELECT person_id FROM attribute WHERE name = 'height' AND value > 60 
 union
 SELECT person_id FROM attribute WHERE name = 'age' AND value > 40

Use boolean logic (and eliminate duplicates using the distinct keyword):

SELECT distinct person_id FROM attribute 
   WHERE (name = 'height' AND value > 60) OR (name = 'age' AND value > 40)

You can use either of these as subqueries to fin the names, instead of the IDs, if you wish. If you have an index on (name, value) including the id, these might even be efficient

This is finding persons older than forty OR higher than 60, right? — Moberg, Jan 30 '15 at 07:32
@Moberg yes, but it's easy to construct an AND query instead using boolena logic. (The first method would need a further tweak.) — David Manheim, Feb 13 '15 at 06:03

score 1 · Answer 2 · answered Jan 31 '12 at 20:50

what you have there is probably the most efficient you're going to get with your data structure. although it's still a little bizarre.... unless a person can have multiple heights or ages, it might make more sense to merge those tables.

if nothing else, have a person table and personattribute table that houses some common elements as columns instead of rows. you can still have attribute table for other more obscure attributes that you can share with other objects.

personattribute

personid
height
age
weight
shoesize
waistsize
someotherbodypart_size

sometimes storage overhead can be offset by performance gains. in this case, you're not really wasting much storage either if every "person" has these common attributes.

just my 2 cents.

AS noted, since the attributes can be created later, differentiating between "real" attributes and user-defined ones adds complexity to future changes. — David Manheim, Jun 15 '12 at 15:29

virtualadrian · Answer 3 · 2012-06-24T04:13:52.837

I can't concretely answer this question without setting up a test db, but I think you need to go the other way. Start with your attributes table and join to users. So:

SELECT p.id ,a.name, a.value
FROM attribute a
      INNER JOIN person p ON a.person_id = p.id
WHERE (a.name = 'height' AND a.value > 60)
AND (a.name = 'age' AND a.value > 40)

Something like that ... you may have to tweak it a bit.

EDIT:

As pointed out that won't work ,see I told you I didn't try it. So you have two other choices, using EXISTS or PIVOT. Here is an example of a pivot, although I don't know how this performs in MySQL... if you have millions of rows in your attributes table that may be a problem.

SELECT distinct
u.id
FROM users u 
inner JOIN (

SELECT  
  user_id,  
  GROUP_CONCAT(if(name = 'age', value, NULL)) AS 'age', 
  GROUP_CONCAT(if(name = 'height', value, NULL)) AS 'height'
FROM attributes 
GROUP BY user_id) a on u.id = a.user_id
WHERE a.age > 40
and a.height > 60;

Negative, Red Leader. A single `attribute` row will not simultaneously have `a.name = 'height'` and `a.name = 'age'`. Also, the order in which tables are written does not imply that's the order the query planner with actually execute the join unless you _force_ it with `STRAIGHT_JOIN` (which is usually a bad idea anyway). — Wiseguy, Jun 24 '12 at 02:43

StackOverflow_kankyou · Answer 4 · 2013-02-27T13:36:21.643

On MS-SQL 2000 can be done this way:

SELECT person_id 
FROM attribute 
INNER JOIN (
    SELECT this_id=person_id
    , summ_is=SUM( CASE name  
     WHEN  'att1' THEN 1
     WHEN  'att2' THEN 1
     WHEN  'att3' THEN 1
     WHEN  'att4' THEN 1
     WHEN  'att5' THEN 1
     ELSE 0 END
     FROM attribute
     GROUP BY person_id
 ) tab 
 ON person_id=this_id 
 AND summ_is=5

For any type of comparision:

 SUM( 
     CASE WHEN name > attr1 THEN 1 ELSE 0 END 
    +CASE WHEN name = attr2 THEN 1 ELSE 0 END 
    +CASE WHEN name < attr3 THEN 1 ELSE 0 END 
    +CASE WHEN name != attr4 THEN 1 ELSE 0 END 
    +CASE WHEN name LIKE  '%'+attr5+'%' THEN 1 ELSE 0 END 
)

In MySQL, how can I query efficiently on two or more attributes at the same time without adding them as columns?

4 Answers4

personattribute