How to select distinct rows where 2 columns should match on multiple rows?

Question

I have a table like this :

id | user_id | param_id | param_value  
1      1          44          google
2      1          45         adTest
3      1          46         Campaign
4      1          47          null
5      1          48          null
6      2          44          google
7      2          45         adAnotherTest
8      2          46         Campaign2
9      2          47         null
10     2          48         null

I want to fetch all the user_ids where (param_id = 44 AND param_value=google) AND (param_id= 45 AND param_value = adTest) . So the above where clause should give only user_id = 1 and not user_id = 2 . They both have google at param_id 44 but only user 1 has param_value adTest at param_id = 45 .

The problem is the n the future more params could be added . I need to find a dynamic query . Here what i have tryed :

SELECT DISTINCT up.user_id FROM user_params AS up

                    LEFT JOIN user_params AS upp ON up.id = upp.id

                    WHERE up.param_id IN (?,?) 

                    AND upp.param_value IN (?,?)

score 3 · Accepted Answer · answered Oct 11 '13 at 20:26

3

SELECT DISTINCT up.user_id 
FROM user_params AS up
LEFT JOIN user_params AS upp ON up.id = upp.id
group by up.user_id
having sum(param_id = 44 AND param_value = 'google') >= 1
and sum(param_id = 45 AND param_value = 'adTest') >= 1

answered Oct 11 '13 at 20:26

juergen d

201,996
37
293
362

Ty for the fast answer , briliant – Geo C. Oct 11 '13 at 20:38
Nice +1 and posted an upgrade based on your answer – Raymond Nijland Oct 11 '13 at 21:53

score 2 · Answer 2 · edited May 23 '17 at 11:50

2

Another way:

SELECT  -- DISTINCT 
    up1.user_id 
FROM 
    user_params AS up1
  JOIN
    user_params AS up2 
      ON up1.user_id = up2.user_id
WHERE
    up1.param_id = 44 AND up1.param_value = 'google'
  AND 
    up2.param_id = 45 AND up2.param_value = 'adTest' ;

You do not need the DISTINCT, if there is a UNIQUE constraint on (user_id, param_id)

For efficiency, add an index on (param_id, param_value, user_id)

The problem you are dealing with is called "Relational Division" and there is a great answer by @Erwin Brandstetter here: How to filter SQL results in a has-many-through relation, with a lot of ways to write such a query, along with performance tests.

The tests were done in Postgres so some of the queries do not even run in MySQL but at least half of them do run and efficiency would be similar in many of them.

edited May 23 '17 at 11:50

Community

1
1

answered Oct 12 '13 at 16:23

ypercubeᵀᴹ

113,259
19
174
235

Distinct was removed :) , ty for the notice . But you answer is not that efficient at all because the query is dynamic like i said in the question . If i add more and more joins will not be a good solution . The accepted solution is the best for a dynamic query (many param_id param value pairs to be matched) – Geo C. Oct 12 '13 at 22:00
It may not be good because you have to build a different query every time, so it requires a dynamic solution. But have you really tested efficiency or you are just guessing? My estimate is that the more the joins, the more efficient this query gets. – ypercubeᵀᴹ Oct 12 '13 at 22:03
When was a query with 12++ joins efficient ? :) – Geo C. Oct 12 '13 at 22:05
See this (7-join): **[Subqueries run very fast individually, but when joined are very slow](http://dba.stackexchange.com/questions/31828/subqueries-run-very-fast-individually-but-when-joined-are-very-slow/31836#31836)**. From 400 seconds down to 0.0318 seconds, with proper indexing. – ypercubeᵀᴹ Oct 12 '13 at 22:11
When you join the same table , isn't the mysql engine reading all the data all over again ? Isn't it better to read it only once or fewer times if possible ? – Geo C. Oct 12 '13 at 22:22
That's why I suggested that index. It will only have to read from the index (and only a small part of it, for every one of the joins.) – ypercubeᵀᴹ Oct 12 '13 at 22:24
But still won't be an indexed 1 time read be better than 12x times indexed read ? :) The index is added btw :) – Geo C. Oct 12 '13 at 22:26
Yes, but the other queries will have to read (much) larger parts of the index and also have to do the GROUP BY and aggregate calculations. This query will do an intersection of the (2 or more) `user_id` lists found by the index seeks. It really depends on the distribution of your data and for some cases the GROUP BY solution will be better and in other cases the JOIN method. It doesn't hurt to test in your environment with your data - and choose the best for you. – ypercubeᵀᴹ Oct 12 '13 at 22:28
I really can't say about mysql's performance regarding group by or aggregate functions . But i think the above query can be edited by doing simple (param_id = 44 AND param_value = 'google') which should return true or false . No agregate needed – Geo C. Oct 12 '13 at 22:44
Anyway the problem was not the performance of the query :) Ty for the help and for the link . I will read it to view some real examples and benchmarks – Geo C. Oct 12 '13 at 22:47
Which one do you mean can be edited? Both the other answers have `GROUP BY` and `HAVING SUM(..) `. I don't see how they can be edited. – ypercubeᵀᴹ Oct 12 '13 at 22:47
sum(true and true)>=1 is the same as (true and true) :) – Geo C. Oct 12 '13 at 22:48

Raymond Nijland · Answer 3 · 2013-10-11T21:58:55.450

If you want to optimize this should give the same results without the need an LEFT JOIN table scans (thanks to juergen d for having part)

SELECT
 user_id

FROM 
 user_params

WHERE
  param_id IN(44, 45)
 AND
  param_value IN('google', 'adTest')

GROUP BY
 user_id 

HAVING 
    sum(param_id = 44 AND param_value = 'google') >= 1
  AND
    sum(param_id = 45 AND param_value = 'adTest') >= 1
;

see http://sqlfiddle.com/#!2/17b65/4 for demo

How to select distinct rows where 2 columns should match on multiple rows?

3 Answers3