5

I've got two tables. One is users, which has fields uid and name. The other table is users_roles, which has fields uid and rid (role id).

I'd like to retrieve a list of users that don't have any of a set of provided roles.

For example:

uid | name
----------
1   | BOB
2   | DAVE
3   | JOHN

USERS_ROLES:

uid | rid
---------
1   | 1
1   | 2
1   | 3
2   | 1
3   | 1

I want to be able to query for just users that don't have a certain set of rids. For instance, a query that excludes rids 2 and 3 should return DAVE and JOHN, or a query that excludes rids (1,3) should return nobody.

fthiella
  • 48,073
  • 15
  • 90
  • 106
Mike
  • 1,625
  • 3
  • 16
  • 19

3 Answers3

13

A query using an anti-join pattern is sometimes the most efficient:

SELECT u.uid
     , u.name
  FROM users u
  LEFT
  JOIN users_roles r
    ON r.uid = u.uid
   AND r.rid IN (2,3)
 WHERE r.uid IS NULL

The anti-join pattern is do a LEFT [outer] JOIN the user_roles table to pull back all the matching rows, AND to get rows from users that don't have a matching row. The "trick" is to exclude all the matching rows with a predicate in the WHERE clause that eliminates all the rows from users that had a match.

An equivalent resultset can be obtained using a NOT EXISTS correlated subquery:

SELECT u.uid
     , u.name
  FROM users u
 WHERE NOT EXISTS 
       ( SELECT 1 
           FROM users_roles r
          WHERE r.uid = u.uid
            AND r.rid IN (2,3)
       )

Another approach is to use a NOT IN, although that is sometimes less efficient. Performance depends on a whole host of factors. It's possible for the optimizer to generate different execution plans for each of these queries.

In any case, for best performance, you'll need an index ... ON users_roles (uid) or ON users_roles (uid,rid).


Followup:

Performance testing on my MySQL 5.1.34 server reveals that an anti-join query is almost twice as fast as equivalent NOT EXISTS and NOT IN queries. (1.091 sec vs. 2.066 sec and 2.020 sec)

-- setup and populate test tables
CREATE TABLE t_users
( uid    INT UNSIGNED NOT NULL PRIMARY KEY
, `name` VARCHAR(50)
) ENGINE=INNODB DEFAULT CHARSET=latin1 ;

CREATE TABLE t_users_roles
( uid INT UNSIGNED NOT NULL
, rid INT UNSIGNED NOT NULL
, PRIMARY KEY (uid,rid)
) ENGINE=INNODB DEFAULT CHARSET=latin1;

ALTER TABLE t_users_roles ADD CONSTRAINT FK_t_users_roles_t_users FOREIGN KEY (uid) REFERENCES t_users (uid);

CREATE INDEX t_users_ix1 ON t_users (uid,`name`);

CREATE INDEX t_users_roles_ix1 ON t_users_roles (rid,uid);

INSERT INTO t_users (uid,`name`)
SELECT d.d*100000+e.d*10000+u.d*1000+h.d*100+t.d*10+o.d+1 AS uid
     , CONCAT('NAME',LPAD(d.d*100000+e.d*10000+u.d*1000+h.d*100+t.d*10+o.d+1,8,'-')) AS `name`
  FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) o
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) h
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) u
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d 
;

-- 1000000 row(s) affected.

INSERT INTO t_users_roles (uid,rid)
SELECT d.d*100000+e.d*10000+u.d*1000+h.d*100+t.d*10+o.d+1 AS uid
     , r.rid
  FROM (SELECT 1 AS rid UNION ALL SELECT 2 UNION ALL SELECT 3) r
 CROSS
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) o
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) h
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) u
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
 WHERE (d.d*100000+e.d*10000+u.d*1000+h.d*100+t.d*10+o.d+1) % 100000 <> 0
;
-- 2999970 row(s) affected

OPTIMIZE TABLE t_users;
OPTIMIZE TABLE t_users_roles;

SHOW STATUS LIKE 'Qcache_hits' ;
-- Variable_name  Value    
-- -------------  ---------
-- Qcache_hits    1117342  

SHOW VARIABLES LIKE 'version' ; 
-- Variable_name  Value       
-- -------------  ------------
-- version        5.1.53-log

-- table size from the file system
$ du -sh DATA/test/t_users*.ibd
72M     DATA/test/t_users.ibd
133M    DATA/test/t_users_roles.ibd


-- anti-join query
SELECT SQL_NO_CACHE u.uid
     , u.name
  FROM t_users u
  LEFT
  JOIN t_users_roles r
    ON r.uid = u.uid
   AND r.rid IN (2,3)
 WHERE r.uid IS NULL ;

-- Exec: 1.095 sec
-- Exec: 1.090 sec
-- Exec: 1.091 sec
-- Exec: 1.087 sec
-- Exec: 1.090 sec
-- avg 5 executions: 1.091 sec    


-- not exists query
SELECT SQL_NO_CACHE u.uid
     , u.name
  FROM t_users u
 WHERE NOT EXISTS
       ( SELECT 1
           FROM t_users_roles r
          WHERE r.uid = u.uid
            AND r.rid IN (2,3)
       ) ;

-- Exec: 2.071 sec
-- Exec: 2.066 sec
-- Exec: 2.059 sec
-- Exec: 2.065 sec
-- Exec: 2.070 sec
-- avg 5 executions: 2.066 sec

-- not in query
SELECT SQL_NO_CACHE u.uid
     , u.name
  FROM t_users u
 WHERE u.uid NOT IN
       ( SELECT r.uid
           FROM t_users_roles r
          WHERE r.uid IS NOT NULL
            AND r.rid IN (2,3)
       ) ;

-- Exec: 2.022 sec 
-- Exec: 2.023 sec 
-- Exec: 2.014 sec
-- Exec: 2.026 sec
-- Exec: 2.016 sec
-- avg 5 executions: 2.020 sec

SHOW STATUS LIKE 'Qcache_hits' ;
-- Variable_name  Value    
-- -------------  ---------
-- Qcache_hits    1117342  

EXPLAIN output for three statements:

-- ANTI JOIN
id  select_type         table   type            possible_keys              key          key_len  ref       rows   filtered  Extra                                 
--  ------------------  ------  --------------  -------------------------  -----------  -------  -----  -------  --------  ------------------------------------
 1  SIMPLE              u       index                                      t_users_ix1  57              1000423    100.00  Using index
 1  SIMPLE              r       ref             PRIMARY,t_users_roles_ix1  PRIMARY      4        u.uid        1    100.00  Using where; Using index; Not exists

-- NOT EXISTS
id  select_type         table   type            possible_keys              key          key_len  ref       rows  filtered  Extra                     
--  ------------------  ------  --------------  -------------------------  -----------  -------  -----  -------  --------  --------------------------
 1  PRIMARY             u       index                                      t_users_ix1  57              1000423    100.00  Using where; Using index
 2  DEPENDENT SUBQUERY  r       ref             PRIMARY,t_users_roles_ix1  PRIMARY      4        u.uid        1    100.00  Using where; Using index

-- NOT IN
id  select_type         table   type            possible_keys              key          key_len  ref        rows  filtered  Extra                     
--  ------------------  ------  --------------  -------------------------  -----------  -------  ------  -------  --------  --------------------------
 1  PRIMARY             u       index                                      t_users_ix1  57               1000423    100.00  Using where; Using index
 2  DEPENDENT SUBQUERY  r       index_subquery  PRIMARY,t_users_roles_ix1  PRIMARY      4        func          1    100.00  Using index; Using where
spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • NOT EXISTS is apparently much worse than NOT IN which is exactly the same as the obtuse JOIN statement: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/ – mattmanser Feb 05 '13 at 15:25
  • Going to the trouble of optimization/performance testing is pretty awesome. Nice work. – Philip Whitehouse Feb 05 '13 at 22:35
  • 1
    @mattmanser: Performance of queries depends on several factors. In my testing, on an idle test server, a query with an anti-join pattern performs nearly twice as fast as an equivalent NOT EXISTS or NOT IN query. – spencer7593 Feb 05 '13 at 22:39
1

You're looking for the NOT IN query:

MySQL "NOT IN" query

SELECT name 
FROM users 
WHERE uid NOT IN (SELECT uid FROM Users_roles WHERE rid = 2 OR rid = 3)
Community
  • 1
  • 1
mattmanser
  • 5,719
  • 3
  • 38
  • 50
  • 2
    Why not double down: `SELECT name FROM users WHERE uid NOT IN (SELECT uid FROM Users_roles WHERE rid not in (2, 3))` :) – Matthew Feb 05 '13 at 15:15
  • I guess I thought something like this should be possible without a subquery...not the case? – Mike Feb 05 '13 at 15:18
  • @Mike It is possible as per JW's answer, but the code is much less elegant. I'm back to SQL Server now so don't avoid them so much, but according to the 3rd answer in that link the query optimizes down to the same plan as a JOIN anyway http://stackoverflow.com/a/6635936/62829 – mattmanser Feb 05 '13 at 15:22
  • @Mike it is, use a join if you prefer not using the sub-query. Although the difference in likely small. For very large datasets you will need to tune based on how the specific vendor (MySQL) handles the execution plan for each. – Matthew Feb 05 '13 at 15:23
  • 1
    It seems like the sub-query should have an extra where clause to only return records where the Users_roles.uid = Users.uid, but I'm guessing this is optimized out and/or not required. – eselk Dec 05 '13 at 03:36
1

The query will give you all the complete list of names with their missing rid.

SELECT  a.*, b.rid
FROM    users a CROSS JOIN
        (SELECT DISTINCT rid FROM users_roles) b
        LEFT JOIN USERS_ROLES c
            ON a.uid = c.uid AND 
                b.rid = c.rid
WHERE   c.rid IS NULL
        -- AND other conditions
John Woo
  • 258,903
  • 69
  • 498
  • 492