0

here's an example of what I'm looking for:

user_id color
1 red
1 yellow
1 blue
2 red
3 red
4 red

I want to pull users who DO NOT have the color red or yellow AT ALL. Which I know is not simply:

select user_id
from table_name
where color not in ('red', 'yellow')

User 1 will still output even though they have a line item for both red and yellow. The single line item of color = blue for user 1 fits the criteria of color not in ('red', 'yellow'), but I want to exclude users that have any line item of red or yellow.

I think something like

with CTE 
(of all users that have bought red or yellow)

select user_id
from table_name
where user_id not in (select * from CTE)

will work... but is that best practice? I'm wondering if there's some way easier solution or function out there

4 Answers4

1

With only this one table, we'd use aggregation. E.g.:

select user_id
from table_name
group by user_id
having max(case when color = 'red' then 1 else 0 end) = 0
   and max(case when color = 'yellow' then 1 else 0 end) = 0);

With a separate user table (which I suppose exists, as there is a user_id in the table) we would typically use a lookup with NOT EXISTS or NOT IN instead. E.g.:

select user_id
from users
where user_id not in (select user_id from table_name where color = 'red')
  and user_id not in (select user_id from table_name where color = 'yellow');
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
0

Following your idea to use IN, you could do following:

SELECT user_id
FROM table_name
WHERE user_id NOT IN (SELECT user_id FROM table_name 
WHERE color IN ('red', 'yellow'));

You could also use NOT EXISTS which could maybe be faster, but it's less readable. And you could also add a DISTINCT if you want to make sure to do not select identic user id's multiple times. The query with DISTINCT would be slower, so it should only be used if really necessary.

Jonas Metzler
  • 4,517
  • 1
  • 5
  • 17
0

This should work for your expected result set:

select distinct user_id, colour
from table_name
where user_id not in (select user_id
                      from table_name
                      where colour in ('red', 'yellow'))

You can check an example solution here: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=775bfc1511fede58b469e4da6b1779d1

karthik_ghorpade
  • 374
  • 1
  • 10
-1

I suggest to use NOT EXISTS because it has a better performance than NOT IN

SELECT t1.user_id
FROM table_name t1
WHERE not exists (SELECT user_id FROM table_name t2 
WHERE color IN ('red', 'yellow') and t1.user_id=t2.user_id);
burnsi
  • 6,194
  • 13
  • 17
  • 27
Lev Gelman
  • 177
  • 8
  • 1
    How do you know this? You are using an IN clause, too. If you think this is too slow, you must also write "color = 'red' OR color = 'yellow'" instead. – Jonas Metzler Jun 08 '22 at 05:48
  • NOT IN specific considered to be slowly. The aliases here is mandatory, it's used for join between main statement and not exists statement – Lev Gelman Jun 08 '22 at 05:49
  • 1
    This is wrong. `NOT IN` is not considered slower than `NOT EXISTS`. Whether it is or not depends on the DBMS and the query. You don't even know which DBMS the OP uses. Your query returns user IDs multifold by the way (e.g. when a user is associated only blue and green). – Thorsten Kettner Jun 08 '22 at 05:53
  • https://stackoverflow.com/questions/173041/not-in-vs-not-exists – Lev Gelman Jun 08 '22 at 05:55
  • Thank you for this link. So, you are saying that If the OP happens to use SQL Server, and if they happen to even use the same version as the answering person did when answering the question in 2012, then `NOT EXISTS` is faster than `NOT IN`? – Thorsten Kettner Jun 08 '22 at 05:59
  • As already said, using OR is also faster than using IN, but this thing you didn't change. It's totally fine to propose the usage of NOT EXISTS, but if you complain about a possible bad performance in your answer, your answer should not contain further performance issues. – Jonas Metzler Jun 08 '22 at 06:00
  • I say that there are more cases that NOT EXISTS have better performance than NOT IN, of course in real prod environment you should check with some query analyzer tool – Lev Gelman Jun 08 '22 at 06:01
  • @Jonas Metzler: Who gave you the idea that `OR` is faster than `IN`? It usually isn't. All that Lev Gelman said is that they consider `NOT EXISTS` faster than `NOT IN`. – Thorsten Kettner Jun 08 '22 at 06:02
  • It's of course not generally faster, same like NOT EXISTS is not generally faster than NOT IN. Lots of execution plans have shown that queries with OR are executed a bit faster when only few conditions are checked (like in this example two conditions). IN will get much faster for very many conditions. Overall, I would be more critical and carefully when telling something about performance and wanted to point out that. In this case, I would not write "has a better performance" like Lev did, but write "could have" or at least "will likely have". – Jonas Metzler Jun 08 '22 at 06:07