0

We need to select all segments_id, em_id from the following table mytable that have a segments_id of 5 or 8 but not 1 or 7. To be clearer after comments made below, the result will be a set that includes all rows of segments_id, em_id where em_id has segments_id of 5 or 8 BUT not 1 or 7.

 +-------------+-------+
 | segments_id | em_id |
 +-------------+-------+
 |    1        |   8   |
 |    1        |  17   |
 |    5        |   2   |
 |    5        |   4   |
 |    5        |   5   |
 |    5        |  16   |
 |    5        |  17   |
 |    7        |   4   |
 |    7        |   5   |
 |    7        |   8   |
 |    7        |  16   |
 |    8        |   4   |
 |    8        |   6   |
 |    8        |   8   |
 |    8        |  18   |
 |    18       |   6   |
 |    18       |  99   |
 +-------------+-------+

The result should be:

 +-------------+-------+
 | segments_id | em_id |
 +-------------+-------+
 |    5        |   2   |
 |    8        |   6   |
 |    8        |  16   |
 +-------------+-------+

We need to avoid using IN clause because this can scale to millions of rows.

I understand this will involve a join on itself and/or a subquery but I'm not seeing it. I saw this post Stackoverflow: Selecting rows from a table that have the same value for one field but cannot see the solution.

Community
  • 1
  • 1
Marc
  • 57
  • 10
  • 1
    Why do you think `in` cannot scale to a million of rows? – Ed Heal Jan 16 '16 at 17:00
  • 1
    Are there any others rules for why certain records are being discarded, e.g. `(8, 4)` ? Never mind. I think your expected output is (confusingly) abbreviated. – Tim Biegeleisen Jan 16 '16 at 17:06
  • Hi Ed, truth is I do not know, from experience, but I have seen here some strong opinions [How many values in an “in” clause is too many in a SQL query?](http://stackoverflow.com/questions/8650324/how-many-values-in-an-in-clause-is-too-many-in-a-sql-query) – Marc Jan 16 '16 at 19:03

3 Answers3

1

Using aggregation:

SELECT segments_id,em_id   -- GROUP_CONCAT(segments_id) AS segments_ids
FROM mytable
GROUP BY em_id
HAVING SUM(segments_id IN (8,5)) > 0
   AND SUM(segments_id IN (1,7)) = 0;

SqlFiddleDemo

Output:

╔══════════════╦═══════╗
║ segments_id  ║ em_id ║
╠══════════════╬═══════╣
║           5  ║     2 ║
║           8  ║     6 ║
║           8  ║    18 ║
╚══════════════╩═══════╝
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • I think you want `group by segments_id`. – Gordon Linoff Jan 16 '16 at 17:08
  • As far I understand question, it should be correct. @Marc could you confirm/deny that it is what you want? Please prepare bigger sample, because it can be misleading now. – Lukasz Szozda Jan 16 '16 at 17:11
  • @lad2025, this is working. Can you you remind why its `--` instead of `,` to use `GROUP_CONCAT` in this statement? – Marc Jan 16 '16 at 18:56
  • @Marc I added `GROUP_CONCAT` as comment. The point is that when you use `GROUP BY` any expression is `SELECT` should be wrapped with aggregation function to avoid undeterministic result. `SELECT em_id, GROUP_CONCAT(segments_id) AS segments_ids FROM ...` – Lukasz Szozda Jan 17 '16 at 12:01
1

Not sure what you are asking for here. You tell us you want a select all but the results that you posted do not show all records that satisfy your conditions. Please elaborate on the conditions if you meant to exclude some records where segments_id = 5 or 8 and not 1 or 7.

SELECT segments_id,em_id 
FROM mytable 
WHERE ( segments_id = 5 OR segments_id = 8 ) 
AND segments_id != 1 
AND segments_id != 7;

This will work!

kennedy484
  • 455
  • 1
  • 5
  • 11
  • 1
    This will not work! [Demo](http://sqlfiddle.com/#!9/6a8ca/8/0) comparing to desired resultset. Anyway it is a bit unclear what OP wants – Lukasz Szozda Jan 16 '16 at 17:14
  • Yes this will fulfill the conditions as explained in the caption of the question. "We need to select all segments_id, em_ids from the following table mytable that have a segments_id of 5 or 8 but not 1 or 7" . As stated in my caption above the answer..... – kennedy484 Jan 16 '16 at 17:24
  • But does not match desired output at all – Lukasz Szozda Jan 16 '16 at 17:36
  • being that ( 8 | 16 ) doesn't exist. I think it is safe to say that the desired result is false. – kennedy484 Jan 16 '16 at 18:40
  • Hi Kennedy, thanks for attempting. Your statement is not returning the result. – Marc Jan 16 '16 at 19:00
  • I would love to answer the question correctly =) . Could you elaborate a little more on what you want the result to give you? Also in the result set you have ( 8 | 16 ), and that doesnt exist in the other data set. It is difficult to tell what you want here. =) The edit to your question doesn't elaborate, rather it just repeats what you already said =/ – kennedy484 Jan 18 '16 at 14:24
1

You can use a LEFT JOIN to eliminate all em_id which has a segment_id that is 1 OR 7:

SELECT m.segments_id, m.em_id
FROM myTable as m
LEFT JOIN (SELECT * FROM myTable WHERE (segments_id=1 OR segments_id=7)) as n
ON m.em_id=n.em_id
WHERE n.segments_id IS NULL
GROUP BY m.em_id;

See SQLfiddle here.

If it is 5 AND 8 you want to keep and all the rest should be eliminated, then you can tweak one of your WHERE clauses like this:

SELECT m.segments_id, m.em_id
FROM myTable as m
LEFT JOIN (SELECT * FROM myTable WHERE NOT (segments_id=5 OR segments_id=8)) as n
ON m.em_id=n.em_id
WHERE n.segments_id IS NULL
GROUP BY m.em_id;

SQLfiddle for the second query is here.

And if it is only 5 AND 8 you want to keep and 1 AND 7 you want to eliminate, you can use this (however in this case, the answer given by @lad2025 might be a better choice):

SELECT m.segments_id, m.em_id
FROM myTable as m
LEFT JOIN (SELECT * FROM myTable WHERE (segments_id=1 OR segments_id=7)) as n
ON m.em_id=n.em_id
WHERE (n.segments_id IS NULL AND (m.segments_id=5 OR m.segments_id=8))
GROUP BY m.em_id;

Pls. check third SQLfiddle with an improved set here.

Arman Ozak
  • 2,304
  • 11
  • 11
  • Arman, the choice is between your answer and @lad2025. What is your opinion of both? Could two large `SUM` calls be slower than your join/check `NULL`? – Marc Jan 16 '16 at 19:15
  • In terms of performance, I don't think we can say anything for sure before testing them both. However, I'm pretty sure about the practical use of "not writing all remaining segment_id numbers in the query". Think about the case where there are a thousand ids and you want to eliminate just two. IMHO, writing all of them in a query would be nonsense then. – Arman Ozak Jan 16 '16 at 19:23
  • Hi Arman, actually, the example did not include other rows with different `segments_id` other than 1,5,7,8,. Once I have more segment_ids, your statement is too greedy and returns all the others as well. Can you rewrite to specify exactly which segment_ids should include - as well as exclude? I will attempt to edit my question to add another row with different `segments_id` – Marc Jan 16 '16 at 19:31
  • I have updated the answer. I think it corresponds to your last question. – Arman Ozak Jan 16 '16 at 19:38
  • Hi Arman, I am trying to accept your answer but my rep is below 15. Thanks for the guidance. – Marc Jan 16 '16 at 19:53
  • No problem. I am not here for rep. Take care. – Arman Ozak Jan 16 '16 at 19:57
  • Hi Arman, the third version is exactly the one needed and will also work as the excluded universe may also come from other tables that have rows like `widgets_id, em_id` and we want a set of some segment but none having widgets_id etc. – Marc Jan 16 '16 at 20:05
  • Arman, adding `GROUP_BY` was final piece....see with expanded example to use same logic but excluding from different table [fiddle here](http://sqlfiddle.com/#!9/2e4a4/8) `GROUP_BY` not necessary on same table, but is on subsequent I presume. – Marc Jan 16 '16 at 20:18
  • Oh, now I get it. Well, actually, `GROUP_BY` is neccessary regardless of whether the `JOIN` is made with the same table or a different one. I am updating the answers accordingly. It works before, because your dataset was suitable for that. Nice catch! Thanks. – Arman Ozak Jan 16 '16 at 20:32