0

I hope the title is somewhat clear.

Situation: Table with measurements, around 350.000.000 records are in here. Table with sensor id's, around 300 records are in here.

Now when I want specific measurements of a sensor, I take the sensor id and query the measurement table with that sensor id (and specific data or whatever). These queries run very fast (index on sensor id and some other columns).

However, it has been a long running project and I have no idea how much data is "old" (measurements that I cannot access, because the sensor id is not in my sensor table). Lets say:

measurements table
sensor id
1
2
3
4
5

sensor table
1
4
5

I want to select all id's from the measurement table, that are not in the sensor table. I normally do this with the following query:

SELECT sensor_id
FROM measurement_table
WHERE sensor_id NOT IN (
  SELECT sensor_id
  FROM sensor_table
  WHERE sensor_id IS NOT NULL)
)
GROUP BY sensor_id

On a table that is 'small', say a couple thousands or even a million records, this is doable. However, when I run this query on my measurements table, it takes too long (cancelled it after nearly an hour).

Is there a way for me to find out what sensor id's are in the measurement table but not in the sensor table?

Edit:

SELECT COUNT(*)
FROM measurements_table
WHERE sensor_id IN (
    SELECT sensor_id
    FROM sensor_table
)

Does work (no group, count), gives:

334214244

Total in table:

337225071

So I want to find those 3010827 records by ID...

Tjab
  • 368
  • 1
  • 4
  • 18

2 Answers2

4
SELECT m.sensor_id
FROM measurement_table m
LEFT JOIN sensor_table s ON m.sensor_id = s.sensor_id
WHERE s.sensor_id IS NULL

Perhaps you need to do SELECT DISTINCT.

jarlh
  • 42,561
  • 8
  • 45
  • 63
2

Via left join:

 SELECT t1.sensor_id
  FROM measurement_table t1
 LEFT JOIN sensor_table t2 
   ON t1.sensor_id = t2.sensor_id
 WHERE t2.sensor_id IS NULL

Via exists:

Select t1.sensor_id
 From measurement_table t1
where not exists(
select 1 from sensor_table t2 where t2.sensor_id=t1.sensor_id
)

Never use NOT IN

the left join and exists could do this things but we can't say which is faster. it will depends on the index and other stuff in your table you could try these two and pick a faster one

  • 3
    NOT IN is preferred for an explicit set of values, the problems occur with subqueries. An explanation of the risks of NOT IN might be better than a hard ban. – reaanb Feb 02 '16 at 09:56
  • 2
    @reaanb yeap you are right. check this http://stackoverflow.com/questions/173041/not-in-vs-not-exists. in this question, it explain quite well about not in. – Raffaello.D.Huke Feb 02 '16 at 09:59
  • Thanks a lot all! The NOT EXISTS version took 6:01 to execute and gave me the result I wanted :P – Tjab Feb 02 '16 at 10:11
  • @Tjab, just curious, did you try the LEFT JOIN version too? Which time? – jarlh Feb 02 '16 at 10:15
  • @jarlh, no I did not try that one. I am a bit afraid to use it too now, since the heavy query (NOT IN) took so long and kinda messed up the database server (at least, we think). Might try it tomorrow morning before the work day starts ;) – Tjab Feb 03 '16 at 10:51
  • @jarlh dude,you have to call Tjab otherwise the SO will only notes me that i have a new comment. – Raffaello.D.Huke Feb 04 '16 at 01:31