4

I have the following SQL Query.

SELECT em.employeeid, tsi.timestamp
FROM timesheet_temp_import tsi
JOIN employee emp ON emp.employeeid = tsi.credentialnumber
WHERE
tsi.masterentity = 'MASTER' AND
tsi.timestamp NOT IN
(
    SELECT ea.timestamp 
    FROM employee_attendance ea 
    WHERE 
    ea.employeeid = em.employeeid
    AND ea.timestamp =  tsi.timestamp
    AND ea.ismanual = 0
)
GROUP BY em.employeeid, tsi.timestamp

This query compares a import table (with employee time and attendance timestamps).

Sometimes timesheet_temp_import has more than 95,000 rows and my query has to show only the timestamps that are new for the employee. If the timestamp already exists for an employee then I have to exclude it.

The query is working but is taking more than 4 minutes so I want to know if I can improve the NOT IN statement with other that can help me to reduce this time.

abatishchev
  • 98,240
  • 88
  • 296
  • 433
VAAA
  • 14,531
  • 28
  • 130
  • 253
  • 1
    For questions about query performance, please specify the table structure and indexes too, and an indication of the amount of data in those tables. – GolezTrol May 28 '15 at 15:47
  • Can you restrict the main `SELECT` with a `WHERE` clause to get only the latest entries in `timesheet_temp_import`? You could use some kind of bookmark... – ForguesR May 28 '15 at 15:59

5 Answers5

6

Using NOT EXISTS might help you.

SELECT 
    em.employeeid,
    tsi.timestamp
    FROM timesheet_temp_import tsi
    join employee emp ON emp.employeeid = tsi.credentialnumber
    WHERE
    tsi.masterentity = 'MASTER' AND

    NOT EXISTS 
    (
        SELECT NULL  
        FROM employee_attendance ea 
        WHERE 
        ea.employeeid = em.employeeid
        AND ea.timestamp =  tsi.timestamp
        AND ea.ismanual = 0
    )
    GROUP BY 
    em.employeeid,
    tsi.timestamp
Luc M
  • 16,630
  • 26
  • 74
  • 89
  • @LucM Could you explain why this should be used instead of a `LEFT JOIN` and checking if the column is null? – John Odom May 28 '15 at 15:54
  • 2
    @JohnOdom [not in vs not exists vs left join](http://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/) – Lamak May 28 '15 at 15:57
3

You have this query:

SELECT em.employeeid, tsi.timestamp
FROM timesheet_temp_import tsi JOIN
     employee emp
     ON emp.employeeid = tsi.credentialnumber
WHERE tsi.masterentity = 'MASTER' AND
      tsi.timestamp NOT IN (SELECT ea.timestamp 
                            FROM employee_attendance ea 
                            WHERE ea.employeeid = em.employeeid AND
                                  ea.timestamp =  tsi.timestamp AND
                                  ea.ismanual = 0
                           )
GROUP BY em.employeeid, tsi.timestamp;

Before rewriting the query (as opposed to reformatting it;), I would check the indexes and logic. Is the GROUP BY necessary? That is, are there duplicates produced by the outer query? My guess is no, but I don't know your data.

Second, you want indexes. I think the following indexes: timesheet_temp_import(masterentity, credentialnumber, timestamp), employee(employeeid), employee_attendance(employeeid, timestamp, ismanual).

Third, I would ask if you even have time sheets for non-employees. I think you can get rid of the outer join. So, this might be the query that you want:

SELECT tsi.credentialnumber as employeeid, tsi.timestamp
FROM timesheet_temp_import tsi
WHERE tsi.masterentity = 'MASTER' AND
      tsi.timestamp NOT IN (SELECT ea.timestamp 
                            FROM employee_attendance ea 
                            WHERE ea.employeeid = tsi.credentialnumber AND
                                  ea.timestamp =  tsi.timestamp AND
                                  ea.ismanual = 0
                           );

You might also get a marginal improvement by replacing the NOT IN with NOT EXISTS.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

Another way is to use except

select whatever
from wherever
where somefield in 
(select all potential values of that field
except
select the values you want to exlude)

This is logically equivalent to not in, but faster.

Dan Bracuk
  • 20,699
  • 4
  • 26
  • 43
2

try this and I thin you mean emp

SELECT distinct tsi.credentialnumber, tsi.timestamp
  FROM timesheet_temp_import tsi
  JOIN employee emp 
    ON emp.employeeid = tsi.credentialnumber
   and tsi.masterentity = 'MASTER' 
  left join employee_attendance ea 
    on ea.employeeid = emp.employeeid
   AND ea.timestamp = tsi.timestamp
   AND ea.ismanual = 0
 where ea.employeeid is null

depending on indexs this may be faster

SELECT distinct tsi.credentialnumber, tsi.timestamp
  FROM timesheet_temp_import tsi
  JOIN employee emp 
    ON emp.employeeid = tsi.credentialnumber
   and tsi.masterentity = 'MASTER' 
  left join employee_attendance ea 
    on ea.employeeid = tsi.credentialnumber
   AND ea.timestamp = tsi.timestamp
   AND ea.ismanual = 0
 where ea.employeeid is null
paparazzo
  • 44,497
  • 23
  • 105
  • 176
1

Use LEFT JOIN and WHERE clause for filtering instead of NOT IN:

SELECT 
    em.employeeid,
    tsi.timestamp
    FROM timesheet_temp_import tsi
    join employee emp ON emp.employeeid = tsi.credentialnumber
    left join 
    (
        SELECT ea.timestamp 
        FROM employee_attendance ea 
        WHERE 
        ea.employeeid = em.employeeid
        AND ea.timestamp =  tsi.timestamp
        AND ea.ismanual = 0
    ) t on t.timestamp = tsi.timestamp
    WHERE
    tsi.masterentity = 'MASTER' AND
    t.timestamp is null
    GROUP BY 
    em.employeeid,
    tsi.timestamp
Max
  • 1,784
  • 2
  • 19
  • 26