2

I have the following code.

INSERT INTO [dbo].[Employees_Dim] ([EmpNo], [EmpName])
SELECT DISTINCT st.[EmpNo], [EmpName]
FROM [dbo].[EEMasterData] st

I want to take two columns out of a master table to create an Employee Dimension. (this will be duplicated for other dimensions coming from this master table).

I want to be able to run the script in an SSIS package that will look at the master data, reference the Employee_Dim and only add new employees based on their Employee Number.

I know I'm missing something just don't know what.

Update: The question I have is, What should I add to the above Script that would Insert only the newly added employees from the EEMasterData table into the EmployeeDim using the EmpNo as the Key.

Brian D. Brubaker
  • 349
  • 2
  • 7
  • 22

1 Answers1

4

... and only add new employees ...

If that is the issue you are facing, select only those entries which do not already exist in your dimension table:

INSERT INTO [dbo].[Employees_Dim] ([EmpNo], [EmpName])
SELECT DISTINCT st.[EmpNo], [EmpName]
FROM [dbo].[EEMasterData] st
WHERE st.[EmpNo] NOT IN (SELECT EmpNo FROM Employees_Dim)

(assuming that EmpNo is unique for each employee)

Heinzi
  • 167,459
  • 57
  • 363
  • 519
  • Correct, the EmpNo is unique to each employee. I will try this, I think this is what I was looking for. Thank you. – Brian D. Brubaker Jul 16 '13 at 20:35
  • 1
    it is faster to do `WHERE NOT EXISTS (SELECT 1 FROM Employees_Dim edim WHERE edim.EmpNo = st.EmpNo)` if you're using SQL Server 2008 and greater (maybe 2005, also). – tommy_o Jul 16 '13 at 20:38
  • 1
    @tommy_o: Are you sure (= have you measured this)? If neither Employees_Dim.EmpNo nor st.EmpNo are nullable, the two versions are semantically equivalent. So why should the query optimizer choose different execution plans for them? – Heinzi Jul 17 '13 at 05:36
  • 1
    @Heinzi: Yes, in our practice we've seen measurable improvements, especially with large data. And while it is possible for it to become equivalent under certain conditions, choosing `NOT EXISTS` will nearly always be more performant if the query is or becomes more complex; values change; etc. For a dim, it probably doesn't matter either way :) http://stackoverflow.com/questions/173041/not-in-vs-not-exists – tommy_o Jul 17 '13 at 17:34
  • 1
    Ah, also I'm assuming st.EmpNo may be NULL. The OP doesn't state that is not nullable, and I don't know what the dbo.EEMasterData table contains. – tommy_o Jul 17 '13 at 17:45
  • I apologize for the lack of info. The EmpNo is Not Null and the EEMasterData houses info like the Employee Name, Job Title, Cost Centers, etc. – Brian D. Brubaker Jul 17 '13 at 19:07