2

I am trying to combine two inventory sources with SSIS. The first of which contains inventory information from our new system while the second contains legacy data. I am getting the data from the sources just fine.

Both data sets have the same columns, but I only want to get the results from the second data set if the ItemCode value for that record doesn't exist in the first data set.

Which transform would I need to use to achieve this?

Edit - here is what I have so far in my data flow.

enter image description here

I need to add a transform to the Extract Legacy Item Data source so that it will remove records whose item codes already exist in the Extract New Item Data source.

The two sources are on different servers so I cannot resolve by amending the query. I would also like to avoid running the same query that is run in the Extract New Item Data source.

ryansin
  • 1,735
  • 2
  • 26
  • 49
  • Have you already combined the data sources in the data flow? If not, is there a unique key (or combination of unique keys) that each data set shares? – userfl89 Mar 20 '19 at 17:45
  • I haven't combined them yet - yes the item code will be unique in each dataset, however it may be present in both. I want to only include records from the second dataset if the item code doesn't exist in the first dataset. – ryansin Mar 21 '19 at 08:06
  • @ryansin then you got the answer that you need. using Lookups is faster than using Merge Join transformation, if both sources are on the same SQL Server than you can use a SQL Command as Source – Yahfoufi Mar 21 '19 at 08:59
  • @ryansin anything new? – Yahfoufi Mar 25 '19 at 11:37
  • I haven't had a chance to check this in the last few days – ryansin Mar 27 '19 at 11:51

3 Answers3

0

First of all, concerning that you are using SQL Server Destination, i suggest reading the following answer from the SSIS guru @billinkc:


I will provide different methods to achieve that:

(1) Using Lookup transformation

  1. You should add a Data Flow Task, where you add the second inventory (legacy) as source
  2. Add a lookup transformation where you select the first inventory source as lookup table.
  3. Map the source and lookup table with ItemCode column
  4. In the lookup transformation select Redirect rows to no match output from the drop down list.
  5. Use the Lookup no match output to get the desired rows (not found in the first Inventory source)

You can refer to the link below, it contains a step by step tutorials.

Helpful link

Old Versions of SSIS

If you are using old versions of SSIS, then you will not find the Redirect rows to no match output drop down list. Instead you should go to the Lookup Error output, select Redirect Row option for No Match situation, and use the error output to get the desired rows.


(2) Using Linked Servers

On the Second inventory create a Linked server to be able to connect the the first Server. Now you are be able to use an SQL Command that only select the rows not found in the first source:

SELECT *
FROM Inverntory2
WHERE NOT EXISTS (SELECT 1 FROM <Linked Server>.<database>.<schema>.Inverntory1 Inv1 WHERE Inverntory2.ItemCode = Inv1.ItemCode)

(3) Staging table + MERGE, MERGE JOIN , UNION ALL transformation

On each source SQL command add a fixed value column that contains the source id (1,2), example:

SELECT *, 1 as SourceID FROM Inventory

You can combine both sources in one destination using one of the transformation listed above, then add a second Data flow task to import distinct data from staging table into destination based on the ItemCode column, example:

SELECT * FROM (
       SELECT *, ROW_NUMBER() OVER(PARTITION BY ItemCode ORDER BY SourceID) rn
       FROM StagingTable ) s
Where s.rn = 1

Then it will return all rows from SourceId =1 and the new rows from SourceId = 2

To learn more about Merge, Merge Join and UNION ALL transformation you can refer to one of the following links:


Note: check the answer provided by @userfl89 it contains very detailed information about using Merge Join transformation and it described another approach that can help. Now you have to test which approach fits your needs. Good Luck

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • How do I add my first source as the lookup table? Should I just re-enter the SQL query that I am using to obtain the first data set? – ryansin Mar 21 '19 at 14:20
  • Yes use the same query – Hadi Mar 21 '19 at 14:26
  • Is there any way to do that without having to run the same query again? I would rather only query the data set once if that is possible – ryansin Mar 21 '19 at 14:28
  • I think you misunderstood what i provided. You only must have one oledb source which is the second inventory. And a lookup transformation which is the first inventory. You don't have to write the query twice – Hadi Mar 21 '19 at 14:48
  • Yes I can re-use the source, but I can't see a way to lookup against the first data set without running the same query that was used to extract the first data set. Unless I am missing something I would need to re-query the same data to achieve this? – ryansin Mar 21 '19 at 14:52
  • Ah ok. Why not using a staging table then extract distinct data to the destination – Hadi Mar 21 '19 at 14:56
  • You can use merge join but it is not recommended from performance perspective – Hadi Mar 21 '19 at 14:57
  • 1
    @ryansin tonight i will provide a detailed answer with all possible solutions because i am writing from my phone right now. – Hadi Mar 21 '19 at 15:01
  • Thank you. I've edited my question to provide some more information as well. – ryansin Mar 21 '19 at 15:03
  • @ryansin anything new? – Hadi Apr 08 '19 at 09:45
0

If both Sources type is SQL databases and they are stored on the same Server, you can use an SQL Command as Source to achieve that:

SELECT Inverntory2.*
FROM Inverntory2 LEFT JOIN Inverntory1
     On Inverntory2.ItemCode = Inverntory1.ItemCode
WHERE Inverntory1.ItemCode IS NULL

OR

SELECT *
FROM Inverntory2
WHERE NOT EXISTS (SELECT 1 FROM Inverntory1 WHERE Inverntory2.ItemCode = Inverntory1.ItemCode)
Yahfoufi
  • 2,220
  • 1
  • 22
  • 41
  • Unfortunately they are on different servers so I need some way to eliminate results from the second source after it has been queried based on whether the item code already exists in the first source – ryansin Mar 21 '19 at 14:29
0

An example of this is below. Using a SQL Server Destination will work fine, however this only allows for loading to a local SQL Server instance, something that you may want to consider for the future. While a Lookup typically performs better, Merge Joins can be beneficial in certain circumstances such as when many additional columns are introduced into the Data Flow, as may be done with your data sets. It looks like @Hadi has covered how to do this with a Lookup, so you may want to test both approaches in a non-production environment that mimics prod, then assess the results to determine the better option.

  • Start off by creating a staging table which is an exact clone of one of the tables. Either table will work since they have the same definition. Make sure all columns in the staging allow null values.
  • Add an Execute SQL Task to clear the staging table before the Data Flow Task by either truncating or dropping and then creating the table.
  • Since ItemCode is unique sort on this column in each OLE DB Source. If you aren't already change the Data Access Mode to SQL command in both OLE DB Sources and add an ORDER BY clause for ItemCode. Do this by right-clicking the OLE DB Source and going to Show Advanced Editor > Input and Output Properties > OLE DB Source Output > Output Column > then select ItemCode and set the SortKeyPosition property to 1 (assuming you do ASC source in SQL statement).
  • Next add a Merge Join in the Data Flow Task. This requires both inputs to be sorted, which is why the inputs are now sorted. You can do this either way, but for this example use the OLE DB Source that will only be used when ItemCode does not exist as the merge join left input. Use a left outer join and the ItemCode column as the join key by connecting them via dragging a line from one to the other in the GUI. Add all the columns from the OLE DB Source that you want to use when the same ItemCode is in both data sets (from what I could tell this is Extract New Item Data, please adjust this if it isn't) by checking the check-box next to them in the Merge Join editor. Use an output alias prefix that will help you distingush these, for example X_ItemCode for the matching rows.
  • After the Merge Join add a Conditional Split. This is divide the records based on whether X_ItemCode was found. For the expression of the first output, use the ISNULL function to test if there was a match from the left outer join. For example ISNULL(X_ItemCode) != TRUE indicates that the ItemCode does exists in both data sets. You can call this output Matching Rows. The default output will contain the non-matches. To make it easier to distinguish you can rename the default output Non-Matching Rows.
  • Connect the Matching Rows output to the destination table. In this map only the columns of rows that were matched for the source you want to use when ItemCode exists in both data sets, i.e. the X_ prefixed rows such as X_ItemCode.
  • Add another SQL Server Destination in the Data Flow and connect the output Non-Matching Rows output to this, with all the columns mapped from rows that did not match, the one's without X_ in this example.
  • Back on the Control Flow in the package, add another Data Flow Task after this one. Use the staging table as the OLE DB Source and the destination table as the SQL Server Destination. Sorting isn't necessary here.
userfl89
  • 4,610
  • 1
  • 9
  • 17