1

I am in the process of extending an SSIS package, which takes in data from a text file, 600,000 lines of data or so, modifies some of the values in each line based on a set of business rules and persists the data to a database, database B. I am adding in some reference data validation, which needs to be performed on each row before writing the data to database B. The reference data is stored in another database, database A.

The reference data in database A is stored in seven different tables; each tables only has 4 or 5 columns of type varchar. Six of the tables contain < 1 million records and the seventh has 10+ million rows. I don't want to keep hammering the database for each line in the file and I just want to get some feedback on my proposed approach and ideas on how best to manage the largest table.

The reference data checks will need to be performed in the script component, which acts as a source in the data flow. It has an ado.net connection. On pre-execute, I am going to retrieve the reference data from database 'A', the tables which have < 1 million rows, using the ado.net connection, loop through them all using a sqldatareader, convert them to .Net objects; one for each table and add them to a dictionary.

As I process each line in the file, I can use the dictionaries to perform the reference data validation. Is this a good approach? Anybody got any ideas on how best to manage the largest table?

user1698316
  • 59
  • 2
  • 11
  • https://stackoverflow.com/questions/43746258/query-a-database-based-on-result-of-query-from-another-database/43988356#43988356 – Hadi Jun 11 '17 at 09:32
  • Thanks Hadi. The second method fits best with the existing solution. I can use my the package ado.net connection, with RetainSameConnection = true and my queries are very simple but would going back to the reference data database, for every row, not result in a serious performance hit? Also, would it be more performant to use ado.net for these queries or something like EF or dapper? – user1698316 Jun 11 '17 at 12:03
  • i think that using Lookups can be more efficient than scripts, But if you prefer this way you can use Linq for best performance – Hadi Jun 11 '17 at 12:07
  • No, i am talking on the first method – Hadi Jun 11 '17 at 14:36
  • Ok, I see. That's no use to me. I am working with a package where the data is read from the source inside a script component and persisted to it's final destination in the same script component, so that won't work for me. – user1698316 Jun 11 '17 at 14:43

0 Answers0