
I am working on a project to process three files... two CSVs and one XML. These files are being moved from a file share to a SQL Server database table using BizTalk. The XML file is being transformed into the same flat file format as the two flat files using a C# component in SSIS. Then these flat files are processed by SSIS packages. There is a lot of business logic in the SSIS transformations. The SSIS packages also do several lookups using linked servers. All lookups and transforms are done on a row-by-row basis (which is slow). Also, any errors that occur are put in a separate database table depending on the business object that causes the error (i.e. BusObj1_error, BusObj2_error, BusObj3_error).

Basically, I was hoping someone could suggest a better architecture that would improve performance, allow scalability and flexibility, and allow many developers to work as a team on the same pieces of functionality. For example:

- Put validation rules in a database rather than hardcoding them into SSIS.
- Instead of using different error tables, use a single error table with an errorTypeId FK to an ErrorType table.
- Migrate all transformations from SSIS to C# so that multiple developers can work on different business logic classes at the same time.

Thanks

Brendan

3 Answers


I don't know anything about BizTalk, so I can't speak to what it can or cannot do. With respect to SSIS, my suggestions would be as follows:

  1. If you have lots of business logic embedded within Script Tasks/Script Components, I would suggest creating a C# class library project and registering the DLL in the GAC so it can be referenced within the SSIS package. Make the DLL generic so that very little code is required to call it from the package; this way you can also reuse the logic (a rough sketch follows this list).

  2. Regarding the lookups using linked servers, I hope I am understanding this correctly. Are you saying that you have linked servers pointing to other instances and your lookups run against these linked servers? If the databases are on SQL Server instances, why not create a connection string pointing directly to the database and do the lookup there? That would speed things up a lot.

  3. Are you using lots of OLE DB Command transformations in your packages? If it is a direct insert, try using an OLE DB Destination instead to speed things up.

  4. Have a look at this SO answer How can I load a large flat file into a database table using SSIS? where I have explained how you can split the row transformations across multiple destinations to speed up the process.

  5. Since you are writing errors to a custom-designed database, I think it is a good idea to use a header-detail table structure to store the error messages instead of creating multiple error tables. I prefer to keep it generic so the logic can accommodate future scope changes.
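For point 1, here is a minimal sketch of what such a reusable class library could look like. The namespace, class, and interface names (`Company.Etl.BusinessRules`, `RuleEngine`, `IBusinessRule`) are placeholders, not anything from your setup, and the assembly would need to be strong-named, installed in the GAC, and built against .NET 2.0 to be usable from SSIS 2005:

```csharp
// Sketch only: names are hypothetical. One rule per class keeps the logic small
// and lets developers work on different business objects without conflicts.
using System.Collections.Generic;

namespace Company.Etl.BusinessRules
{
    // Result of applying one rule to one row.
    public class ValidationResult
    {
        public bool IsValid;
        public string Message;

        public ValidationResult(bool isValid, string message)
        {
            IsValid = isValid;
            Message = message;
        }
    }

    public interface IBusinessRule<T>
    {
        ValidationResult Apply(T row);
    }

    public class RuleEngine<T>
    {
        private readonly List<IBusinessRule<T>> _rules = new List<IBusinessRule<T>>();

        public void Add(IBusinessRule<T> rule)
        {
            _rules.Add(rule);
        }

        // Called once per row from the SSIS Script Component; returns only failures.
        public List<ValidationResult> Validate(T row)
        {
            List<ValidationResult> failures = new List<ValidationResult>();
            foreach (IBusinessRule<T> rule in _rules)
            {
                ValidationResult result = rule.Apply(row);
                if (!result.IsValid)
                {
                    failures.Add(result);
                }
            }
            return failures;
        }
    }
}
```

The Script Component then only needs to build the rule set once in PreExecute and call `Validate` per row, so the package-side code stays tiny.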

Hope that gives you some idea.

Edit 1:

I had a feeling you were going to say that the lookup database is DB2. Still, the linked server might slow down the package. The Lookup transformation does support querying directly against a DB2 database, along with SQL Server and Oracle (MSDN), so you could avoid doing the lookup through a linked server.

Once the C# class library project is compiled to a DLL, you can reference it from VB.NET as well. As long as your DLL targets .NET Framework 2.0, I think you should be fine referencing it within SSIS 2005.

Also, there are other ways to improve things depending on what logic is embedded in the .NET code. Is that logic something that can be written in a stored procedure? In that case, load the data into a staging table, run a stored procedure that applies the logic to the staged data, and then transfer the data to your destination table.
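As a rough illustration of that staged approach (the staging table `dbo.StagingOrders` and procedure `dbo.usp_ProcessStagedOrders` are hypothetical names), the .NET side can stay thin: bulk-load the prepared rows into the staging table, then let a set-based stored procedure do the heavy lifting:

```csharp
// Sketch only: table and procedure names are placeholders.
using System.Data;
using System.Data.SqlClient;

public static class StagedLoad
{
    public static void Run(DataTable rows, string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1. Bulk-load the prepared rows into a staging table.
            using (SqlBulkCopy bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.StagingOrders";
                bulk.WriteToServer(rows);
            }

            // 2. Apply the business logic set-based, not row by row.
            using (SqlCommand cmd = new SqlCommand("dbo.usp_ProcessStagedOrders", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.CommandTimeout = 0; // let the set-based step run to completion
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```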

  • Thanks for the reply, Siva. RE point 1: yes, that is something that has been suggested in the team where I work; however, it was suggested that this would be very slow (we are dealing with approx. 20,000 rows per flat file). Have you heard of this happening? I use C# and SSIS 2005. The SSIS 2005 script task supports only VB.NET. Will I have an issue calling a C# assembly from SSIS 2005? RE point 3, the linked server queries are being run against a DB2 database. Anything you can suggest as an improvement on that? Points 4 and 5 are helpful too! – Brendan Aug 03 '11 at 16:22

If the nature of this data is real-time then you should be doing as much as possible in BizTalk.

For example, XSL transforms can be applied on the fly in pipelines, meaning ultra-fast, stream-based processing.
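Outside of BizTalk, the same on-the-fly XSL idea looks roughly like this in .NET (the map and file names below are placeholders; BizTalk itself applies the map inside the receive pipeline rather than through hand-written code):

```csharp
// Sketch only: map.xslt, input.xml and output.csv are placeholder names.
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XmlToFlatFile
{
    public static void Transform(string xsltPath, string inputXmlPath, string outputPath)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        using (XmlReader reader = XmlReader.Create(inputXmlPath))
        using (StreamWriter writer = new StreamWriter(outputPath))
        {
            // The transformed result is written straight to the output stream,
            // which is the kind of pipeline processing the answer refers to.
            xslt.Transform(reader, null, writer);
        }
    }
}
```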

You can ramp up the throughput easily by scaling horizontally, allowing you to process many tens or even hundreds of items concurrently.

tom redfern
  • Thanks Hugh. The data is not real-time. It is received in bulk (approx. 20,000 rows) on a monthly basis. – Brendan Aug 04 '11 at 10:45

A couple of suggestions...

If you are using BizTalk simply to move files from a file share to a SQL Server database table, then I suggest you take BizTalk out of the picture altogether and use SSIS only. If you don't require any of the functionality that BizTalk provides, then don't use BizTalk. Keep it simple.

Additionally, you will get the best possible performance using BCP. Prepare your data using SSIS, then load it into the target database using BCP.
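A minimal sketch of kicking off that BCP load from code once SSIS has written the prepared file (the server, database, table, and file names are placeholders; the bcp utility could just as easily be called from an Execute Process task or a batch file):

```csharp
// Sketch only: shells out to the bcp command-line utility; all names are placeholders.
using System.Diagnostics;

public static class BcpLoader
{
    public static int Load(string server, string database, string table, string dataFile)
    {
        // -T = trusted connection, -c = character data, -t, = comma field terminator
        string args = string.Format(
            "{0}.dbo.{1} in \"{2}\" -S {3} -T -c -t,",
            database, table, dataFile, server);

        ProcessStartInfo psi = new ProcessStartInfo("bcp.exe", args);
        psi.UseShellExecute = false;

        using (Process p = Process.Start(psi))
        {
            p.WaitForExit();
            return p.ExitCode; // non-zero means the bulk load failed
        }
    }
}
```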

Fabio