7

Problem: currently we have numerous stored procedures (very long up to 10,000 lines) which were written by various developers for various requirements in last 10 years. It has become hard now to manage those complex/long stored procedures (with no proper documentation).

We plan to move those stored procedure into SSIS ETL package.

Has anybody done this is past? If yes, what approach should one take.

Appreciate if anybody could provide advise on approach to convert stored procedure into SSIS ETL Packages.

Thanks

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Lokesh Sharma
  • 71
  • 1
  • 2
  • What's your ultimate goal in converting the procedures to SSIS packages? Are the stored procedures for ETL'ing? – Will A May 28 '11 at 17:54
  • Yes stored procedure are mainly doing ETL operation and moving data from 1 sql server to another sql server. – Lokesh Sharma May 28 '11 at 17:55
  • Looked at few data integration options like: SSIS, Service Broker, StreamInsight.... Struggling how to approach converting 10,000 lines of stored procedure into multiple manageable packages – Lokesh Sharma May 28 '11 at 17:56
  • 1
    Create separate packages for separate data source from which you're importing data - depending on the number of tables brought in from each server, you may find that creating packages for different databases (i.e. not just per server) is useful. – Will A May 28 '11 at 18:09

2 Answers2

6

I've done this before, and what worked well for my team was to refactor incrementally, starting with the original source, and then iterate the refactoring effort.

The first step was to attempt to modularize the stored procedure logic into Execute SQL tasks that we chained together. Each task was tested and approved, then we'd integrate and ensure that the new process matched the results of the legacy procedures.

After this point, we could divide the individual Execute SQL tasks across the team, and load-balance the analysis of whether we could further refactor the SQL within the Execute SQL tasks to native SSIS tasks.

Each refactoring was individually unit tested and then integration tested to ensure that the overall process output still behaved like the legacy procedures.

arcain
  • 14,920
  • 6
  • 55
  • 75
  • Thanks arcain for suggestions. Queries - Lets say we modularize sp into execute SQL Task and then chained together. What benefit would ETL provide then if I'm not able to further refactor them into native SSIS task. this way what I have done is just taken sql statements from stored procedure and put them in execute SQL task. – Lokesh Sharma May 28 '11 at 19:46
  • @Lokesh That's just the first step. Upon breaking the monolithic proc into tasks, it almost always (for me, at least) became apparent that a task (or series of tasks) could be translated to native SSIS operations. If you can't further reduce a SQL Task to a SSIS task, then you're done (at that time.) My philosophy [aligns with Spolsky](http://www.joelonsoftware.com/articles/fog0000000069.html)'s, in that your legacy code works, and refactoring it rather than rewriting is a good practice. This approach lets you see results pretty quickly, and emphasizes testing the parts and the whole. – arcain May 28 '11 at 21:17
4

I would suggest the following steps:

  1. Analyze the stored procedures to identify the list of sources and destinations. For example: If the stored procedure dbo.TransferOrders moves data from table dbo.Order to dbo.OrderHistory. Then your source will be dbo.Order and destination will be dbo.OrderHistory.

  2. After you list out the sources and destinations, try to group the stored procedures according to your preference either by source/destination.

  3. Try to find out if there are any data transformations happening within the stored procedures. There are good data transformation tasks available within SSIS. You can evaluate and move some of those functionalities from stored procedures to SSIS. Since SSIS is a workflow kind of tool, I feel that it is easier to understand what is going inside the package than having to scroll through many lines of code to understand the functionality. But, that's just me. Preferences differ from person to person.

  4. Try to identify the dependencies within stored procedures and prepare a hierarchy. This will help in placing the tasks inside the package in appropriate order.

  5. If you have table named dbo.Table1 populating 5 different tables. I would recommend having them in a single package. Even if this data population being carried out by 5 different stored procedures, you don't need to go for 5 packages. Still, this again depends on your business scenario.

  6. SSIS project solution can have multiple packages within them and re-use data sources. You can use Execute SQL task available on the Control Flow task to run your existing queries but I would recommend that you also take a look at some of the nice transformation tasks available in SSIS. I have used them in my project and they function well for ETL operations.

These steps can be done by looking into one stored procedure at a time. You don't have to go through all of them at once.

Please have a look at some of the examples that I have given in other Stack Overflow questions. These should help you give an idea of what you can achieve with SSIS.

Copying data from one SQL table to another

Logging feature available in SSIS

Loading a flat file with 1 million rows into SQL tables using SSIS

Hope that helps.

Community
  • 1
  • 1