
We currently have an OLTP SQL Server 2005 database for our project. We are planning to build a separate, de-normalized reporting database so that we can take the load off our OLTP DB. I'm not quite sure which is the best approach to sync these databases. We are not looking for a real-time system, though. Is SSIS a good option? I'm completely new to SSIS, so I'm not sure about its feasibility. Kindly provide your inputs.

Fka
  • Other than a general "yes" for SSIS, if you want anything more specific you will need to be more specific about your requirements, constraints, etc. – Jagmag Sep 30 '10 at 14:34
  • Thanks for your comment. As I mentioned, I was looking for pointers on the best approach to refresh the reporting database. I am looking for a delta load (changes since the last load) to sync the databases. I haven't started implementing it, so I don't have any SSIS-specific questions at this point. – Fka Sep 30 '10 at 14:41

2 Answers


Everyone has their own opinion of SSIS, but I have used it for years for data marts, and my current environment is a full BI installation. I personally love its capabilities for moving data, and it still holds the world record for moving 1.13 terabytes in under 30 minutes.

As for our setup, we use log shipping from our transactional DB to populate a second box, then use SSIS to de-normalize and warehouse the data. The community around SSIS is also very large, and there is plenty of free training material and other helpful resources online.

user404463

We build our data warehouse using SSIS, and we run our reports from it. It's a big learning curve, and the errors it throws aren't particularly useful. It also helps to be good at SQL rather than treating SSIS as a 'row by row' transfer tool: you should be writing set-based queries in SQL command tasks rather than stringing together lots of SSIS components and dataflow tasks.
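For instance, a lookup-and-update that might otherwise be built from a Lookup, a Derived Column, and an OLE DB Command can often be a single set-based statement inside an Execute SQL Task. A minimal sketch, with invented table and column names:

    -- One set-based statement instead of a chain of dataflow components
    UPDATE s
    SET    s.CustomerName = c.Name
    FROM   staging.Jobs  AS s
    JOIN   dbo.Customers AS c ON c.CustomerId = s.CustomerId;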

Understand that every warehouse is different and you need to decide how best to build yours. This link may give you some good ideas.

How we implement ours (we have a Postgres back end and use the PGNP provider; making use of linked servers could make your life easier):

First of all, you need a time-stamp column in each table so you can tell when each row was last changed.
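On SQL Server 2005 that column could be as simple as the following; the table and column names are just for illustration, and the application (or an update trigger) has to keep it current:

    -- Hypothetical source table; LastModified must be maintained on every change
    ALTER TABLE dbo.Jobs
        ADD LastModified DATETIME NOT NULL
            CONSTRAINT DF_Jobs_LastModified DEFAULT (GETDATE());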

Then write a query that selects the data that has changed since you last ran the package (using an audit table helps here) and loads it into a staging table. We run this as a dataflow task because (using Postgres) we have no other choice, but you may be able to use a normal cross-database reference (dbname.schemaname.tablename, or something like that) or a linked server query. Either way the idea is the same: you end up with the data that has changed since your last run.
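A minimal sketch of that delta extract, assuming a one-row audit table that remembers the high-water mark of the previous load (all object names here are made up):

    -- Read the high-water mark left by the previous run
    DECLARE @LastLoad DATETIME;
    SELECT @LastLoad = LastLoadTime FROM dbo.ETLAudit;

    -- Pull only the rows changed since then into staging
    INSERT INTO staging.Jobs (JobId, CustomerId, JobDate, Amount, LastModified)
    SELECT JobId, CustomerId, JobDate, Amount, LastModified
    FROM   SourceServer.SourceDb.dbo.Jobs    -- linked-server / cross-db reference
    WHERE  LastModified > @LastLoad;

    -- Advance the high-water mark for the next run
    UPDATE dbo.ETLAudit
    SET    LastLoadTime = ISNULL((SELECT MAX(LastModified) FROM staging.Jobs),
                                 LastLoadTime);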

We then update (based on id) the data that already exists, then insert the new data (by left joining against the warehouse table to find what doesn't already exist there).
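SQL Server 2005 has no MERGE statement, so that upsert is two statements; roughly like this (names are again illustrative):

    -- Update rows that already exist in the warehouse
    UPDATE w
    SET    w.CustomerId = s.CustomerId,
           w.JobDate    = s.JobDate,
           w.Amount     = s.Amount
    FROM   dw.Jobs AS w
    JOIN   staging.Jobs AS s ON s.JobId = w.JobId;

    -- Insert rows the warehouse has never seen (the left join finds them)
    INSERT INTO dw.Jobs (JobId, CustomerId, JobDate, Amount)
    SELECT s.JobId, s.CustomerId, s.JobDate, s.Amount
    FROM   staging.Jobs AS s
    LEFT JOIN dw.Jobs   AS w ON w.JobId = s.JobId
    WHERE  w.JobId IS NULL;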

So now we have one denormalised table that shows, in this case, jobs per day. From this we calculate other tables based on aggregated values from it.
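Those downstream aggregates can be plain set-based rollups; a hypothetical jobs-per-day summary, for example:

    -- Rebuild a simple rollup from the denormalised table
    TRUNCATE TABLE dw.JobsPerDay;

    INSERT INTO dw.JobsPerDay (JobDate, JobCount, TotalAmount)
    SELECT JobDate, COUNT(*), SUM(Amount)
    FROM   dw.Jobs
    GROUP BY JobDate;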

Hope that helps. Here are some links that I found useful:

Choosing .Net or SSIS

SSIS Talk

Package Configurations

Improving the Performance of the Data Flow

Transformations

Custom Logging / Good Blog

Mr Shoubs