4

I am currently transferring a large number of records from one table to another, summarizing in the process. So, I have a SQL statement in this general format:

INSERT INTO TargetTable
   (Col1,
   Col2,
   ...
   ColX,
   Tot
   )
SELECT
   Col1,
   Col2,
   ...
   ColX,
   SUM(TOT)
FROM
   SourceTable
GROUP BY
   Col1,
   Col2,
   ...
   ColX

Is there any performance advantage of moving this SQL into an SSIS task when transferring records from one table to another using a SQL SELECT as a source? For example, is logging turned off?

Secondary question: are there any tactics I could use to maximize the transfer rate? For example, removing indexes from the Target table before inserting, locking the table, etc.?

Chad

2 Answers

3

In my experience (and bear in mind that it's been a year and change since I've done this), the only advantage you'd get from SSIS is its ability to make use of the bulk insert task. This adds an extra step: you have to export your source data to a flat file before you begin the import process.
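Outside of SSIS, that flat-file route looks roughly like the two steps below. This is only a sketch: the database, server, file path, and BATCHSIZE value are placeholders, and the query is trimmed to two grouping columns for brevity. The first step runs bcp from a command prompt; the second runs inside SQL Server.

bcp "SELECT Col1, Col2, SUM(Tot) AS Tot FROM MyDb.dbo.SourceTable GROUP BY Col1, Col2" queryout C:\temp\agg.dat -n -T -S MyServer

BULK INSERT TargetTable
FROM 'C:\temp\agg.dat'
WITH (DATAFILETYPE = 'native', TABLOCK, BATCHSIZE = 100000)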

Alternatively, if you stick with a SQL statement, the section in this article titled "Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging" provides the following suggestions:

You can use INSERT INTO…SELECT…FROM to efficiently transfer a large number of rows from one table, such as a staging table, to another table with minimal logging. Minimal logging can improve the performance of the statement and reduce the possibility of the operation filling the available transaction log space during the transaction.

Minimal logging for this statement has the following requirements:

  • The recovery model of the database is set to simple or bulk-logged.
  • The target table is an empty or nonempty heap.
  • The target table is not used in replication.
  • The TABLOCK hint is specified for the target table.
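Applied to the statement in the question, the TABLOCK requirement is just a table hint on the target (the recovery model is a database-level setting). A sketch, with the column lists trimmed for brevity:

INSERT INTO TargetTable WITH (TABLOCK)   -- TABLOCK is required for minimal logging
   (Col1, Col2, ColX, Tot)
SELECT Col1, Col2, ColX, SUM(TOT)
FROM SourceTable
GROUP BY Col1, Col2, ColX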

I personally dislike SSIS packages for a particular reason: I have never had a DBA who was dedicated to maintaining them. The data import projects I worked on required a lot of fiddling because the source data wasn't clean (which I assume won't be a problem for you). As a result, I had many packages that worked just fine in a testing environment against a limited data sample but crashed immediately when deployed into production, which made the process a pain in the neck to deal with.

This is just my opinion, but I would say that unless you or someone else you work with focuses on SSIS packages as part of database maintenance, it's easier to maintain and document a process that lives inside a stored procedure.

jwheron
  • Why is Bulk Insert so fast? I thought that it was because the insert was not logged. Therefore, I wonder whether a table-to-table insert can also be not logged. It doesn't seem to me that having to process the data as a flat file should give you speed benefits that are not found when pulling from a table source. – Chad Oct 05 '11 at 17:15
  • That's a good question... When I wrote my SSIS packages at my old job, our DBA told me that the advantage of bulk insert was that it gave you the chance to set a batch size, which sped up performance compared to an insert statement. However, the MSDN documentation I've read about batches doesn't seem to indicate any batch-related performance limitation in insert statements. Otherwise, the bulk insert task could be faster if your source data is on a different machine, forcing SQL to pull it across the network. – jwheron Oct 05 '11 at 17:26
2

Set logging to simple (the database recovery model). Set the log size high enough to handle the insert. Are others on the system? A TABLOCK hint will help the insert: TargetTable WITH (TABLOCK). If you have a clustered index on TargetTable, order the data that way in the SELECT. If you can accept dirty reads, use SourceTable WITH (NOLOCK). If you are inserting more than 100,000 records, you might want to break up the insert using a WHERE clause.
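Put together against the sample statement, those suggestions might look something like the sketch below. The WHERE range and the ORDER BY are assumptions: the slice has to be on a grouping column so each group lands in exactly one batch, and the ORDER BY only helps if it matches the clustered index on TargetTable.

INSERT INTO TargetTable WITH (TABLOCK)
   (Col1, Col2, ColX, Tot)
SELECT Col1, Col2, ColX, SUM(TOT)
FROM SourceTable WITH (NOLOCK)     -- only if dirty reads are acceptable
WHERE Col1 BETWEEN 'A' AND 'M'     -- one slice on a grouping column; repeat for the remaining ranges
GROUP BY Col1, Col2, ColX
ORDER BY Col1, Col2, ColX          -- assumes the clustered index on TargetTable leads with these columns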

paparazzo
  • If you are aggregating as shown in the sample SQL, I don't think that I can limit the inserts to 100K records at a time. The only way that I can see limiting the records is by targeting a subset of the columns that I am grouping by and this seems unpredictable and possibly forces me to hard code expected group by combinations. – Chad Oct 05 '11 at 13:55
  • How do you set logging as "simple?" – Chad Oct 05 '11 at 15:11
  • In SQL Server Management Studio it is Properties > Options > Recovery Model to set the logging. 100K at a time is not critical; you just need a bigger transaction log for more. Ordering by the clustered index is a big factor. – paparazzo Oct 05 '11 at 15:19
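For completeness, the same recovery-model change can be scripted in T-SQL. MyDb is a placeholder for your database name; if the database normally runs in full recovery, switch back after the load and take a fresh backup so log backups can resume.

ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- ...run the INSERT INTO ... SELECT load here...

ALTER DATABASE MyDb SET RECOVERY FULL;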