4

I have a Data Flow Task that moves a bunch of data from multiple sources to multiple destinations. About 50 in all. The data moved is from one database to another with varying rows and columns in each flow.

While I believe I understand the basic idea behind the Data Flow Task's DefaultBufferMaxRows and DefaultBufferSize as it relates to Rows per Batch and Maximum insert commit size of the Destination, it's not clear to me what happens when there are multiple unrelated source and destination flows.

What I'm wondering is which of the following makes the most sense :

  • Divide out all the source and destination flows into separate Data Flow Tasks
  • Divide them into groups that have roughly the same size and number of rows
  • Leave as is and just make sure to set the properties with enough Buffer Rows and Buffer Size while setting the Rows per batch and Maximum insert commit size to the individual destination

I believe I read some place that it's better to have each source and destination in it's own data flow task, but I am unable to find the link at this time.

Most examples I've been able to locate online seem to always be for one source to one or more destinations, or just one to one.

  • Hello and welcome Brian. Although your question itself is well put and seems interesting to answer for someone who knows the subject matter (not me), the question title seems to be a bit lackluster and might prevent you from getting the answer you deserve. – Robin De Schepper Sep 25 '19 at 18:04
  • Would you by chance have any suggestions on how one might word a question? Possibly "Question about SSIS Data Flow Task properties" ? – Brian Brian Brian Sep 26 '19 at 15:54
  • Maybe just "In SSIS, what are 'Buffer Size', 'Max Rows', 'Rows Per Batch' and 'Insert Commit Size' for?" – Robin De Schepper Sep 26 '19 at 16:05

1 Answers1

6

Let me go from the basis. Data Flow Task is a task, organizing a pipeline of data from Data Source to Data Destination. It is a unique task in SSIS because it runs data manipulation in SSIS itself, all other tasks call external systems to do something with data out of SSIS.
On the relationships between DefaultBufferMaxRows, DefaultBufferSize as it relates to Rows per Batch and Maximum insert commit size of the Destination. There is no direct relation. DefaultBufferMaxRows and DefaultBufferSize are properties of Data Flow pipeline; the pipeline processes rows in batches and these properties controls the processing batch size. These properties control RAM consumption and performance of Data Flow Task.
On other hand, Rows per Batch and Maximum insert commit size are the properties of Data Destination, namely OLE DB Destination in Fast Load mode only; it controls performance of Data Destination itself. You may have a Data Flow with Flat File Destination where you do not have Rows per Batch, but it will definitely have DefaultBufferMaxRows and DefaultBufferSize properties.

Typical usage from my experience:

  • DefaultBufferMaxRows and DefaultBufferSize control batch size of Data Flow pipeline. Tuning it is a tradeoff - bigger batches means less overhead on batch handling i.e. less execution time, but more RAM consumption. More RAM means that you might experience outage of RAM and DFT data buffers will be swapped to Disk.
    In SSIS 2016+ there is a "magical setting" AutoAdjustBufferSize which tells the engine to autogrow the buffer.
    Values for these properties are usually defined at performance tests in QA environment. On development - use the defaults.
  • Rows per Batch and Maximum insert commit size -- control log growth and possibility to rollback all changes. Do not change these unless you really need to do so. Defaults are generally Ok; I changed it rarely on special reason. More on its functions.

On package design:

  • 1 pair of Source-Destination per DFT (Data Flow Task). This is optimal - gives you most of control in terms of tuning and execution order etc. Also you can utilize parallel execution of tasks by SSIS engine. BTW, it simplifies debugging and support.
  • Division in groups. You can group DFT in Sequence groups and define common properties via Expressions-Variables. But - use it if you really need to do so because it complicates your design.
  • All Source-Destination in one DFT. I would recommend against it, complex and error prone.

As a bottom line, keep it simple -- 1 pair of Source-Destination per DFT, and play with your parameters only if have to do so.

Ferdipux
  • 5,116
  • 1
  • 19
  • 33
  • Great post, thanks for taking the time to reply. It looks like I'm going to have a bit of work ahead of me then if I"m going to move all those Source -> Destination into separate DFT's, but it'll probably be worth it. – Brian Brian Brian Sep 27 '19 at 13:29
  • 1
    @BrianBrianBrian you can also upvote the answer apart from accepting it. – Robin De Schepper Sep 28 '19 at 07:36
  • 1
    Upvoted - been using 2016 for a while and just found out about AutoAdjustBufferSize from this answer. Thanks! – Dave Brown Oct 02 '19 at 06:23