
I need to load a flat file with an SSIS package executed from a scheduled job in SQL Server 2016, but it is taking far too long: around 2-3 hours just to read the data from the source, another 2-3 hours to sort and filter it, and a similar amount of time to load the data into the target. The file only has about a million rows and is roughly 3 GB in size. This is driving me crazy, because it is affecting the performance of my server.

SSIS package:
  • My package is just a Data Flow Task with a Flat File Source and an OLE DB Destination, that's all.
  • The Data Access Mode is set to FAST LOAD.
  • The destination table has 32 columns and only 1 index.

Input file: the input text file has more than 32 columns; the surrogate key data may not be unique, and the data in the referenced date columns may not be unique either, so I need to filter them.

I face two problems: the SSIS Flat File Source takes a huge amount of time to load the data, and the sort and filter step takes just as long. What should I do?


asked by shamim · edited by Hadi

2 Answers


If you want it to run fast, use this pattern:

  1. Load the data exactly as-is into a staging table
  2. Optionally add indexes to the staging table afterwards
  3. Use SQL to perform whatever processing you need (e.g. SELECT DISTINCT or GROUP BY into the final table)

You can do this kind of thing in SSIS, but you would need to tune it properly; it's just easier to do it inside a database, which is already well optimised for this kind of work. A minimal sketch of the idea follows.
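
For illustration, a minimal T-SQL sketch of steps 1-3, assuming a staging table named dbo.stg_FlatFile and a final table named dbo.DestinationTable (the table, index, and column names here are placeholders, not the OP's actual schema):

-- Step 2 (optional): index the staging table only after the SSIS fast load has finished
CREATE INDEX IX_stg_FlatFile_Key ON dbo.stg_FlatFile (SurrogateKey);

-- Step 3: de-duplicate while moving the rows into the final table
INSERT INTO dbo.DestinationTable (SurrogateKey, Col2, Col3)
SELECT DISTINCT SurrogateKey, Col2, Col3
FROM dbo.stg_FlatFile;

-- Empty the staging table so the next scheduled load starts clean
TRUNCATE TABLE dbo.stg_FlatFile;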

Nick.Mc
  • 18,304
  • 6
  • 61
  • 91
  • The problem is the data is too big; if I import the whole file it will take time, plus it will dramatically increase the database size. – shamim Dec 10 '17 at 08:09
  • Did you try it? Is it worth the cost of another disk drive to improve your load by 100%? – Nick.Mc Dec 10 '17 at 12:54
  • Seems like you already marked an answer. Which one of these options (of many) helped you with your problem? – Nick.Mc Dec 10 '17 at 12:55
  • @Nick.McDermaid I think the `Sort Component` is not recommended when the data can be sorted and filtered in SQL Server. You're also right that sorting in a staging table is better than deleting duplicates from the destination table *(as I suggested)*, so I think your answer is stronger than mine, even if my suggestions can also give better performance than the OP's method. +1 – Hadi Dec 10 '17 at 15:39
  • I hope this actually helped. If you have a 3 GB file, a 3 GB staging table and a 2 GB final table (I assume removing duplicates will reduce its size), then that's a maximum of 8 GB of disk space. I think my iPhone has enough space to do that. – Nick.Mc Dec 10 '17 at 22:35

Some Suggestions

1. Drop and recreate indexes

Add two Execute SQL Tasks, one before and one after the Data Flow Task: the first drops the index, and the second recreates it after the Data Flow Task has executed successfully.
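
For example, a sketch of the two statements, assuming the single index on the destination is a nonclustered index named IX_Destination_Key on dbo.DestinationTable over a SurrogateKey column (all of these names are placeholders):

-- Execute SQL Task #1: run before the Data Flow Task
DROP INDEX IF EXISTS IX_Destination_Key ON dbo.DestinationTable;

-- Execute SQL Task #2: run only after the Data Flow Task succeeds
CREATE NONCLUSTERED INDEX IX_Destination_Key
    ON dbo.DestinationTable (SurrogateKey);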

2. Adjust the buffer size

You can read more about buffer sizing in the following Technet article
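
For reference, the relevant Data Flow Task properties are DefaultBufferMaxRows and DefaultBufferSize (plus AutoAdjustBufferSize in SSIS 2016). The values below are only an illustrative starting point for rows roughly 3 KB wide (3 GB / ~1 million rows) and must be tested, not recommendations:

DefaultBufferMaxRows = 50000        (default is 10,000 rows)
DefaultBufferSize    = 67108864     (64 MB; the default is 10,485,760 bytes, i.e. 10 MB)
AutoAdjustBufferSize = True         (SSIS 2016 and later; sizes buffers from DefaultBufferMaxRows)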

3. Remove duplicates in SQL Server instead of Sort components

Try to remove the Sort components, and add an Execute SQL Task after the Data Flow Task which runs a query similar to the following:

;WITH x AS 
(
  -- col1, col2, col3 are the columns that define a duplicate;
  -- id is the table's key, used only to decide which duplicate row is kept (rn = 1)
  SELECT col1, col2, col3, rn = ROW_NUMBER() OVER 
      (PARTITION BY col1, col2, col3 ORDER BY id)
  FROM dbo.tbl
)
DELETE x WHERE rn > 1;

4. Use a script component instead of Sort

You would have to implement logic similar to the following answer:


Hadi
  • Thanks for your descriptive reply, it's really helpful. I need your advice on one issue: after executing tasks 1 and 2, I need unique/distinct records in the Data Flow Task, and based on the unique source records I need to perform an insert or update in the destination. Following your task order that is not possible to achieve. Will you please clarify how to perform an insert or update on the target table based on the distinct source records? – shamim Dec 10 '17 at 08:07
  • The tasks I provided are not related to each other; I was just offering some suggestions. The numbering is not an order. – Hadi Dec 10 '17 at 08:50
  • The third suggestion is removing duplicates using a DELETE statement and a common table expression. You can import the data into a staging table, remove the duplicates, and then import the data into the destination. Or add a column to the destination that identifies the rows of each import operation. – Hadi Dec 10 '17 at 09:06
  • The problem is the data is too big; if I import the whole file it will take time, plus it will dramatically increase the database size. What do you think the impact on performance will be? – shamim Dec 10 '17 at 11:33
  • @shamim All the suggestions in my answer and in Nick's answer need to be tested, because you are the only one who has all the information about the operation you are trying to do: *resources, file structure, file size, destination, goals, restrictions, ...*. So you must do some experiments to achieve your goal. Also, sometimes to get better performance you have to consume more disk space. – Hadi Dec 10 '17 at 11:44
  • @shamim You have to run some test experiments, and after testing many scenarios choose one based on your goals and the restrictions that you may have. Best wishes. – Hadi Dec 10 '17 at 11:45
  • @shamim I don't think you will get suggestions other than mine and Nick's. – Hadi Dec 10 '17 at 11:51
  • @shamim I am not writing this so that you will accept my answer, even if I would really appreciate it. But you can just upvote these 2 answers to say that they are helpful, and maybe wait for other suggestions. – Hadi Dec 10 '17 at 12:05
  • @shamim It looks like creating a staging table *(Nick's answer)* increased the performance. I am happy that your issue has been solved. Best wishes. – Hadi Dec 10 '17 at 15:17
  • @shamim It is good to say `thanks for your descriptive reply, it's really helpful.` without accepting or upvoting an answer :) I think that both answers are on the right track. – Yahfoufi Dec 12 '17 at 13:44