
Dilemma:

I am about to populate an MS SQL Server (2012 Developer Edition) database with data based on production data. The volume is around 4 TB (around 250 million items).

Purpose:

To test the performance of full-text search and of regular indexes. The target is around 300 million items, around 500K each.

Question:

What should I do beforehand to speed up the process, and what consequences should I worry about?

For example:

  1. Switching off statistics?
  2. Should I do bulk inserts of 1k items per transaction instead of one single transaction?
  3. Switching to the simple recovery model?
  4. Log truncation?
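Taken together, points 1-4 might look roughly like this in T-SQL; this is only a sketch, and the database name `TestDb`, the log file name, and the log size are invented for illustration:

```sql
-- Sketch of pre-load settings (TestDb and its file names are hypothetical).
ALTER DATABASE TestDb SET RECOVERY SIMPLE;              -- 3: limits log retention during the load
ALTER DATABASE TestDb SET AUTO_CREATE_STATISTICS OFF;   -- 1: skip statistics work while loading
ALTER DATABASE TestDb SET AUTO_UPDATE_STATISTICS OFF;

-- 4: under SIMPLE recovery the log is truncated automatically at checkpoints;
--    pre-growing it avoids repeated autogrow pauses (the size here is a guess):
ALTER DATABASE TestDb MODIFY FILE (NAME = TestDb_log, SIZE = 32GB);

-- After the load, re-enable statistics and update them:
-- ALTER DATABASE TestDb SET AUTO_CREATE_STATISTICS ON;
-- ALTER DATABASE TestDb SET AUTO_UPDATE_STATISTICS ON;
```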

Important:

I will use a sample of 2k production items to generate every random item that will be inserted into the database. The near-unique samples will be generated in C#. It will be one table:

table 
(
    long[id], 
    nvarchar(50)[index], 
    nvarchar(50)[index], 
    int[index], 
    float, 
    nvarchar(50)[index], 
    text[full text search index]
)
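As a sketch, the schema above could be declared like this in T-SQL. The table, column, and index names are invented, and since the `text` type is deprecated, `nvarchar(max)` stands in for the full-text column:

```sql
-- Hypothetical names; the column layout follows the schema in the question.
CREATE TABLE dbo.Items
(
    Id    bigint        NOT NULL CONSTRAINT PK_Items PRIMARY KEY,  -- long[id]
    Name1 nvarchar(50)  NULL,
    Name2 nvarchar(50)  NULL,
    Num   int           NULL,
    Val   float         NULL,
    Name3 nvarchar(50)  NULL,
    Body  nvarchar(max) NULL  -- full-text indexed; nvarchar(max) replaces deprecated text
);

CREATE INDEX IX_Items_Name1 ON dbo.Items (Name1);
CREATE INDEX IX_Items_Name2 ON dbo.Items (Name2);
CREATE INDEX IX_Items_Num   ON dbo.Items (Num);
CREATE INDEX IX_Items_Name3 ON dbo.Items (Name3);

-- Full-text indexing requires a unique key index; the primary key serves here.
CREATE FULLTEXT CATALOG ItemsCatalog;
CREATE FULLTEXT INDEX ON dbo.Items (Body) KEY INDEX PK_Items ON ItemsCatalog;
```

For a load of this size, it is usually cheaper to create the nonclustered and full-text indexes after the data is in, rather than maintaining them row by row during the insert.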
Falcon
  • Possible duplicate of http://stackoverflow.com/questions/410653/test-user-data-fake-data – Kermit Nov 02 '12 at 17:47
  • @njk I have no problem building the data. My concerns are about the performance of populating the database with such a large amount of data. – Falcon Nov 03 '12 at 00:35

1 Answer


Almost invariably in a situation like this - and I've had several of them - I've used SSIS. SSIS is the fastest way I know to import large amounts of data into a SQL Server database. You have complete control over batch (transaction) size, and it performs bulk inserts. In addition, if you have transformation requirements, SSIS handles those with ease.
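If the generated rows are staged to flat files rather than fed through an SSIS pipeline, the same batched, minimally logged load can be sketched with `BULK INSERT`; the table name, file path, and batch size below are assumptions:

```sql
-- Hypothetical staging file produced by the C# generator (tab-delimited).
BULK INSERT dbo.Items
FROM 'D:\load\items_001.dat'
WITH (
    BATCHSIZE = 100000,       -- commit every 100k rows instead of one huge transaction
    TABLOCK,                  -- allows minimal logging under SIMPLE/BULK_LOGGED recovery
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n'
);
```

Alternatively, from C# itself, `SqlBulkCopy` (with its `BatchSize` property) uses the same bulk-load path without intermediate files.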

Randy Minder
  • All of the data will be created on the fly (due to memory constraints). The text data comes from randomly generated files, and all other fields come from randomly generated information. All data is generated to mimic a certain scenario. What is the best way to port C#-generated information into SSIS? Is there a provider of some sort, or do I have to create intermediate files? – Falcon Nov 03 '12 at 00:46
  • Do you care to elaborate on your answer, since I asked about integrating generated data and not importing it? There is a big difference between populating data and migrating it. – Falcon Nov 09 '12 at 13:00
  • If I read your now edited question correctly, you are inserting 2,000 generated rows in your table as a sample. This is such a small amount, you don't need to take any special precautions. – Randy Minder Nov 09 '12 at 14:22
  • I did not edit my question (marc_s did); I have rolled it back just now. By my estimation I am going to populate around 250 million items (4 TB) based on 2k real ones from production. All items will be generated - they do not exist. Again - it is not integration, it is population. – Falcon Nov 13 '12 at 10:21