
I have reports that perform some time-consuming data calculations for each user in my database, and the result is 10 to 20 new calculated records per user. To improve report responsiveness, a nightly job was created to run the calculations and dump the results to a snapshot table in the database. It only runs for active users.

So with 50k users, 30k of which are active, the job "updates" 300k to 600k records in the large snapshot table. The method it currently uses is to delete all previous records for a given user and then insert the new set. There is no PK on the table; only a business key is used to group the sets of data.

So my question is: when removing and adding up to 600k records every night, are there techniques to optimize the table to handle this? For instance, since the data can be recreated on demand, is there a way to disable logging for the table while these changes are made?

UPDATE:

One issue is that I cannot do this in one batch, because of the way the script works: it examines one user at a time, deletes that user's previous 10-20 records, and inserts a new set of 10-20 records, over and over. I am worried that the transaction log will run out of space or that other performance issues could occur. I would like to configure the table not to worry about data preservation or other things that could slow it down. I cannot drop the indexes and so on, because people are accessing the table concurrently while it is being updated.
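
For reference, each per-user iteration is essentially the following (the table and column names here are simplified placeholders for the real schema):

    -- Runs once per active user, roughly 30k times a night.
    DELETE FROM dbo.UserSnapshot
    WHERE  UserKey = @UserKey;          -- business key; there is no PK

    INSERT INTO dbo.UserSnapshot (UserKey, MetricName, MetricValue, CalculatedAt)
    SELECT @UserKey, MetricName, MetricValue, GETDATE()
    FROM   #CalculatedResults;          -- the 10-20 freshly calculated rows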

CodeGrue

3 Answers


It's also worth noting that indexing could potentially speed up this bulk update rather than slow it down, because UPDATE and DELETE statements still need to locate the affected rows in the first place, and without appropriate indexes they will resort to table scans.

I would, at the very least, consider a non-clustered index on the column(s) that identify the user, and (assuming you are using 2008) consider the MERGE statement, which can definitely avoid the shortcomings of the mass DELETE/INSERT method currently employed.

According to The Data Loading Performance Guide (MSDN), MERGE is minimally logged for inserts with the use of a trace flag.
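
Something along these lines, as a rough sketch only: it assumes the rows are identified by a UserKey/MetricKey pair and that the freshly calculated values land in a temp table first (all of these names are placeholders for your actual schema):

    -- One-time: give DELETE/UPDATE/MERGE a way to find a user's rows.
    CREATE NONCLUSTERED INDEX IX_Snapshot_UserKey
        ON dbo.SnapshotTable (UserKey);

    -- Per user: MERGE the new results instead of DELETE + INSERT.
    ;WITH target AS
    (
        SELECT UserKey, MetricKey, MetricValue
        FROM   dbo.SnapshotTable
        WHERE  UserKey = @UserKey            -- limit the MERGE to this user's rows
    )
    MERGE target
    USING #NewResults AS source              -- the 10-20 freshly calculated rows
        ON target.MetricKey = source.MetricKey
    WHEN MATCHED THEN
        UPDATE SET MetricValue = source.MetricValue
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (UserKey, MetricKey, MetricValue)
        VALUES (@UserKey, source.MetricKey, source.MetricValue)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;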

I won't say too much more until I know which version of SQL Server you are using.

Quick Joe Smith

This is called a bulk insert: you have to drop all indexes on the destination table and send the INSERT commands in large batches (hundreds of INSERT statements separated by semicolons).
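
For example (table and column names are placeholders), the batch is just many statements sent together in a single round trip:

    INSERT INTO dbo.SnapshotTable (UserKey, MetricName, MetricValue) VALUES (1001, 'Score', 42);
    INSERT INTO dbo.SnapshotTable (UserKey, MetricName, MetricValue) VALUES (1001, 'Rank',  7);
    INSERT INTO dbo.SnapshotTable (UserKey, MetricName, MetricValue) VALUES (1002, 'Score', 17);
    -- ...hundreds more INSERT statements in the same batch...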

Another way is to use the BULK INSERT statement (http://msdn.microsoft.com/en-us/library/ms188365.aspx), but it involves dumping the data to a file first.
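
A sketch of that route, where the file path, delimiters and table name are all placeholders:

    -- Assumes the nightly job first writes the calculated rows to a flat file.
    BULK INSERT dbo.SnapshotTable
    FROM 'D:\nightly\snapshot_results.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        TABLOCK          -- a table lock helps the load qualify for minimal logging
    );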

See also: Bulk Insert Sql Server millions of record

Anri
  • Thanks. I added an UPDATE to my question to discuss why I can't take this path. – CodeGrue Oct 04 '12 at 14:52
  • Well, if the database is in use while you are updating it, I suppose you can only have a mirror database. You can perform all the time-consuming operations on that database and then reroute your users to it. And vice versa the next day. – Anri Oct 04 '12 at 17:21

It really depends upon many things:

  • speed of your machine
  • size of the records being processed
  • network speed

etc.

Generally it is quicker to add records to a "heap", i.e. an unindexed table, so dropping all of your indexes and re-creating them after the load may improve your performance.
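
If dropping and re-creating indexes is feasible in your maintenance window, the pattern is simply this (the index name is a placeholder):

    DROP INDEX IX_Snapshot_UserKey ON dbo.SnapshotTable;

    -- ...run the nightly load against the bare heap here...

    CREATE NONCLUSTERED INDEX IX_Snapshot_UserKey
        ON dbo.SnapshotTable (UserKey);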

Partitioning the table may yield performance benefits if you partition by active and inactive users (although the data set may be a little small for this).
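
If you did try partitioning, the rough shape would be something like this (an IsActive flag column and all object names are assumptions, not part of your current schema):

    CREATE PARTITION FUNCTION pf_ActiveFlag (tinyint)
        AS RANGE RIGHT FOR VALUES (1);          -- partition 1: flag 0, partition 2: flag 1

    CREATE PARTITION SCHEME ps_ActiveFlag
        AS PARTITION pf_ActiveFlag ALL TO ([PRIMARY]);

    -- Rebuild the table on the partition scheme via a clustered index
    -- that includes the partitioning column.
    CREATE CLUSTERED INDEX CIX_Snapshot_Active
        ON dbo.SnapshotTable (IsActive, UserKey)
        ON ps_ActiveFlag (IsActive);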

Make sure you test how much time each tweak adds to or removes from your load, and work from there.

David Adlington