
I have a logging table that I need to run near-real-time reports on. The primary key is a GUID that is inserted non-sequentially. When other indexes are added to aid reporting speed, I see timeouts on inserts. I plan on creating a duplicate table with the reporting indexes, leaving only the clustered index (with a fill factor of 75%) on the logging table.

I feel like using replication would be overkill for only one table. So my question is: how should I accomplish this? Should I schedule a SQL Agent job?

Any help would be great! Thanks!

Mike
  • Can you clarify if the data in the logging table references other entities in your database or is it just unpredictable data like user traffic? – Gats Apr 26 '12 at 21:49
  • It is user traffic. The table logs the page, the user, the date, and how long the user was on the page. – Mike Apr 26 '12 at 21:51

1 Answer


You have a lot of options available to you. To solve your immediate problem, I'd make sure that the logging table has a sequential ID on it and only copy new data across to the well-indexed reporting table. You can then run a non-locking proc that gets the max value of the corresponding column in the reporting table and pulls anything bigger from the logging table:

-- LogID is assumed to be the sequential column (bigint IDENTITY or sequential GUID)
DECLARE @MaxValue bigint

-- Highest ID already copied into the reporting table (0 when it's empty)
SELECT @MaxValue = ISNULL(MAX(LogID), 0) FROM ReportingTable

INSERT INTO
   ReportingTable
   (...)
SELECT
   ...
FROM
   LogTable WITH (NOLOCK)
WHERE
   LogID > @MaxValue

NB: See this for the NOLOCK debate. Only use it if your log table is BIG and BUSY; your scenario is an example of where it makes sense.
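On the scheduling question: a SQL Agent job calling a small stored procedure every minute or so is a reasonable fit. A minimal sketch of such a procedure (the proc name and column list are assumptions, not your actual schema):

-- Sketch only: proc and column names are assumed, not from the question
CREATE PROCEDURE dbo.usp_SyncReportingTable
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @MaxValue bigint;

    -- Highest ID already copied (0 when the reporting table is empty)
    SELECT @MaxValue = ISNULL(MAX(LogID), 0) FROM ReportingTable;

    INSERT INTO ReportingTable (LogID, LogGuid, PageID, UserID, LogDate, TimeSpentSeconds)
    SELECT LogID, LogGuid, PageID, UserID, LogDate, TimeSpentSeconds
    FROM LogTable WITH (NOLOCK)
    WHERE LogID > @MaxValue;
END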

I'm not sure what generates your GUID, but if it is SQL Server, then defaulting the column to NEWSEQUENTIALID() means it's no longer random but sequential. If you are getting the GUID from somewhere else, then consider adding a bigint IDENTITY(1,1) column to make sure your clustered index is not badly fragmented. A fragmented clustered index is bad because your other indexes use it for lookups, so no matter how well tuned they are, they'll still be slowed down by it.
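For illustration, both options as a sketch (table, column, and constraint names are assumptions):

-- Option 1: SQL Server generates the GUID, so make it sequential
-- (the LogGuid column must be a uniqueidentifier)
ALTER TABLE LogTable
    ADD CONSTRAINT DF_LogTable_LogGuid DEFAULT NEWSEQUENTIALID() FOR LogGuid;

-- Option 2: the GUID comes from elsewhere, so add a sequential surrogate key
-- (existing rows are numbered automatically)
ALTER TABLE LogTable
    ADD LogID bigint IDENTITY(1,1) NOT NULL;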

Then you can structure your reporting table any way you like, with a clustered index on the GUID, and use the max value in your well-indexed table to insert only the new data you don't already have. You can also run a re-index on that table to ensure it's as fast as possible.
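That re-index can be as simple as the following, run after each load (table name assumed):

-- Rebuild all indexes on the reporting table to remove fragmentation
ALTER INDEX ALL ON ReportingTable REBUILD;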

More information:

If you expect your system to grow a lot, then you should look at some other reporting patterns to denormalise or simplify the data as you dump it in the reporting tables.

Take a look at the star schema pattern, or you can just denormalise daily or weekly stats into a separate table like the following:

Date, TimespentTotal, TimespentAvg, UserFact2Count....

Then you can run a non-locking stored procedure to get a day's stats at a time. I'd also recommend that a day's stats be loaded based on what you don't already have in the reporting database: in case the Agent job that runs it fails, it will take care of the backlog on its next run. Obviously these patterns are a lot more work, and the wrong choice can be bad, so only make this choice if you need to and know what's expected of your reporting.
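As a rough sketch of that daily load (the stats table and its columns are assumptions based on the example row above), loading only the days you don't already have means a failed run is caught up automatically on the next one:

INSERT INTO DailyStats (StatDate, TimeSpentTotal, TimeSpentAvg, PageViewCount)
SELECT
    CAST(LogDate AS date),
    SUM(TimeSpentSeconds),
    AVG(TimeSpentSeconds),
    COUNT(*)
FROM
    LogTable WITH (NOLOCK)
WHERE
    CAST(LogDate AS date) < CAST(GETDATE() AS date)    -- only load complete days
    AND CAST(LogDate AS date) NOT IN (SELECT StatDate FROM DailyStats)
GROUP BY
    CAST(LogDate AS date);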

Gats
  • Thank you for such a detailed answer. The GUIDs are generated client side, and a stored procedure checks if the record already exists before inserting. I think using an identity column like you suggested would make it easy to keep the tables synced and cut fragmentation, and I think I would still benefit from a nonclustered index on the GUID when checking if the record exists, rather than using it as the clustered index and PK. – Mike Apr 27 '12 at 00:25