Handling large amounts of data in SQL

Question

I've just taken over a project at work, and my boss has asked me to make it run faster. Great.

So I've identified one of the major bottlenecks to be searching through one particular table from our SQL server, which can take up to a minute, sometimes longer, for a select query with some filters on it to run. Below is the SQL generated by C# Entity Framework (minus all the GO statements):

CREATE TABLE [dbo].[MachineryReading](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Location] [geometry] NULL,
    [Latitude] [float] NOT NULL,
    [Longitude] [float] NOT NULL,
    [Altitude] [float] NULL,
    [Odometer] [int] NULL,
    [Speed] [float] NULL,
    [BatteryLevel] [int] NULL,
    [PinFlags] [bigint] NOT NULL, -- Deprecated field, this is now stored in a separate table
    [DateRecorded] [datetime] NOT NULL,
    [DateReceived] [datetime] NOT NULL,
    [Satellites] [int] NOT NULL,
    [HDOP] [float] NOT NULL,
    [MachineryId] [int] NOT NULL,
    [TrackerId] [int] NOT NULL,
    [ReportType] [nvarchar](1) NULL,
    [FixStatus] [int] NOT NULL,
    [AlarmStatus] [int] NOT NULL,
    [OperationalSeconds] [int] NOT NULL,
 CONSTRAINT [PK_dbo.MachineryReading] PRIMARY KEY NONCLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)

ALTER TABLE [dbo].[MachineryReading] ADD  DEFAULT ((0)) FOR [FixStatus]
ALTER TABLE [dbo].[MachineryReading] ADD  DEFAULT ((0)) FOR [AlarmStatus]
ALTER TABLE [dbo].[MachineryReading] ADD  DEFAULT ((0)) FOR [OperationalSeconds]
ALTER TABLE [dbo].[MachineryReading] WITH CHECK ADD  CONSTRAINT [FK_dbo.MachineryReading_dbo.Machinery_MachineryId] FOREIGN KEY([MachineryId])
REFERENCES [dbo].[Machinery] ([Id])
  ON DELETE CASCADE
ALTER TABLE [dbo].[MachineryReading] CHECK CONSTRAINT [FK_dbo.MachineryReading_dbo.Machinery_MachineryId]
ALTER TABLE [dbo].[MachineryReading] WITH CHECK ADD  CONSTRAINT [FK_dbo.MachineryReading_dbo.Tracker_TrackerId] FOREIGN KEY([TrackerId])
  REFERENCES [dbo].[Tracker] ([Id])
  ON DELETE CASCADE
ALTER TABLE [dbo].[MachineryReading] CHECK CONSTRAINT [FK_dbo.MachineryReading_dbo.Tracker_TrackerId]

The table has indexes on MachineryId, TrackerId, and DateRecorded:

CREATE NONCLUSTERED INDEX [IX_MachineryId] ON [dbo].[MachineryReading]
(
    [MachineryId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

CREATE NONCLUSTERED INDEX [IX_MachineryId_DateRecorded] ON [dbo].[MachineryReading]
(
    [MachineryId] ASC,
    [DateRecorded] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

CREATE NONCLUSTERED INDEX [IX_TrackerId] ON [dbo].[MachineryReading]
(
    [TrackerId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

When we select from this table, we are almost always interested in one machinery or tracker, over a given date range:

SELECT *
FROM MachineryReading
WHERE MachineryId = 2127 AND
      DateRecorded > '2016-12-08 00:00:10.009' AND DateRecorded < '2016-12-11 18:32:41.734'

As you can see, it's quite a basic setup. The main problem is the sheer amount of data we put into it - about one row every ten seconds per tracker, and we have over a hundred trackers at the moment. We're currently sitting somewhere around 10-15 million rows. So this leaves me with two questions.

Am I thrashing the database if I insert 10 rows per second (without batching them)?
Given that this is historical data, so once it is inserted it will never change, is there anything I can do to speed up read access?

Well one way to increase the performance of Data Access is to create an index - the index should be based on the sql statement you are running all the time - in your case I think you should create Index on MachineryId and DateRecorded - in either Ascending or descending order depending on requirement- — Dawood Awan, Dec 11 '16 at 19:08
Apologies, I went into SQL Server Management Studio and exported the SQL for the table. I thought that included the indexes but it didn't. The question has been updated. — Andrew Williamson, Dec 11 '16 at 19:10
1 - please post the create index sql - 2. Have you checked the fragmentation on index? — Dawood Awan, Dec 11 '16 at 19:12
@DawoodAwan Oh God... I followed the steps on [Schneider Electric](http://www.schneider-electric.com/en/faqs/FA234246/) to check the index fragmentation, it dropped the connection and Azure shows maxed out CPU. Hope that finishes sometime soon... — Andrew Williamson, Dec 11 '16 at 19:22
Ok - I don't think you really need all 3 of those indexes - (IX_MachineryId and IX_TrackerId) - Note: adding too many indexes on the table increase the DB size, - you can check how many times an index has been used by running the sP_BlitzIndex script - To help you with DB/indexes follow this it helped me: https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit — Dawood Awan, Dec 11 '16 at 19:28
@DawoodAwan Average fragmentation for IX_TrackerId: 80.90%, IX_MachineryId: 80.88%, IX_DateRecorded: 80.84%... The query also returned about 25 rows for MachineryReadingPs though, and that's a deprecated field. I wonder if I can remove that. Can you post your original comment as an answer, as it has been the most helpful — Andrew Williamson, Dec 11 '16 at 19:37
Also I think you should delete the Non-Clustered Index on MachineryId and DateRecorded - and replace it with a clustered Index - Clustered index is the way the data is stored on the disk - in your case you need it sorted by MachineryId and DateRecordred - Non Clustered index will create a separate copy of the data - Test in a test environment first (check the Query Plan and statistics IO) — Dawood Awan, Dec 11 '16 at 19:37

score 3 · Accepted Answer · edited May 23 '17 at 10:30

You have too many non-clustered indexes on the table - which will increase the size of the DB.

If you have an index on MachineryId and DateRecorded - you don't really need a separate one on MachineryId.

With 3 of your Non-Clustered indexes - there are 3 more copies of the data

Clustered VS Non-Clustered

No Include on the Non-Clustered index

When SQL Server is executing your SQL it is first searching the Non-Clustered Index for the required data, then it is going back to the original table (bookmark lookup) Link and getting the rest of the columns as you are doing select *, but the non-clustered index doesn't have all the columns (That is what I think is happening - Can't really tell without the Query Plan)

Include columns in non-clustered index: https://stackoverflow.com/a/1308325/1910735

You should maintain you indexes - by creating a maintenance plan to check for fragmentation and rebuild or reorganize your indexes on weekly basis.
I really think you should have a Clustered index on your MachineryId and DateRecordred instead of a Non-Clustered index. A table can only have one Clustered Index ( this is the order data is stored on the Hard Disk) - as Most of your queries will be in DateRecordred and MachineryId order - it will be better to store them that way,

Also if you really are searching by TrackerId in any query, try adding it to the same Clustered Index

IMPORTANT NOTE: DELETE THE NON-CLUSTERED INDEX in TEST environment before going LIVE

Create a clustered index instead of your non-clustered index, run different queries - Check the performance by comparing the Query Plans and the STATISTICS IO)

Some resources for Index and SQL Query help:

Subscribe to the newsletter here and download the first responder kit: https://www.brentozar.com/?s=first+responder

It is now open source - but I don't know if it has the actual PDF getting started and help files (Subscribe in the above link anyway - for weekly articles/tutorials)

https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit

Your point about bookmark lookups was the most valuable - we had the query fields in a non-clustered index, but as soon as we selected other columns, it had to do a lookup, and so preferred a different, suboptimal index. Including the values we wanted in the index has fixed the problem. — Andrew Williamson, Dec 14 '16 at 00:42
@AndrewWilliamson I am also going to suggest you check by removing the non-clustered index and creating a clustered index - as non-clustered index is a copy of your data - as it is a tracking platform each machinery might have 5K records per day - if you have 1000 vehicles saving data to the same table the table (5K * 1K records per day), eventually the table size will increase to say 8 GB but with a Non-Clustered index the same table might be 14 or 16 GB. I think there is a way to check index size. — Dawood Awan, Dec 14 '16 at 12:06
Thanks for the suggestion. I didn't know at the time of asking, but the table is actually partitioned on the DateRecorded field, and that is used as our clustered index. Both I and the guy that did the partitioning don't really know how it works, so we don't know what to change on that one. Reducing size isn't currently a priority, so I have some time to learn. — Andrew Williamson, Dec 14 '16 at 17:50

score 2 · Answer 2 · answered Dec 11 '16 at 19:06

Tuning is per query, but in any case -
I see you have no partitions and no indexes, which means, no matter what you do. it always results in a full table scan.

For your specific query -

create index MachineryReading_ix_MachineryReading_DateRecorded 
    on (MachineryReading,DateRecorded)

score 2 · Answer 3 · answered Dec 11 '16 at 19:07

First, 10 inserts per second is very feasible under almost any reasonable circumstances.

Second, you need an index. For this query:

SELECT *
FROM MachineryReading
WHERE MachineryId = 2127 AND
      DateRecorded > '2016-12-08 00:00:10.009' AND DateRecorded < '2016-12-11 18:32:41.734';

You need an index on MachineryReading(MachineryId, DateRecorded). That will probably solve your performance problem.

If you have similar queries for tracker, then you want an index on MachineryReading(TrackerId, DateRecorded).

These will slightly impede the progress of in the inserts. But the overall improvement should be so great, that all will be a big win.

Handling large amounts of data in SQL

3 Answers3