I have a table with 4 columns:

field1 (PK), field2 (nvarchar(128)), field3 (nvarchar(max)), field4 (datetime)

field3 contains XML data. The table has about 9 million records.

A simple SELECT statement like the following takes more than 25 minutes:

SELECT field1,field2,field3 
FROM [table] 

Can someone tell me if there's any way to optimize it?

dweb
  • How much data is the query returning and what does it travel over? 10TB pushed over a low speed network connection will take some time. 10MB of results on the server should be rather faster. – HABO Jan 28 '18 at 23:44
  • Thanks HABO, for the quick reply. It returns about 9 million records. I'm running this locally and it's taking this long, without running in the production environment. Not sure if it can be optimized somehow? Is it because of the nvarchar(max) field that has XML data? – dweb Jan 28 '18 at 23:49
  • Why do you want to select everything from a 9-million-row table? That doesn't make sense to me. – Juan Carlos Oropeza Jan 29 '18 at 00:13
  • Anyway, most likely it is a hardware problem: slow or busy disks. I can clone a table of 40 million records very fast with `CREATE TABLE AS SELECT`. – Juan Carlos Oropeza Jan 29 '18 at 00:14
  • 9 million rows with a variable-length field holding anywhere from not much to 2 GB per row leaves a lot of room for guessing. Have you tried summing the [`DataLength()`](https://learn.microsoft.com/en-us/sql/t-sql/functions/datalength-transact-sql) of the `nvarchar(max)` column (field3), as sketched below? – HABO Jan 29 '18 at 02:05
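
Following HABO's suggestion, here is a minimal sketch for estimating how much data actually sits in the table (column and table names taken from the question; the division converts bytes to MB):

-- Estimate the total volume of the nvarchar(max) column
SELECT COUNT(*) AS row_count,
       SUM(DATALENGTH(field3)) / 1048576.0 AS total_mb
  FROM [table]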

2 Answers


It is probably just a volume thing. Try:

SELECT TOP (1) *
FROM [table]

Why do you need 9 million rows returned?

paparazzo
  • Yes paparazzo, it does return that result faster. However, how can I improve performance? I would like to move records to a different table based on the results I get from that query. If that's not possible, I would like to move data in batches, i.e. monthly or bi-weekly, but if I put a date range in the WHERE clause it still takes 10+ minutes... – dweb Jan 31 '18 at 19:43

When you do SELECT * FROM table you tell the machine to load a boatload of data from the database into your client software (SSMS?). Simply fetching the data from disk probably doesn't take all that much time. Converting it to something transportable, putting it on the network (or shared memory if you run locally) and then getting it back from 'transport mode' to 'data mode' all takes some time too, but unless you're running over a telephone line it should be fairly fast as well. However, loading 9 million rows into the grid of SSMS WILL take quite a lot of time, especially if there is a special column like XML involved.

To avoid that side effect, either do SELECT COUNT(*) FROM table instead, or have a look at How to Execute SQL Query without Displaying results; a quick sketch of both approaches follows.
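
A minimal sketch, assuming the schema from the question (the variable-assignment trick makes the server process every row without sending a result set to the client):

-- Option 1: count the rows without transferring any data
SELECT COUNT(*) FROM [table]

-- Option 2: read every row, but assign into a variable so nothing is displayed
DECLARE @dummy nvarchar(max)
SELECT @dummy = field3 FROM [table]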

From the comments it seems you want to be able to move data from one table to another based on a date field, presumably field4.

You can do that easily by using the syntax:

INSERT INTO tableB (field1, field2, ...)
SELECT field1, field2, ...
  FROM tableA

The data will be copied on the server itself, nothing needs to go over the wire, and all your client machine needs to show is (9 million rows affected).

Of course, this will not be instantaneous either; after all, the data needs to be read from disk and written back to disk elsewhere. But it should be fairly quick.

If you want to be able to filter on your datetime field, then it's probably a good idea to have an index on that field; a sketch follows below. Having the index allows the system to quickly find all 'matching' records it needs to copy; without an index it has to read each and every record to verify whether it matches your criteria.
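
A minimal sketch of that approach (the index name, destination table and date range are illustrative; field4 is the datetime column from the question):

-- Supporting index on the datetime column
CREATE NONCLUSTERED INDEX IX_table_field4 ON [table] (field4)

-- Copy one month's worth of rows at a time
INSERT INTO tableB (field1, field2, field3, field4)
SELECT field1, field2, field3, field4
  FROM [table]
 WHERE field4 >= '20180101' AND field4 < '20180201'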

PS: there is no such thing as a free lunch. Having the index there means that MSSQL needs to keep it up to date at all times: when you add, remove or update records, both the table and the index need to be updated. Next to that, the index takes extra space in the database too. It's up to you to decide whether quick responses when querying for a given datetime (or range) are more important than the (small) overhead when making changes to the table (INSERT/UPDATE/DELETE).
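
If the move is a one-off, the index can simply be dropped again afterwards (using the hypothetical name from the sketch above), so the write overhead only exists while you need it:

-- Remove the supporting index once the copy is done
DROP INDEX IX_table_field4 ON [table]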

deroby
  • Thank you deroby for your detailed explanation. Since this is going to be a one-time thing (moving data), I think adding an index to the datetime column would work best based on your explanation. Now, can you suggest whether I should add a clustered or nonclustered index on the datetime field? Also, since I won't be able to get it done in a single shot as this will be on the prod environment, – dweb Feb 01 '18 at 17:20
  • I will need to do it in a maintenance window every day. Do you think it would work better if, at the start of the maintenance window, I add the index to the datetime column and remove it at the end, so the add/remove/update operations aren't affected? Thank you once again for your help! – dweb Feb 01 '18 at 17:20
  • Adding the index shouldn't take too long, even on a 9-million-record table. Most likely your PK is already defined as clustered, and there can be only one clustered index. Adding the index, getting what you need and dropping the index afterwards sounds like a workable idea. You'll have to test it to be sure, but I don't see any major obstacle. – deroby Feb 01 '18 at 18:07
  • Thank you deroby, will let you know how it goes. Thank you for all your help. – dweb Feb 02 '18 at 19:13