
I have a table with 1.5 million rows; it is a parent table with 12 child tables.

Even for a simple count query, it is taking hours to complete.

select count(1) 
from table
where col_filter >= 'number'

The table contains 40 columns, and col_filter is defined as a varchar datatype. col_filter is not indexed.

Questions:

  1. What should I check to find the potential issues with my table setup?
  2. I am using Microsoft SQL Server Management Studio 18, is there any tool that I can use to understand and get some recommendations to optimize the performance?
  3. If indexing is the solution, is it possible to calculate how much extra space the index creation will occupy?

Any suggestions would be really appreciated.

shawnt00
Python coder
    "col_filter is not indexed" should hint you on the direction of where you need to go – siggemannen Aug 01 '23 at 06:09
  • @siggemannen Is it possible to calculate how much extra space the index creation will occupy? – Python coder Aug 01 '23 at 07:13
  • You can guesstimate, take select sum(datalength(yourvarcharcolumn)) from yourtable; – siggemannen Aug 01 '23 at 07:25
  • What SKU are you running this on in Azure? `S0` or even Basic? "Hours" to scan 1.5 million rows doesn't sound usual – Martin Smith Aug 01 '23 at 09:13
  • @MartinSmith Yes, it is on azure, `Standard: S6` with 400 DTUs. Around 80% of database is utilized and my table size is around 122GB, calculated using [this](https://stackoverflow.com/a/7892349/11844406). – Python coder Aug 01 '23 at 09:20
  • So that means each row is taking 81.33 KB if the table has 1.5 million rows. Does the table have a clustered index or is it a heap? Is it expected that rows should be using around that amount or is potentially some space being wasted? – Martin Smith Aug 01 '23 at 09:26
  • 2
    To estimate space requirements for the index you can use the method here https://learn.microsoft.com/en-us/sql/relational-databases/databases/estimate-the-size-of-a-nonclustered-index?view=sql-server-ver16 - but you should probably also check whether you have any internal fragmentation that can be removed by a rebuild to compact things - and if you can store anything compressed to reduce your avg overhead per row - NB if that 81.33 KB per row is just from a single heap/clustered index then most of your data will be being stored off row so page and row compression won't help with that – Martin Smith Aug 01 '23 at 09:46
  • @MartinSmith Table contains clustered index for primary key and non-clustered index on other field which I am not using for my filter. – Python coder Aug 01 '23 at 09:53
  • Ah OK - reason for the question is that heaps do have a known issue where empty pages don't get deallocated so sometimes this can bloat tables out to unexpectedly large sizes - but not applicable here – Martin Smith Aug 01 '23 at 09:54
  • @MartinSmith I will try to estimate size and create the index, is there any other advice would you like to give? I am new to database admin related tasks. – Python coder Aug 01 '23 at 09:58
  • 2
    The main thing I'd be looking to do is understand why this table is taking 122GB and consider ways to reduce it. As we can't see your table difficult to offer suggestions - but make sure you aren't using over wide fixed width columns, if `nvarchar` columns are a large consumer would `varchar` + utf8 collation save space? If data is just stored there and not filtered on would [`COMPRESS`](https://learn.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-ver16) function save space? – Martin Smith Aug 01 '23 at 10:11
  • @MartinSmith Okay, I will explore them. Thanks for your suggestions. – Python coder Aug 01 '23 at 10:51
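The guesstimate from the comments can be sketched as a single query. The table name and the per-row overhead figure here are assumptions (a rough ballpark for the row header plus clustering key, not an exact value); the Microsoft estimation article gives the precise calculation.

```sql
-- Rough size guesstimate for a nonclustered index on col_filter.
-- 'dbo.[table]' and the 12-byte per-row overhead are placeholders/assumptions.
SELECT SUM(DATALENGTH(col_filter))                                AS key_bytes,
       COUNT(*) * 12                                              AS assumed_overhead_bytes,
       (SUM(DATALENGTH(col_filter)) + COUNT(*) * 12) / 1048576.0  AS approx_total_mb
FROM dbo.[table];
```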

2 Answers


Right-click on the query design area in SSMS and choose "Display Estimated Execution Plan." If SQL Server detects a missing index, the heading of the execution plan will show a "Missing Index" suggestion, along with an estimate of how much of an improvement it would make. Right-click the execution plan and choose "Missing Index Details." This generates a query template for the suggested index; you just need to edit the name and remove the comment tags before executing it.
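As a sketch of the kind of statement that template produces (the index, table, and column names here are placeholders based on the question, not output from SSMS):

```sql
-- Hypothetical missing-index statement; names are placeholders.
-- ONLINE = ON avoids blocking writers during the build, but it is only
-- available on Enterprise edition and Azure SQL Database; omit it otherwise.
CREATE NONCLUSTERED INDEX IX_table_col_filter
ON dbo.[table] (col_filter)
WITH (ONLINE = ON);
```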

Galaxiom

You can first analyze your data. Since col_filter is varchar and cannot be changed,

check whether it contains any blank or NULL values; those rows may be part of what is taking so long.

select count(1) 
from table
where (col_filter = '' or col_filter is null)

select count(1) 
from table
where col_filter >= 'number'
and isnumeric(col_filter) = 1 and col_filter <> ''  -- <> '' also excludes NULL

select col_filter 
from table
where isnumeric(col_filter) = 0

Based on these results, we can decide about the index.

Please show the table structure. Are there any existing indexes or constraints defined?
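To see which indexes already exist on the table, a catalog query along these lines works in SQL Server (the table name is a placeholder):

```sql
-- List existing indexes and their key columns for a given table.
SELECT i.name       AS index_name,
       i.type_desc,
       c.name       AS column_name,
       ic.key_ordinal
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.[table]')  -- placeholder table name
ORDER BY i.index_id, ic.key_ordinal;
```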

KumarHarsh