2

Take a look at this execution plan: http://sdrv.ms/1agLg7K
It’s not estimated, it’s actual. From an actual execution that took roughly 30 minutes.

Select the second statement (takes 47.8% of the total execution time – roughly 15 minutes).
Look at the top operation in that statement – View Clustered Index Seek over _Security_Tuple4. The operation costs 51.2% of the statement – roughly 7 minutes.

The view contains about 0.5M rows (for reference, log2(0.5M) ~= 19 – a mere 19 steps given the index tree node size is two, which in reality is probably higher).
The result of that operator is zero rows (doesn’t match the estimate, but never mind that for now).
Actual executions – zero.

So the question is: how the bleep could that take seven minutes?! (and of course, how do I fix it?)


EDIT: Some clarification on what I'm asking here.
I am not interested in general performance-related advice, such as "look at indexes", "look at sizes", "parameter sniffing", "different execution plans for different data", etc.
I know all that already, I can do all that kind of analysis myself.

What I really need is to know what could cause that one particular clustered index seek to be so slow, and then what could I do to speed it up.

Not the whole query.
Not any part of the query.
Just that one particular index seek.
END EDIT


Also note how the second and third most expensive operations are seeks over _Security_Tuple3 and _Security_Tuple2 respectively, and they only take 7.5% and 3.7% of time. Meanwhile, _Security_Tuple3 contains roughly 2.8M rows, which is six times that of _Security_Tuple4.

Also, some background:

  1. This is the only database from this project that misbehaves. There are a couple dozen other databases of the same schema, none of them exhibit this problem.
  2. The first time this problem was discovered, it turned out that the indexes were 99% fragmented. Rebuilding the indexes did speed it up, but not significantly: the whole query took 45 minutes before rebuild and 30 minutes after.
  3. While playing with the database, I have noticed that simple queries like “select count(*) from _Security_Tuple4” take several minutes. WTF?!
  4. However, they only took several minutes on the first run, and after that they were instant.
  5. The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.
Fyodor Soikin
  • 78,590
  • 9
  • 125
  • 172

3 Answers3

3

First I'd like to point out a little misconception here: although the delete statement is said to take nearly 48% of the entire execution, this does not have to mean it takes 48% of the time needed; in fact, the 51% assigned inside that part of the query plan most definitely should NOT be interpreted as taking 'half of the time' of the entire operation!

Anyway, going by your remark that it takes a couple of minutes to do a COUNT(*) of the table 'the first time' I'm inclined to say that you have an IO issue related to said table/view. Personally I don't like materialized views very much so I have no real experience with them and how they behave internally but normally I would suggest that fragmentation is causing its toll on the underlying storage system. The reason it works fast the second time is because it's much faster to access the pages from the cache than it was when fetching them from disk, especially when they are all over the place. (Are there any (max) fields in the view ?)

Anyway, to find out what is taking so long I'd suggest you rather take this code out of the trigger it's currently in, 'fake' an inserted and deleted table and then try running the queries again adding times-stamps and/or using some program like SQL Sentry Plan Explorer to see how long each part REALLY takes (it has a duration column when you run a script from within the program). It might well be that you're looking at the wrong part; experience shows that cost and actual execution times are not always as related as we'd like to think.

deroby
  • 5,902
  • 2
  • 19
  • 33
  • Thank you very much for the well phrased and thought through answer. Can you please point me to something to read on what "cost" really means and how it's related to duration? – Fyodor Soikin Sep 12 '13 at 22:27
  • Sadly there is no magic formula that links cost to processing time (aka Wall Clock time). In MSSQL7 I was under the (naive?) impression that they were at least 'directly related' but as versions progressed I've learned the hard way that they are different beasts. FYI: http://stackoverflow.com/questions/564717/measuring-query-performance-execution-plan-query-cost-vs-time-taken gives a bit of extra explanation. The thing to remember here is that cost is geared towards all resources on the server and how to spread them when many people/applications are accessing it; unlike most timing-tests. – deroby Sep 13 '13 at 09:47
  • Ok, I see now. Thank you very much for the explanation. Even though you did not solve my problem, you taught me something I didn't know before and gave me some ideas on where to look next, which is more than I was hoping for. So here's your well-earned bounty. :-) – Fyodor Soikin Sep 13 '13 at 14:50
  • Thx, good luck with finding out what's going on with the trigger. – deroby Sep 13 '13 at 20:39
1

Observations include:

  1. Is this the biggest of these databases that you are working with? If so, size matters to the optimizer. It will make quite a different plan for large datasets versus smaller data sets.
  2. The estimated rows and the actual rows are quite divergent. This is most apparent on the fourth query. "delete c from @alternativeRoutes...." where the _Security_Tuple5 estimates returning 16 rows, but actually used 235,904 rows. For that many rows an Index Scan could be more performant than Index Seeks. Are the statistics on the table up to date or do they need to be updated?
  3. The "select count(*) from _Security_Tuple4" takes several minutes, the first time. The second time is instant. This is because the data is all now cached in memory (until it ages out) and the second query is fast.
  4. Because the problem moves with the database then the statistics, any missing indexes, et cetera are in the database. I would also suggest checking that the indexes match with other databases using the same schema.

This is not a full analysis, but it gives you some things to look at.

RLF
  • 171
  • 9
  • You are not answering the question that I asked. I did all those analyses, and I have taken some steps to mitigate some of the possible problems you're talking about. The question I'm asking is quite simple: **what could cause a simple clustered index seek over 500K records to run for seven minutes?** – Fyodor Soikin Sep 03 '13 at 19:14
  • Once again: I'm not concerned about what plan the optimizer takes, or whether the data is in the cache or not, or what indexes are missing, etc. I can do all that analysis myself quite well. What I need to know is, **why that index seek is so slow** and **what could I do to speed it up**. Not the whole query. Just that one particular index seek. – Fyodor Soikin Sep 03 '13 at 19:15
  • 1
    @FyodorSoikin - I am not on your server and I only had your query plan to look at. So, how could I know? Is your database corrupted? Serious blocking on that one database? (But from what?) Does the database exist on different drive that is having problems? On a USB Flashdrive? And why do you think it runs in an 'instant' after the first select? Let's just say that I have no idea,but I hope someone else can help. – RLF Sep 03 '13 at 21:17
  • @FyodorSoikin - Your question has nearly 2500 characters, but all you're really asking is how to improve the clustered index seek performance? If you don't want advice on tuning the query, then don't post the execution plan and leave all the extra noise out. Instead post the execution plan to the clustered index seek in question. – brian Sep 04 '13 at 01:44
  • 1
    @brian, can you please elaborate a little? How can I get the plan for the seek? – Fyodor Soikin Sep 04 '13 at 16:04
0

Fyodor,

First:

The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.

I presume that you: a) run this query in isolated environment, b) the data is not under mutation.

Is this correct?

Second: post here your CREATE INDEX script. Do you have a funny FILLFACTOR? SORT_IN_TEMPDB?

Third: which type is your ParentId, ObjectId? int, smallint, uniqueidentifier, varchar?

Alex Yu
  • 3,412
  • 1
  • 25
  • 38
  • First: I run this query on my local computer, without anything else running. The data, however, may be "under mutation", depending on what you mean by that. This query plan is updating those same indexes: you look to the left in the diagram, you'll see the "Update Index" nodes. – Fyodor Soikin Sep 12 '13 at 00:53
  • Second: create unique clustered index _Security_Tuple4_PK on _Security_Tuple4( Id1, Id2, Id3, Id4, SubjectId, PrivilegeId, DistancesJoined ) – Fyodor Soikin Sep 12 '13 at 00:55