0

I am struggling to debug performance on a particular query. The query is this:

select count(*)  
FROM dbo.user d
INNER JOIN dbo.distinct_first_name dfn ON (
        [dbo].jw(dfn.first_name, 'john') > 0.8
        AND
        (d.first_name = dfn.first_name
         OR d.nick_name = dfn.first_name
         OR d.middle_name = dfn.first_name)
        )

The query runs a Jaro Winkler filter on a distinct first name table (containing approx 15k rows) and then inner joins this against the user table to produce the result set. As defined, this takes around 1 minute to run with approx 500k rows in the user table.

Here's what I know:

1) The Jaro Winkler filter is almost instant (0.1s by itself)

2) If I change the user clause to only include one of the columns (i.e. remove the ORs) it takes only 0.4s

3) If I change this to three queries, and run them back to back, it takes approx 2s

4) If I change the Jaro Winkler filter to 0.99 (so that there's only one result) it makes no substantive difference in the query execution time

5) If I replace the Jaro Winkler filter with an equality operation (dfn.first_name = 'john') total query time is reduced to 4s

(All timings are on a fairly slow virt; real life performance will be better.)

So, for some reason, the combination of the function and the ORs are confusing the query optimizer. The execution plan is not very informative; it says that 90% of the query is spent on:

<RelOp NodeId="63" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek" EstimateRows="1.69029" EstimateIO="0.003125" EstimateCPU="0.000158859" AvgRowSize="17" EstimatedTotalSubtreeCost="71.4311" TableCardinality="15958" Parallel="0" EstimateRebinds="448881" EstimateRewinds="0.504024" EstimatedExecutionMode="Row">
                              <OutputList>
                                <ColumnReference Database="[mydb]" Schema="[dbo]" Table="[distinct_first_name]" Alias="[dfn]" Column="first_name" />
                              </OutputList>
                              <RunTimeInformation>
                                <RunTimeCountersPerThread Thread="0" ActualRows="857936" ActualEndOfScans="859454" ActualExecutions="859454" />
                              </RunTimeInformation>
                              <IndexScan Ordered="1" ScanDirection="FORWARD" ForcedIndex="0" ForceSeek="0" ForceScan="0" NoExpandHint="0" Storage="RowStore">

Splitting the query up is actually an option, since this is in a sproc, and I can probably redesign the schema a little, but I'm stumped as to what's bogging this down. Any ideas?

user717847
  • 695
  • 1
  • 6
  • 16
  • This would point me to believe that you are missing indexes on the extra columns. Did you try to rewrite it from another side of getting the values and storing into temp table or table variable and joining that result set with other table. –  Apr 22 '14 at 16:43
  • @VladimirOselsky there are indices on the extra columns. Joining against any one of those columns individually is performant. – user717847 Apr 22 '14 at 16:47
  • Have you tried running the function separately from the `JOIN`? – Hart CO Apr 22 '14 at 17:04
  • @Goat CO -- yes, the function run by itself takes 0.1s on the same criteria. I could run it ahead of time into a temp table, and then join the temp table, I suppose, but that doesn't really get at what's breaking here. – user717847 Apr 22 '14 at 17:08
  • Unless the function is being called 3x per row due to the 3 fields... that doesn't seem like it should be happening, but something is clearly off. That would jive with the lack of performance difference between .8 and .99. Does relegating the function to a cte produce same results? – Hart CO Apr 22 '14 at 17:12
  • @GoatCO yeah that's my guess as to what it's doing. But yeah it doesn't seem like it should be happening. Tried creating a temp table first and then inner joining that, and that runs in about 8s. Combining that approach with 3 separate queries makes the whole thing run in 0.5s. So my guess is that the function is in fact running per row in the user table for some strange reason (even though that's not indicated in the execution plan), and the ORs seem to be misparsed. – user717847 Apr 22 '14 at 17:18
  • It would be nice if you could upload the whole XML execution plan (onedrive, dropbox..). I don't understand where the `EstimateRebinds="448881"` come from, but it's very suspicious. BTW, are you looking for ways to improve the query execution time (then turn ORs into UNION ALLs, this will help) or for an explanation why optimizer apparently did a bad job on it? – dean Apr 22 '14 at 17:51
  • @dean Thanks - the full query plan is here: http://pastebin.com/r9zyj0WA I'm really looking for both - to understand, and to improve it. Why doesn't the optimizer manage to turn the "or" structure into union all? – user717847 Apr 22 '14 at 18:25

2 Answers2

2

A few things you might want to try:

  • What is the clustered index of the dfn table? Is it just a table with names, nothing more? If so, remove the autonumber column if you have it and make the name the clustered index.

  • Is 'john' an argument to your sproc? I assume it is. You could first calculate the Jaro Winkler filter over the smallest of the two name-datasets and insert that into a temporary table. Then join the other table on the temporary table. Remember temporary tables can benefit from indexes too (if you add them).

  • You might be able to improve performance by creating a multi-column index: first name, nick name, middle name. The usefulness of individual indexes goes down because of all the columns you reference in your where-statement.

  • I think it's always fun to run the SQL Tuning advisor tool and see what kind of recommendations it makes. Simply attach a monitor to your SQL server instance and record the execution of your query to a workload file. You can then feed the workload file to the advisor tool and it will suggest indexes, statistics and even schema changes if you enable the option.

  • Precalculate whatever you can. If I remember correctly that in the Jaro Winkler filter the string length is an important factor. You could add a column to your dfn table with the string length of the name. Stuff like functions and views are nice, but not necessarely the best for performance. The function acts like a black box that is unable to use any pre-existing or pre-calculated data to its advantage.

Most important: measure your results. The SQL query optimizer has a mind of its own. Keep your eye on the execution plan and try different scenarios.

Queries based on text columns are always more difficult to optimize. You might want to have a look at full text indexes to increase performance a bit more, but that is a separate topic to investigate.

Yvo
  • 18,681
  • 11
  • 71
  • 90
  • Thanks. The dfn clustered index is the name column; there's nothing else in there. I'm using it as a small set of strings to run the expensive filter on. The temp table approach does work...but I don't understand why. Shouldn't the optimizer be able to figure out that the first part of the join produces a finite set of names to be used in the join? I will try the other approaches. (FYI the Jaro Winkler function is very fast when run against the dfn table. It replaces a combination of Like + Soundex run against the main users table, which was horrendously slow, even with precomputed columns.) – user717847 Apr 22 '14 at 18:30
  • I think that the user defined function is making it difficult for the optimizer to fully optimize your query (note what I mention at point 5, your function is mostly a black box to the optimizer). Did you run the Tuning Advisor? It probably won't offer you a perfect solution, but the suggestions it provides can be useful. – Yvo Apr 24 '14 at 16:09
1

First of all, both first_name_alphaonly and nick_name_alphaonly are actually non-persisted computed columns, so all cardinality est are off, and then multiplied.

Then, there are 857.936 individual clustered index seeks on distinct_first_name table, and only after that the filter including the jw function is applied.

Creating indexes on computed columns would help. Filtering on distinct_first_name prior to join (into a #temp table) would probably help also. And then it's the advice on turning ORs into UNION ALLs.

Optimizer, afaik, will never rearange ORs into UNIONs itself. Believe it's called playing it safe.

dean
  • 9,960
  • 2
  • 25
  • 26
  • Actually first_name_alphaonly and nick_name_alphaonly are persisted, according to the schema definitions. Where in the query plan do you see that? – user717847 Apr 22 '14 at 19:17
  • I don't understand the index seeks on distinct_first_name though... it should be hit once only to calculate first side of the AND, no? In other queries it appears to work properly, so something about this structure does throw it off. – user717847 Apr 22 '14 at 19:18
  • It's the inner side of the join. The plan deals with `user` table first, then joins to `distinct_first_name`. – dean Apr 22 '14 at 19:24
  • switching the sides of the join doesn't improve performance - if anything it worsens it. If you think about it it's the same operation, no matter which side of the join the function is on - "apply filter set of names to large table U". So I still don't see why it think it needs to scan distinct_first_name more than once. (PS you might have missed my first comment ... what made you think the computed columns are not persisted? They are. They have indices on them already, also.) – user717847 Apr 22 '14 at 19:39
  • Check node 6 in the plan; it's the index scan on some _dta_ index, output being first_name, middle_name and nick_name columns. A bit above that you can see a scalar operator cm_Fn_AlphaOnly_AtoZ applied to the two, resulting in first_name_alphaonly and nick_name_alphaonly. – dean Apr 22 '14 at 19:51
  • that's truly bizarre. Why would it choose to avoid the persisted version of the computed column? (If I reduce this to just one of the columns it no longer shows cm_Fn_AlphaOnly_AtoZ in the query plan.) – user717847 Apr 22 '14 at 20:20
  • I found this: http://stackoverflow.com/questions/5998217/why-does-the-execution-plan-include-a-user-defined-function-call-for-a-computed – user717847 Apr 22 '14 at 20:44