FreeText Query is slow - includes TOP and Order By

Question

The Product table has 700K records in it. The query:

SELECT TOP 1 ID, Name FROM Product WHERE contains(Name, '"White Dress"') ORDER BY DateMadeNew desc

takes about 1 minute to run. There is an non-clustered index on DateMadeNew and FreeText index on Name.

If I remove TOP 1 or Order By - it takes less then 1 second to run.

Here is the link to execution plan. http://screencast.com/t/ZDczMzg5N

Looks like FullTextMatch has over 400K executions. Why is this happening? How can it be made faster?

UPDATE 5/3/2010

Looks like cardinality is out of whack on multi word FreeText searches:

Optimizer estimates that there are 28K records matching 'White Dress', while in reality there is only 1. http://screencast.com/t/NjM3ZjE4NjAt

If I replace 'White Dress' with 'White', estimated number is '27,951', while actual number is '28,487' which is a lot better.

It seems like Optimizer is using only the first word in phrase being searched for cardinality.

Could you please save the execution plan as `XML` and post the result here instead of the picture? — Quassnoi, Apr 30 '10 at 17:56
Possible duplicate of [FreeText COUNT query on multiple tables is super slow](http://stackoverflow.com/questions/1819333/freetext-count-query-on-multiple-tables-is-super-slow) — gariepy, May 31 '16 at 15:36

Quassnoi · Answer 1 · 2010-05-02T21:32:56.000

3

Looks like FullTextMatch has over 400K executions. Why is this happening?

Since you have an index combined with TOP 1, optimizer thinks that it will be better to traverse the index, checking each record for the entry.

How can it be made faster?

If updating the statistics does not help, try adding a hint to your query:

SELECT  TOP 1 *
FROM    product pt
WHERE   CONTAINS(name, '"test1"')
ORDER BY
        datemadenew DESC
OPTION (HASH JOIN)

This will force the engine to use a HASH JOIN algorithm to join your table and the output of the fulltext query.

Fulltext query is regarded as a remote source returning the set of values indexed by KEY INDEX provided in the FULLTEXT INDEX definition.

Update:

If your ORM uses parametrized queries, you can create a plan guide.

Use Profiler to intercept the query that the ORM sends verbatim
Generate a correct plan in SSMS using hints and save it as XML
Use sp_create_plan_guide with an OPTION USE PLAN to force the optimizer always use this plan.

edited May 02 '10 at 21:32

answered Apr 30 '10 at 17:52

Quassnoi

413,100
91
616
614

The new query takes 10-20 secs the first time, and after that less then 1 sec. Whille the original one was 1 minute on first and subsequent requests. Here is the new execution plan... http://screencast.com/t/MTZkMDcwNjQ Here is XML: http://docs.google.com/Doc?docid=0AfKNS6cREtKUZGdocngzOHZfMGhocnAzNWNm&hl=en Thanks... Now need to figure out how to get the Hint through Castle Active Record... – Eric P May 01 '10 at 07:07
RE: "Fulltext query is regarded as a remote source". This is not the case in SQL2008 I think (See http://technet.microsoft.com/en-us/library/cc721269.aspx) but the Cardinality estimates in the query plan seem way out for the OP. Estimated 28935 rows, Actual 1. – Martin Smith May 01 '10 at 14:18

score 1 · Answer 2 · edited Sep 25 '12 at 10:23

I had the same problem earlier.

The performance depends on which unique index you choose for full text indexing. My table has two unique columns - ID and article_number.

The query:

select top 50 id, article_number, name, ... 
from ARTICLE 
CONTAINS(*,'"BLACK*" AND "WHITE*"')
ORDER BY ARTICLE_NUMBER

If the full text index is connected to ID then it is slow depending on the searched words. If the full text index is connected to ARTICLE_NUMBER UNIQUE index then it was always fast.

score 1 · Answer 3 · answered Apr 30 '10 at 17:21

1

I can't see the linked execution plan, network police are blocking that, so this is just a guess...

if it is running fast without the TOP and ORDER BY, try doing this:

SELECT TOP 1
    *
    FROM (SELECT 
              ID, Name, DateMadeNew 
              FROM Product 
              WHERE contains(Name, '"White Dress"')
         ) dt
    ORDER BY DateMadeNew desc

answered Apr 30 '10 at 17:21

KM.

101,727
34
178
212

+1 I was going to post something similar. Any idea why SQL Server is choosing such a sub optimal plan? – Martin Smith Apr 30 '10 at 17:24
This will produce the same plan as the original query. `SQL Server` is very good in query unnesting. – Quassnoi Apr 30 '10 at 17:53

score 1 · Answer 4 · answered Apr 30 '10 at 20:46

A couple of thoughts on this one:

1) Have you updated the statistics on the Product table? It would be useful to see the estimates and actual number of rows on the operations there too.

2) What version of SQL Server are you using? I had a similar issue with SQL Server 2008 that turned out to be nothing more than not having Service Pack 1 installed. Install SP1 and a FreeText query that was taking a couple of minutes (due to a huge number of actual executions against actual) went down to taking a second.

1) Yes 2) Actually I didn't have SP1 installed on one of the systems I was testing on, but it didn't seem to make much of a difference. — Eric P, May 01 '10 at 07:10

Martin Smith · Accepted Answer · 2010-05-03T13:11:08.043

Edit

From http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506240

The most important thing is that the correct join type is picked for full-text query. Cardinality estimation on the FulltextMatch STVF is very important for the right plan. So the first thing to check is the FulltextMatch cardinality estimation. This is the estimated number of hits in the index for the full-text search string. For example, in the query in Figure 3 this should be close to the number of documents containing the term ‘word’. In most cases it should be very accurate but if the estimate was off by a long way, you could generate bad plans. The estimation for single terms is normally very good, but estimating multiple terms such as phrases or AND queries is more complex since it is not possible to know what the intersection of terms in the index will be based on the frequency of the terms in the index. If the cardinality estimation is good, a bad plan probably is caused by the query optimizer cost model. The only way to fix the plan issue is to use a query hint to force a certain kind of join or OPTIMIZE FOR.

So it simply cannot know from the information it stores whether the 2 search terms together are likely to be quite independent or commonly found together. Maybe you should have 2 separate procedures one for single word queries that you let the optimiser do its stuff on and one for multi word procedures that you force a "good enough" plan on (sys.dm_fts_index_keywords might help if you don't want a one size fits all plan).

NB: Your single word procedure would likely need the WITH RECOMPILE option looking at this bit of the article.

In SQL Server 2008 full-text search we have the ability to alter the plan that is generated based on a cardinality estimation of the search term used. If the query plan is fixed (as it is in a parameterized query inside a stored procedure), this step does not take place. Therefore, the compiled plan always serves this query, even if this plan is not ideal for a given search term.

Original Answer

Your new plan still looks pretty bad though. It looks like it is only returning 1 row from the full text query part but scanning all 770159 rows in the Product table.

How does this perform?

CREATE TABLE #tempResults
(
ID int primary key,
Name varchar(200),
DateMadeNew datetime
)

INSERT INTO #tempResults
SELECT 
      ID, Name, DateMadeNew 
      FROM Product 
      WHERE contains(Name, '"White Dress"')


SELECT TOP 1
    *
    FROM #tempResults
    ORDER BY DateMadeNew desc

This works really well for keywords that return only small number of records. But if I use something more generic like just 'White', it takes about 5 secs to return. I am trying to get SQL optimizer to generate a different plan for 'White' vs 'White Dress'. Currently it generates the same one, which works fast for White, but not so fast for 'White Dress'. Plan for 'White' http://docs.google.com/leaf?id=0B_KNS6cREtKUZDcxOTYyMmEtMGZjNi00MmM5LWI2NDYtM2EwZGE3YWVjMDY5&hl=en Plan for 'White Dress' http://docs.google.com/leaf?id=0B_KNS6cREtKUMTlhZWQzZDQtMjk4Ni00OTZiLWJmOGMtMmE5YzNhODAxNzZh&hl=en — Eric P, May 03 '10 at 08:40
@Eric The reason that latest plan looks better though is that there seems to be a predicate added (this_.IsDeleted =0 AND this_.Buyable=1) that reduces the number of rows to 245 before the JOIN onto the full text index. Is that the same as your original screenshot? — Martin Smith, May 03 '10 at 09:10
Oops.. Forgot to remove them. When I do - it goes from 245 to 448, but still really fast for 'white'. There are about 28K records in DB out of 700K that have 'white' word, so it doesn't need to go far to find the first 24 even ordered by DateMadeNew. — Eric P, May 03 '10 at 10:45
Yes, Basically it seems to come down to getting SQL Server to make a good cardinality estimate in advance of how many matches there will be for the full text result. I'd be interested if you find any good info on this. Just out of interest does the estimate change at all for "Dress" or "Dress White" or does it remain at an estimate of 28K for everything? — Martin Smith, May 03 '10 at 11:48
@Eric See Edit - Also how do the estimated rows tally up against the "document_count" info in sys.dm_fts_index_keywords for "White" and "Dress"? — Martin Smith, May 03 '10 at 12:28
Dress Estimated count:38,732 Actual: 41,526 Dress White Estimated count:27,811 Actual: 30 Document_Count white: 28487 dress: 41526 — Eric P, May 03 '10 at 13:45
So it looks like it is betting that 97.5% of the records matching "White" will also match "Dress" and getting it spectacularly wrong! — Martin Smith, May 03 '10 at 14:13

score 1 · Answer 6 · edited May 23 '17 at 12:13

I have better solution.

I. Let's first overview proposed solutions as they also may be used in some cases:

OPTION (HASH JOIN) - is not good as you may get error "Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN."
SELECT TOP 1 * FROM (ORIGINAL_SELECT) ORDER BY ... - is not good, when you need to use paginating results from you ORIGINAL_SELECT
sp_create_plan_guide - is not good, as to use plan_guide you have to save plan for specific sql statement, this won't work for dynamic sql statements (e.g. generated by ORM)

II. My Solution contains of two parts 1. Self join table used for Full Text search 2. Use MS SQL HASH Join Hints MSDN Join Hints

Your SQL :

SELECT TOP 1 ID, Name FROM Product WHERE contains(Name, '"White Dress"') 
ORDER BY DateMadeNew desc

Should be rewritten as :

SELECT TOP 1 p.ID, p.Name FROM Product p INNER HASH JOIN Product fts ON fts.ID = p.ID
WHERE contains(fts.Name, '"White Dress"') 
ORDER BY p.DateMadeNew desc

If you are using NHibernate with/without Castle Active Records, I've replied in post how to write interceptor to modify your query to replace INNER JOIN by INNER HASH JOIN

FreeText Query is slow - includes TOP and Order By

6 Answers6

Linked