1

In SQL Server 2014 I'm attempting to use contains for several different terms. The query is taking an exceptionally long time and I'm curious if it is faster to separate the contains commands or to use them in the same statement.

For example, is it faster to use:

WHERE CONTAINS(text, '"term1" or "term2" or "term3"')

or

WHERE CONTAINS(text, "term1") OR CONTAINS(text, "term2") OR CONTAINS(text, "term3")

Does it make a difference using one vs the other, or am I possibly missing something more fundamental here?

Update:

I ended up running the first and it erred out after 50 minutes (possibly separate server connection issues). I ran the second option and the query completed in 19 minutes. I would still be interested if anybody had a better answer for why one may perform better than the other. Thanks.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Linguisize
  • 11
  • 1
  • 3
  • I would bet strongly on the first. I've never tried it. You can test it. – Gordon Linoff Jun 08 '17 at 20:59
  • Even I would recommend 1st, but I have also never tried it – Neha Jun 08 '17 at 21:04
  • Possible try run the first again as it is possible that the indexes were out of date. You could also check the execution plan to see ([See Method 1 of the accepted answer in this post](https://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan)) which query will perform better. – Java Devil Jun 08 '17 at 22:37

1 Answers1

0

I just noticed after doing all this that I performed this test in SqlServer 2016 - so there could possibly be performance improvements between versions

I would say that the first is the better option based on the following experiment, but note that I am basing this off a relatively small set of test data and the set up I did this with may be overly simplified, but for I think that it demonstrates in theory why the first should be better.

You can repeat this test with your own data using Microsoft Management studio by turning on the Execution plan (See Method 1 of the accepted answer in this post)

Running the two queries with only two terms gives the below execution:

Query set 2 execution

As you can see in the first query, there is 68% of the cost is for scanning the index and 32% for the table function (Contains method). And the second, as there is now two function calls to analyse - the cost of the table function is increased. And here are the timings of the queries.

-- Query 1

(296 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 73 ms.

-- Query 2

(296 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 100 ms.

If I increase the number of terms in the query you can see how this affects the execution.

Query set 2 execution

And gives the timings:

-- Query 1

(441 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 80 ms.

-- Query 2

(441 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 143 ms.

Comparing the timings, there was a 7ms increase for the first style but 43ms for the second - almost a 50% increase. The increase between the two sets also shows that the performance should be better for the first query style will scale better as you increase the number of terms.

Java Devil
  • 10,629
  • 7
  • 33
  • 48
  • I tried running query 1 again today and just let it do its thing. Query 1 took almost two hours at 01:40:29 Query 2 took 19 minutes. Both queries returned 13,000,000 rows from the original 200,000,000. Everything you said make sense and the first one seems like it ought to be faster, but that's how it ended up for me. I'm not entirely sure what the takeaway here is, but there it is. Thanks. – Linguisize Jun 09 '17 at 17:34