16

I'm trying to wrap my head around how to search for something that appears in the middle of a word / expression - something like searching for "LIKE %book% " - but in SQL Server (2005) full text catalog.

How can I do that? It almost appears as if both CONTAINS and FREETEXT really don't support wildcard at the beginning of a search expression - can that really be?

I would have imagined that FREETEXT(*, "book") would find anything with "book" inside, including "rebooked" or something like that.

p.campbell
  • 98,673
  • 67
  • 256
  • 322
  • Check this link for a helper class that is very useful for FTS on Sql Server 2005: [http://ewbi.blogs.com/develops/2007/05/normalizing_sql.html](http://ewbi.blogs.com/develops/2007/05/normalizing_sql.html) – Eric Z Beard Nov 17 '08 at 13:58

6 Answers6

16

unfortunately CONTAINS only supports prefix wildcards:

CONTAINS(*, '"book*"')
Lance Fisher
  • 25,684
  • 22
  • 96
  • 122
Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
7

SQL Server Full Text Search is based on tokenizing text into words. There is no smaller unit as a word, so the smallest things you can look for are words.

You can use prefix searches to look for matches that start with certain characters, which is possible because word lists are kept in alphabetical order and all the Server has to do is scan through the list to find matches.

To do what you want a query with a LIKE '%book%' clause would probably be just as fast (or slow).

Tomalak
  • 332,285
  • 67
  • 532
  • 628
3

If you want to do some serious full text searching then I would (and have) use Lucene.Net. MS SQL Full Text search never seems to work that well for anything other than the basics.

Craig
  • 36,306
  • 34
  • 114
  • 197
1

Full text has a table that lists all the words the engine has found. It should have orders-of-magnitude less rows than your full-text-indexed table. You could select from that table " where field like '%book%' " to get all the words that have 'book' in them. Then use that list to write a fulltext query. Its cumbersome, but it would work, and it would be ok in the speed department. HOWEVER, ultimately you are using fulltext wrong when you are doing this. It might actually be better to educate the source of these feature requests about what fulltext is doing. You want them to understand what it WANTS to do, so they can get high value from fulltext. Example, only use wild cards at the end of a word, which means think of the words in an ordered list.

jerry
  • 1,817
  • 1
  • 11
  • 6
1

Here's a suggestion that is a workaround for that wildcard limitation. You create a computed column that contains the same content but in reverse as the column(s) you are searching.

If, for example, you are searching on a column named 'ProductTitle', then create a column named ProductsRev. Then update that field's 'Computed Column Specification' value to be:

(reverse([ProductTitle]))

Include the 'ProductsRev' column in your search and you should now be able to return results that support a wildcard at the beginning of the word. Good luck!!

Jake H.
  • 155
  • 1
  • 3
  • 1
    that won't find "rebbooked" as in reverse it is "dekoober" and "koob*" still doesn't match. – jerry May 22 '12 at 19:00
  • This is creative, but as the other comment suggests, it only helps with words that end in the target, not in words that have the target in the middle. Also, it feels so much like a data/code smell that I think implementers are better off using regular 'LIKE %blah%' or switching DB engines. – Carl G Dec 11 '12 at 22:07
0

why don't program an assembly in C# to compute all the non repeated sufixes. For example if you have the Text "eat the red meat" you can store in a field "eat at t the he e red ed d meat" (note that is not necesary to add eat at and t again) ind then in this field use full text search. A function for doing that can easily written in Csharp

x) I know it seems od... it's a workarround x) I know I'm adding overhead in the insert / update .... only justified if this overhead is insignificant besides the improvement in the search function x) I know there is also an overhead in the size of the stored data.

But I'm pretty conffident that will be quite fast

Qsebas
  • 458
  • 3
  • 15