1

I would like broad guidelines before hitting the details, and thus as brief as I can on two issues (this might be far too little info):

  1. Supplier has more than one address. Address is made up of fields. Address line 1 and 2 is free text. The rest are keys to master data tables that has FK and name. Id to country. Id to province. ID to municipality. ID to city. ID to suburb. I would like to employ FTS on address line 1 and 2 and also all the master data table name columns so that user can find suppliers whose address match what they capture. This is thus across various master data tables. Note that a province or suburb etc is not only single word, e.g. meadow park.
  2. Supplier provides many goods and services. These goods and services are a 4 level hierarchy (UNSPSC) of parent and child items. The goods or service is at the lowest level of the 4 level hierarchy, but hits on the higher levels should be returned as well. Supplier linked to lowest items of hierarchy. Would like to employ FTS to find supplier who provides goods and services across the 4 level hierarchy.

The idea is this to find matches, return suppliers, show rank, show where it hit. If I'm unable to show the hit, the result makes less sense, e.g. search for the word "car" will hit on care, cars, cardiovascular, cards etc. User can type in more than one word, e.g. "car service".

Should the FTS indexes be on only the text fields on the master data tables, and my select thus inner join on FTS fields? Should I create views and indexes on those? How do I show the hit words?

japieson
  • 21
  • 2

1 Answers1

1

Should the FTS indexes be on only the text fields on the master data tables...?

When you have fields across multiple tables that need to be searched in a single query and then ranked, the best practice is to combine those fields into a single field through an ETL process. I recently posted an answer where I explained the benefits of this approach:

Why combine them into 1 table? This approach results in better ranking than if you were to apply full text indexes to each existing table. The former solution produces a single rank whereas the latter will produce a different rank for each table and there is no accurate way to resolve multiple ranks (which are based on completely different scales) into 1 rank. ...

How can you combine them into 1 table? You will need some sort of ETL process which either runs on a schedule (which may be easier to implement but will result in lag time where your full text index is out of sync with the master tables) or gets run on demand whenever your master tables are modified (either via triggers or by hooking into an event in your data layer).

How do I show the hit words?

Unfortunately SQL Server Full Text does not have a feature that extracts or highlights the words/phrases that were matched during the search. The answers to this question have tips on how to roll your own solution. There's also a 3rd party product called ThinkHighlight which is a CLR assembly that helps with highlighting (I've never used it so I can't vouch for it).

...search for the word "car" will hit on care, cars, cardiovascular, cards etc...

You didn't explicitly ask about this but you should be aware that by default "car" will not match "care", etc. What you're looking to do is a wildcard search. Your full text query will need to use an asterisk and should look something like this: SELECT * FROM CONTAINSTABLE(MyTable, *, '"car*"') Be aware that wildcards are only available when using CONTAINS/CONTAINSTABLE (boolean searches), not FREETEXT/FREETEXTTABLE (natural language searches). Based on how you describe your use case, it sounds like you will need to modify your user's search string to add the wildcards. You'll need to do this anyway if you use CONTAINS/CONTAINSTABLE in order to add the boolean operators and quotes (ex: User types car service. You change it to "car*" AND "service*".)

Community
  • 1
  • 1
Keith
  • 20,636
  • 11
  • 84
  • 125