1

Could you help me, please figure out why t-sql expression following:

  DECLARE @SearchWord varchar(max)
SET @SearchWord = '"I went to primary school in London "'
SELECT * FROM sys.dm_fts_parser('FormsOf(INFLECTIONAL, '+ @SearchWord + ')', 1033, 0, 0) 
where display_term in 
( SELECT display_term FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "go to school")', 1033, null, 0) )

returns

enter image description here

while

DECLARE @SearchWord varchar(max)
SET @SearchWord = '"I went. to primary school in London "'
SELECT * FROM sys.dm_fts_parser('FormsOf(INFLECTIONAL, '+ @SearchWord + ')', 1033, 0, 0) 
where display_term in 
( SELECT display_term FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "go to school")', 1033, null, 0) )

returns

enter image description here

i.e. when a single dot is added somewhere in the searchstring, the corresponding occurrences shift by 8 positions? Is there something wrong with dot or maybe my t-sql expression? Thanks in advance!

khurshed_nosirov
  • 248
  • 4
  • 20

2 Answers2

2

First, the documentation says that occurrence indicates order, not position. That means that the values can be relative, not absolute, but still correctly show the order.

Next, by observation the first digit of the number indicates the zero-based sentence number (except for the first sentence, where there is no leading zero). A 'dot' is actually a full stop, which ends a sentence in English, so it isn't surprising that there is something significant about it. Look at the output from this query, and you'll see the 'end of sentence' special term:

DECLARE @SearchWord varchar(max) = N'"I went. to primary school in London. it was a nice school. to go there was fun"'
SELECT * 
FROM sys.dm_fts_parser('FormsOf(INFLECTIONAL, '+ @SearchWord + ')', 1033, 0, 0)

If you then look at your query with the longer sentence...

DECLARE @SearchWord varchar(max) = N'"I went. to primary school in London. it was a nice school. to go there was fun"'
SELECT * 
FROM sys.dm_fts_parser('FormsOf(INFLECTIONAL, '+ @SearchWord + ')', 1033, 0, 0)
where display_term in 
(SELECT display_term FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "go to school")', 1033, null, 0))

... you can see that for sentences 1 and 2 the occurrence is indeed also the word position, but for sentences 3 and 4 it isn't. I have no idea why this happens and there is nothing in the documentation to explain it, but since the documentation does not say that occurrence is the same thing as the position, it isn't totally surprising.

These questions may be interesting too:

Community
  • 1
  • 1
Pondlife
  • 15,992
  • 6
  • 37
  • 51
  • I want to use results of this query to find searchword positions in results of full-text search in order to highlight them. But analyzing your comment it turns out that it's not a good idea.. Or is there any way to somehow cheat the function to make it return position, not order? At least, could I have a list of all of characters(as '.') that make occurrence not be equal to position? – khurshed_nosirov Feb 14 '13 at 11:10
  • 1
    According to MSDN and Google FTS does not provide any way to get the position of a match. There is no real documentation about how FTS behaves internally, so making too many assumptions would be risky. The questions I linked to suggested using `dm_fts_parser` to get the matches, then searching the string again (with `PATINDEX()` or whatever) to find their position. Or if you need this feature badly, you could look for an alternative FTS solution that [does provide position](http://stackoverflow.com/questions/3332538/finding-the-start-and-end-of-a-match-with-lucene); Lucene is often mentioned. – Pondlife Feb 14 '13 at 13:47
0

As @Pondlife suggested to use PATINDEX() to find positions(not occurrences) I was able to overcome the mentioned issue with dm_fts_parser. The below t-sql returns exact position and length of all of match forms(INFLECTIONAL) in specific text no matter @SearchWord contains dot or not:

DECLARE @SearchWord nvarchar(max)

SET @SearchWord = N'"I went. to primary school in London "'

SELECT distinct y.pos,y.lgth from 
(
SELECT w.*,
PATINDEX(N'%[^a-z]' + w.Display_Term + N'[^a-z]%',@SearchWord) as pos, LEN(w.display_term) as lgth
 FROM sys.dm_fts_parser(N'FormsOf(INFLECTIONAL, '+ @SearchWord + ')', 1033, 0, 0) w 
where display_term in 
( SELECT display_term FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "go to school")', 1033, 0, 0) ) and 
PATINDEX(N'%[^a-z]' + w.Display_Term + N'[^a-z]%',
@SearchWord)<>0
) y

and it returns result set below:

enter image description here

khurshed_nosirov
  • 248
  • 4
  • 20