3

I'm using SQLite's FTS module to try to implement a full text search with windowing (or paging as some might call it) of results as well as highlighting of the search hits within each window. As far as I can tell it's straightforward to have either windowing or highlighting of search hits, but not both.

To implement windowing, I run the FTS and retrieve and store the docid of every hit in a temporary table, along the lines of:

CREATE VIRTUAL TABLE FullTextIndex USING FTS4 ( fulltext TEXT );
...
CREATE TEMP TABLE SearchResults ( id INTEGER );
INSERT INTO SearchResults
   SELECT docid FROM FullTextIndex WHERE fullText MATCH 'mySearchExpression';

When I'm ready to retrieve the actual full text based on the windowing criteria, I do something like this:

SELECT FullTextIndex.fullText FROM FullTextIndex
INNER JOIN SearchResults
ON FullTextIndex.docid = SearchResults.id
WHERE *windowing criteria applied to SearchResults*;

That part works great and is very efficient because it's a rowid match.

However, at this point I cannot make use of SQLite's snippet function (or any other FTS auxiliary function) in order to highlight my search hits because I'm not using the text index in the query. Therefore I tried this:

SELECT snippet(FullTextIndex) FROM FullTextIndex
WHERE FullTextIndex.docID IN 
   (SELECT id FROM SearchResults WHERE *windowing criteria*) 
    AND FullTextIndex.fullText MATCH 'mySearchExpression';

It works, but it takes almost exactly as long as the original FTS because it is apparently running the FTS first, without applying the narrowing criteria based on SearchResults (and I suppose fundamentally it can't because of the way FTS indexing works).

My best solution so far is to make a second temporary table - a temporary full text index containing only the search hits for each window. In essence this:

CREATE VIRTUAL TABLE temp.TextResultsWindow USING FTS4 (id INTEGER, fullText text);
INSERT INTO temp.TextResultsWindow
    SELECT FullTextIndex.docid, FullTextIndex.fullText
    FROM FullTextIndex
    WHERE FullTextIndex.docid IN 
     (SELECT id from SearchResults WHERE *windowing criteria*);

SELECT snippet(temp.TextResultsWindow.TextResultsWindow) 
FROM temp.TextResultsWindow
WHERE temp.TextResultsWindow.fullText 
MATCH 'mySearchExpression';

This does work, and it seems to be relatively fast for modest window sizes, but it seems terribly inefficient to be re-indexing a window of search hits just to have highlighting. If the volume of text in a particular window happened to be very large I could also easily see the window retrieval taking longer than the original FTS, just because it has to spend so much time re-indexing for each window just to use the snippet function. So while it works academically I don't think it'll be very good in practice.

Can anyone suggest a better approach? Plenty of applications have full text search with both windowing and highlighting. Do I have no choice but to just apply a manual scan and highlighting function to each full text result that I retrieve?

Thanks a lot.

UPDATE: For what it's worth, there's a section in the SQLite documentation that discusses FTE3/4 and snippet, and paging/windowing. They're suggesting using LIMIT and OFFSET to do the windowing, so the scenario they're describing would indeed result in the full text search being re-run on the whole index for every window. I guess if the SQLite developers don't have a more efficient way of doing this it probably doesn't exist, but still would appreciate any other thoughts folks might have.

Emperor Eto
  • 2,456
  • 2
  • 18
  • 32

0 Answers0