3

I have a SQLite FTS5 virtual table and am trying to highlight text in my prefix query results. I am aware of the highlight() and snippet() auxiliary functions, however they don't seem to support exactly what I am trying to do. If my data looks like:

fts.my_data
-----------
John
Mike
Bill
Jane

and I want to query using a prefix match such as

select * from fts where fts match 'j*';

The highlight and snippet functions (assuming <b>...</b> tags) will return

<b>John</b>
<b>Jane</b>

But I only want to highlight the exact part of the prefix that was matched, before the wildcard:

<b>J</b>ohn
<b>J</b>ane

There does not seem to be any way to do this using the existing FTS5 auxiliary functions. I realize FTS5 offers an API so that you can create your own auxiliary functions. I might also be able to implement the solution in application code (I am using Swift), although I suspect this problem has the potential for a lot of issues trying to implement in application code (for example, how to handle stemming). Does anyone know if what I am trying to do is actually possible using the existing highlight and/or snippet functions before I go to the trouble of implementing my own solution? If so, could you explain how?

Also, I have observed several other existing apps (Contacts+ for example) offer this capability so I know it is possible somehow, and am also wondering how they do it if anyone knows how.

John Cleveland
  • 488
  • 5
  • 17
  • The [experimental trigram tokenizer](https://www.sqlite.org/fts5.html#the_experimental_trigram_tokenizer) matches on just the provided content. Experimenting locally, it highlights in the way that you asked for. t wouldn't work for the query "match 'J*'", because it is only able to match phrases that are 3 or more unicode characters. – Max Shenfield Feb 02 '21 at 07:29
  • Thank you for help, the trigram tokenizer does appear to do what I want. Using sqlite3 from the command line, I was able to create a virtual table and run some test queries and highlight and snippet both seem to be giving me what I am looking for. I am now stuck on when I query from Swift I am getting a "no such tokenizer: trigram" error. I checked "PRAGMA compile_options;" from inside my running iOS app and the option ENABLE_FTS5 is in the results, also reports SQLite version "3.32.3", but something is still missing in the SQLite being bundled with iOS to enable the trigram tokenizer. – John Cleveland Feb 02 '21 at 23:46
  • Bummer, the trigram tokenizer was released two minor versions later - in [3.34.0](https://www.sqlite.org/releaselog/3_34_0.html). One of the SQLite developers came up with it in response to [a recent forum post from Simon Willison](https://sqlite.org/forum/forumpost/ca90da691a). It's possible to bundle your own version of SQLite with your app (for example, [with SQLite.swift](https://github.com/stephencelis/SQLite.swift/blob/master/Documentation/Index.md#requiring-a-specific-version-of-sqlite)). I'd guess that writing your own code to do highlighting would be easier, however. – Max Shenfield Feb 03 '21 at 02:30
  • Thanks again for your help. It was not trivial, but I did manage to compile 3.34.0 for iOS using XCode, and confirmed that I was using that version in my Swift iOS app. However, the trigram tokenizer has to be built from source as a dynamic library and loaded at runtime. Apple does not allow you to run dynamically linked libraries on iOS (they only allow system dynamic libraries). So, not sure if this is even possible: compile my own version of SQLite and load the trigram extension at runtime on iOS. I'm trying to figure out now how to compile ftstri as a static library. – John Cleveland Feb 03 '21 at 08:30
  • So I've been staring at the computer for way too long. As you stated above, the trigram tokenizer was released in SQLite 3.34, confirmed in the [Release Notes](http://www3.sqlite.org/releaselog/3_34_0.html). So theoretically I should be able to compile 3.34.0 myself without having to load the trigram extension separately. – John Cleveland Feb 03 '21 at 08:45
  • The trigram tokenize looks like it's loaded with the other built-in tokenizers in the [fts5 tokenizer source file](https://www3.sqlite.org/cgi/src/file?name=ext/fts5/fts5_tokenize.c&ci=trunk) (search for "sqlite3Fts5TokenizerInit"). Without more info on what your running into, I'd guess that XCode isn't [compiling in FTS5](https://www.sqlite.org/fts5.html#compiling_and_using_fts5) when its building SQLite. – Max Shenfield Feb 03 '21 at 18:24

1 Answers1

4

For anyone else looking for a resolution to this issue, I was able to figure out how to solve this issue.

The trigram tokenizer was added as one of the built-in tokenizers in SQLite 3.34.0 (released December 1, 2020).

In my case, I was deploying to iOS, which at this current time, only has SQLite 3.32 bundled with it by default, as confirmed by this wiki page. So I was able to download the SQLite source code and add it as a project to my XCode workspace as a "Static Library" and reference the resulting .a static library file from within my app's project. I also had to set the appropriate C Flags in the XCode compiler options for the SQLite project to get everything to work correctly. But I am now able to distribute my own supplied version of SQLite (3.34.1), compiled with my specific options, with FTS5 and the trigram tokenizer.

John Cleveland
  • 488
  • 5
  • 17