2

I have a schema like this (simplified):

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name NOT NULL
);
CREATE INDEX users_idx
ON users
USING GIN (to_tsvector('finnish', name));

But I'm getting completely invalid results with my queries:

# select name from users where to_tsvector('finnish', name) @@ to_tsquery('lemmin');
 name 
------
(0 rows)

# select name from users where to_tsvector('finnish', name) @@ to_tsquery('lemmink');
       name                                       
--------------------
 Riitta ja Lemminki
 Riitta ja Lemminki
(2 rows)

# select name from users where name ilike 'lemmink%';
       name                  
----------------------
 Lemminkäinen Matilda
 Lemminkäinen Matias
 Lemminkäinen Kyösti
 Lemminkäinen Tuomas
(4 rows)

Another example:

# select name from users where to_tsvector('finnish', name) @@ to_tsquery('partu');
   name             
----------
 Partuuna
(1 row)

# select name from users where to_tsvector('finnish', name) @@ to_tsquery('partur');
         name                                    
------------------------
 Parturi-Kampaamo Raija
 Parturi-Kampaamo Siema
(2 rows)

I was expecting to get the bottom two results on both queries...

Using the following version:

psql (9.4.6, server 9.5.2)
WARNING: psql major version 9.4, server major version 9.5.
         Some psql features might not work.
Markus Meskanen
  • 19,939
  • 18
  • 80
  • 119
  • If `finnish` is not your default FTS config, you're supposed to give `'finnish'` (if you want to use that) as the first parameter to both `to_tsvector()` & `to_tsquery()`. -- I don't know exactly what happens when the query is tokenized differently than the vector (document), but I don't think something good can come up with this setup. – pozs Feb 15 '17 at 12:58
  • 1
    Also, for pure prefix match (where `lemmin` or `lemmink` is not a complete stem) you should use `to_tsquery('finnish', 'lemmin:*')` – pozs Feb 15 '17 at 13:02
  • Also (again :) ) stemming of person names is not really one of FTS' strengths. – pozs Feb 15 '17 at 13:06
  • @pozs Hey sorry I forgot to answer, using `lemmin:*` worked perfectly for *all* of my scenarios. If you want to post a competing answer, feel free to. – Markus Meskanen Feb 25 '17 at 12:05

1 Answers1

-1

I don't speak Finnish, but it seems expected result. FTS looks for lexemes, not for parts of words, Eg, do is not a lexemme for dog, but dog is for dogs:

t=# select  to_tsvector('english', 'Dogs eats bone') @@ to_tsquery('do');
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
 ?column?
----------
 f
(1 row)

t=# select  to_tsvector('english', 'Dogs eats bone') @@ to_tsquery('dog');
 ?column?
----------
 t
(1 row)

So I believe in Parturi last i is optional ending - right?..

Update: from https://en.wiktionary.org/wiki/parturi : partur[i], partur[eita] => lexeme will be partur

Vao Tsun
  • 47,234
  • 13
  • 100
  • 132
  • Darn, is there anything similar that works for parts of words? :/ Also, what do you mean by optional ending? `parturi` as-is is the base word, it translates literally into "hairdresser". It's like searching "hairdresse" and not getting names with "hairdresser" in them – Markus Meskanen Feb 15 '17 at 11:06
  • you can use like operator or regular expressions to look for parts of words – Vao Tsun Feb 15 '17 at 11:17
  • Hmm thanks, I guess this answers my question. I'll just have to come up with something else – Markus Meskanen Feb 15 '17 at 11:21