Questions tagged [trigram]

Trigrams are a special case of the N-gram, where N is 3. They are often used in natural language processing for doing statistical analysis of texts.
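
As a minimal sketch of that definition, the snippet below slides a window of length 3 over a string (character trigrams) or over its whitespace-separated words (word trigrams). The function names are illustrative only and are not taken from any library. The sketch does no padding or normalization, which some implementations apply.

```python
def char_trigrams(text):
    """Return every run of 3 consecutive characters in text, in order."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def word_trigrams(text):
    """Return every run of 3 consecutive whitespace-separated words."""
    words = text.split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


print(char_trigrams("word"))
print(word_trigrams("the dog runs fast"))
```

Inputs shorter than three characters (or three words) yield an empty list.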

100 questions
19
votes
3 answers

PostgreSQL, trigrams and similarity

Just testing PostgreSQL 9.6.2 on my Mac and playing with Ngrams. Assume there is a GIN trigram index on the winery field. The limit for similarity (I know this is deprecated): SELECT set_limit(0.5); I'm building a trigram search on a 2.3M-row table. My…
alex.bour
  • 2,842
  • 9
  • 40
  • 66
10
votes
2 answers

Django max similarity (TrigramSimilarity) from ManyToManyField

I have to implement a search function which will be fault tolerant. Currently, I have the following situation: Models: class Tag(models.Model): name = models.CharField(max_length=255) class Illustration(models.Model): name =…
Lukas
  • 2,544
  • 2
  • 18
  • 33
9
votes
2 answers

ElasticSearch use "best match" of ngram terms instead of "synonym"?

Is it possible to tell ElasticSearch to use the "best match" of all grams instead of using grams as synonyms? By default ElasticSearch uses grams as synonyms and returns poorly matching documents. It's easier to show with an example: let's say we have…
Alex Craft
  • 13,598
  • 11
  • 69
  • 133
9
votes
1 answer

Postgres word_similarity not comparing words

"Returns a number that indicates how similar the first string to the most similar word of the second string. The function searches in the second string a most similar word not a most similar substring. The range of the result is zero (indicating…
Cristiano Coelho
  • 1,675
  • 4
  • 27
  • 50
9
votes
2 answers

multi-column index for string match + string similarity with pg_trgm?

Given this table: foos (integer id, string name, string type). And a query like this: select * from foos where name ilike '%bar%'. I can make a pg_trgm index like this to make lookups faster: CREATE INDEX ON foos USING gin (name…
John Bachir
  • 22,495
  • 29
  • 154
  • 227
8
votes
1 answer

Implementing trigram markov model

Given : and the following : For : q(runs | the, dog) = 0.5 Should this not be 1 as for q(runs | the, dog) : xi=runs , xi-2=the , xi-1=dog Probability is (wi has been swapped for xi): therefore : count(the dog runs) / count(the dog) = 1 / 1…
blue-sky
  • 51,962
  • 152
  • 427
  • 752
6
votes
0 answers

Autocomplete by most frequent words - postgres or lucene?

We're using Postgres and its full-text feature to search for documents (post content) in our system, and it works really well. For autocomplete we want to build an index (dictionary?) of all words used in documents and search by most frequent…
5
votes
4 answers

Migration of trigram search in Rails

I have a migration: class AddGinIndexToContacts < ActiveRecord::Migration def up execute("CREATE INDEX contacts_search_idx ON contacts USING gin (first_name gin_trgm_ops, last_name gin_trgm_ops, name gin_trgm_ops)") end def down …
4
votes
1 answer

Python Pandas NLTK Extract Common Phrases (ngrams) From Text Field in Dataframe 'join() argument' Error

I have the following sample dataframe: No category problem_definition_stopwords 175 2521 ['coffee', 'maker', 'brewing', 'properly', '2', '420', '420', '420'] 211 1438 ['galley', 'work', 'table', 'stuck'] 912 2698 ['cloth',…
PineNuts0
  • 4,740
  • 21
  • 67
  • 112
4
votes
1 answer

Postgresql BTREE_GIN index with gin_trgm_ops option?

On https://www.postgresql.org/docs/current/static/pgtrgm.html it is explained how special GIN indexes with the gin_trgm_ops option can be used to speed up the trigram similarity operator. CREATE INDEX trgm_idx ON test_trgm USING GIN (t…
zlatko
  • 596
  • 1
  • 6
  • 23
4
votes
2 answers

How to perform trigram operations in Google BigQuery?

I use the pg_trgm module in PostgreSQL to calculate similarity between two strings using trigrams. In particular I use similarity(text, text), which returns a number that indicates how similar the two arguments are (between 0 and 1). How…
Javier Giovannini
  • 2,302
  • 1
  • 19
  • 21
4
votes
2 answers

How to select columns by values in a row in R

I have a large data frame marking occurrences of trigrams in a string, where the strings are the rows, the trigrams are the columns, and the values mark whether a trigram occurs in a string. So something like this: strs <- c('this', 'that', 'chat',…
Amadou Kone
  • 907
  • 11
  • 21
4
votes
1 answer

postgresql not using trigram index on text column but uses it on varchar column

So basically I set up a very simple test table to test trigram and fulltext indexing capabilities in postgresql 9.1 (stock Debian stable). Here are the table and index definitions: -- Table: fulltextproba -- DROP TABLE fulltextproba; CREATE TABLE…
P.Péter
  • 1,527
  • 16
  • 39
3
votes
2 answers

Postgres Select ILIKE %text% is Slow On Large String Rows

I have a table with only 7 columns, one of which stores long text data for every row. The average length of that text column is approximately 1500 characters, and the table has 500,000 rows. When I use a select query and…
3
votes
1 answer

Why is postgres trigram word_similarity function not using a gin index?

The postgres trigram documentation states: The pg_trgm module provides GiST and GIN index operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the…
Ulad Kasach
  • 11,558
  • 11
  • 61
  • 87