Questions tagged [shingles]

A "shingle" is a synonym for a word n-gram in computational linguistics and probability.
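Here is a minimal Python sketch of word shingling as defined above. It assumes plain whitespace tokenization; the Elasticsearch and Solr shingle filters discussed below work on a tokenizer's output instead. The function name `word_shingles` is hypothetical.

```python
def word_shingles(text: str, k: int = 2) -> list[str]:
    """Return the contiguous k-word shingles (word n-grams) of text.

    Splits on whitespace only; real analyzers usually also lowercase,
    strip punctuation, and so on.
    """
    tokens = text.split()
    # One shingle per start position; empty if there are fewer than k tokens.
    return [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


print(word_shingles("the quick brown fox", 2))
# ['the quick', 'quick brown', 'brown fox']
```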

32 questions
5 votes, 0 answers

How to use synonym filter and shingle filter together?

I'm trying to use a shingle filter together with a synonym filter (see code below). This gives me the output: enforced implemented implemented for for examination examination testing. The words enforced and implemented occur together, the same as testing…
ralph
4 votes, 2 answers

Fastest Way to Shingle from Pandas Column

I need the fastest possible way to shingle strings from a data frame and then create a master list. Given the following data frame: import pandas as pd d=['Hello', 'Helloworld'] f=pd.DataFrame({'strings':d}) f strings 0 Hello 1 …
Dance Party2
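The question above can be sketched in plain Python. This assumes character shingles of a hypothetical size `k=3`, since the truncated excerpt does not state the size, and assumes the "master list" means the deduplicated union of every row's shingles. The code has not been benchmarked, so it makes no claim to be the fastest approach. A pandas Series is iterable, so a column such as `f['strings']` can be passed in directly.

```python
from typing import Iterable


def char_shingles(s: str, k: int) -> list[str]:
    # All contiguous substrings of length k; empty if s is shorter than k.
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def master_shingle_list(strings: Iterable[str], k: int = 3) -> list[str]:
    # Union of the shingles of every string, deduplicated and sorted.
    # With pandas: master_shingle_list(f['strings'], k=3)
    master: set[str] = set()
    for s in strings:
        master.update(char_shingles(s, k))
    return sorted(master)


print(master_shingle_list(['Hello', 'Helloworld'], k=3))
# ['Hel', 'ell', 'llo', 'low', 'orl', 'owo', 'rld', 'wor']
```

Using a set removes the shingles shared by both rows ('Hel', 'ell', 'llo'); drop the set and extend a list instead if duplicates should be kept.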
3 votes, 1 answer

ElasticSearch: shingles - match phrase if field contains exact shingle token

I am new to Elasticsearch and am having trouble with the following scenario. Let's say I have 2 documents that contain only one field, "text": "text" : "token1 token4" "text" : "token2 token3" "text" : "token4 token5" And by following…
Vitaly
3 votes, 1 answer

Solr Shingle Is Not Visible In Debug Query

I am trying to use Solr to find exact matches on categories in a user search (e.g. "skinny jeans" in "blue skinny jeans"). I am using the following type definition:
mils
2 votes, 0 answers

Generating shingles with synonyms in Elasticsearch

I have a file of alternate spellings for the terms in my index. I want to produce bigrams containing those alternate spellings for particular terms. For example, I have biriyani, biryani, briyani in my alternate spellings csv file and my field…
2 votes, 0 answers

Elasticsearch : Search text by skipping words between in shingles

When I search for the text "submarine sinks ships", I want the ranking to prioritize matches for "submarine ships". But in my index of size-2 shingles, the text is indexed as {'submarine sinks', 'sinks ships'} and will not be indexing…
Raja Rajan
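The indexing behaviour described above can be illustrated in Python. Ordinary size-2 shingles only pair adjacent words, so "submarine ships" never appears unless one intervening word is skipped. The `skip_bigrams` function and its `max_skip` parameter are hypothetical and do not correspond to any Elasticsearch shingle filter option. This is only a sketch of the "skip-gram" idea the question is after.

```python
def skip_bigrams(text: str, max_skip: int = 0) -> list[str]:
    # Word pairs with at most max_skip words between them.
    # max_skip=0 reproduces ordinary size-2 shingles.
    tokens = text.split()
    pairs = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append(f"{tokens[i]} {tokens[j]}")
    return pairs


print(skip_bigrams("submarine sinks ships"))
# ['submarine sinks', 'sinks ships']
print(skip_bigrams("submarine sinks ships", max_skip=1))
# ['submarine sinks', 'submarine ships', 'sinks ships']
```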
2 votes, 0 answers

Can Elasticsearch return a 'successful' fuzzed shingle?

TL;DR Is it possible to have Elasticsearch return the matched input shingle alongside the matched document in a fuzzed query? Example: Let's say I have a shingle: "fulltext_shingle_filter":{ "type": "shingle", "min_shingle_size": 2, …
MoorzTech
1 vote, 0 answers

ShingleFilter is not working for maxShingleSize=3

Environment: Solr 8.9.0, Java version "11.0.12" 2021-07-20 LTS. The following .csv file is indexed in Solr: books_id,cat,name 0553573403,book,Game Thrones Clash 0553573404,book,GameThrones…
user595014
1 vote, 1 answer

How to do partial phrase matching with boosting in Solr

We applied boosting and phrase boosting as below: https://localhost:8983/solr/app_index/select?bq=(Title:"userinput")^20+ +(Desc:"userinput")^10&pf=(Title:"userinput")^20+(Desc:"userinput")^10 …
Thamizh
1 vote, 2 answers

elasticsearch synonyms & shingle conflict

Let me jump straight to the code. PUT /test_1 { "settings": { "analysis": { "filter": { "synonym": { "type": "synonym", "synonyms": [ "university of tokyo => university_of_tokyo, u_tokyo", …
Kaushik J
1 vote, 0 answers

Stacking the bars based on attribute variable after equal.count function

I have three variables (employee, PM and site) in my table sitereview, and imported the data into R with sitereview<-read.csv(file.choose(),header=TRUE). [sample data image] I classified the data into 6 equal intervals using the equal.count function from library(lattice).…
1 vote, 1 answer

Comparing Shingles for Near-Duplicate Detection

I'm working on shingling code to compare near-duplicates. I'm getting a little stuck on the compare code. This is my rough attempt so far. //shingles are already hashed integers and I'm working on the evaluation to true via the float similar…
Joshua Hedges
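A common way to compare shingle sets for near-duplicate detection is Jaccard similarity: the size of the intersection divided by the size of the union. The sketch below assumes the shingles are already hashed to integers, as in the question above. It uses a hypothetical threshold of 0.8 to turn the float similarity into a boolean. It is not the asker's code, which is truncated, and the threshold should be tuned for the corpus at hand.

```python
def jaccard(a: set[int], b: set[int]) -> float:
    # Intersection size / union size; two empty sets count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(a: set[int], b: set[int], threshold: float = 0.8) -> bool:
    # threshold is an assumed cut-off, not a standard value.
    return jaccard(a, b) >= threshold


doc1 = {1, 2, 3, 4}
doc2 = {2, 3, 4, 5}
print(jaccard(doc1, doc2))            # 0.6
print(is_near_duplicate(doc1, doc2))  # False
```

Working with sets (rather than lists) means repeated shingles within one document do not inflate the score.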
1 vote, 0 answers

solr shingleFilterFactory not working

I recently migrated from Solr 4 to Solr 6. In Solr 4, ShingleFilterFactory works correctly; my configuration is
1 vote, 1 answer

Elasticsearch shingle token filter not working

I'm trying this on a local 1.7.5 elasticsearch installation http://localhost:9200/_analyze?filter=shingle&tokenizer=keyword&text=alkis stack I see this { "tokens":[ { "token":"alkis stack", "start_offset":0, …
Alkis Kalogeris
1 vote, 2 answers

How to concatenate multiple solr tokens into one

In Solr, when merging tokens using solr.ShingleFilterFactory, it may generate multiple shingles depending on min/maxShingleSize and the tokens to be merged. Because of this, the search fails. How can I merge multiple tokens into one so that my search works?…
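The following rough Python emulation shows which tokens a shingle filter emits for given minimum and maximum shingle sizes. It assumes unigrams are also output (`outputUnigrams`, which defaults to true in Solr's ShingleFilterFactory) and a single-space separator. It ignores positions, offsets and filler tokens, so it only approximates the real filter. The `shingle_tokens` name is hypothetical.

```python
def shingle_tokens(tokens: list[str], min_size: int = 2, max_size: int = 2,
                   output_unigrams: bool = True, sep: str = " ") -> list[str]:
    # At each start position: the unigram (optionally), then every shingle
    # of size min_size..max_size that fits before the end of the input.
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


print(shingle_tokens(["blue", "skinny", "jeans"], min_size=2, max_size=3))
# ['blue', 'blue skinny', 'blue skinny jeans', 'skinny', 'skinny jeans', 'jeans']
```

With sizes 2 to 3, only the 3-word shingle covers the whole input; every other emitted token is a partial combination, which is why several shingles appear instead of one merged token.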