"shingle" is a synonym for word-nGrams in computational linguistics and probability
Questions tagged [shingles]
32 questions
5 votes, 0 answers
How to use synonym filter and shingle filter together?
I'm trying to use a shingle filter together with a synonym filter (see code below). This gives me the following output:
enforced implemented
implemented for
for examination
examination testing
The words enforced and implemented are occurring together same as testing…

ralph
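The four shingles quoted in this question are exactly what you get if the synonyms, which a synonym filter stacks at the same position as the original word, are treated as if they were consecutive words. The plain-Python sketch below is not Lucene code. It assumes a hypothetical input "enforced for examination" with synonyms enforced → implemented and examination → testing, because the question's own code and input are not shown. It contrasts that naive pairing with a position-aware one.

```python
from itertools import product

# Hypothetical input and synonym map, chosen so that the naive output matches
# the four shingles quoted in the question.
tokens = ["enforced", "for", "examination"]
synonyms = {"enforced": ["implemented"], "examination": ["testing"]}

# Each position holds the original word plus any synonyms stacked on it.
positions = [[t] + synonyms.get(t, []) for t in tokens]

# Naive shingling: flatten the stacks and pair neighbours, ignoring positions.
flat = [t for stack in positions for t in stack]
naive = [f"{a} {b}" for a, b in zip(flat, flat[1:])]
print(naive)
# ['enforced implemented', 'implemented for', 'for examination', 'examination testing']

# Position-aware shingling: pair every token at position i with every token at i + 1.
aware = [f"{a} {b}"
         for left, right in zip(positions, positions[1:])
         for a, b in product(left, right)]
print(aware)
# ['enforced for', 'implemented for', 'for examination', 'for testing']
```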
4 votes, 2 answers
Fastest Way to Shingle from Pandas Column
I need the fastest possible way to shingle strings from a data frame and then create a master list.
Given the following data frame:
import pandas as pd
d = ['Hello', 'Helloworld']
f = pd.DataFrame({'strings': d})
f
strings
0 Hello
1 …

Dance Party2
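Here is one straightforward way to shingle the question's column into character shingles and build a master list of distinct shingles. It is a sketch, not a benchmarked "fastest" answer. The shingle length `K = 3` is an assumption because the question does not give one, and `char_shingles` is a hypothetical helper name.

```python
import pandas as pd

K = 3  # shingle length; not specified in the question, 3 is an assumption


def char_shingles(s: str, k: int = K) -> list[str]:
    """Contiguous k-character substrings of s (empty if s is shorter than k)."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


f = pd.DataFrame({"strings": ["Hello", "Helloworld"]})

# One list of shingles per row.
shingles_per_row = f["strings"].map(char_shingles)

# Master list: every distinct shingle across all rows, in first-seen order
# (dict keys preserve insertion order and drop duplicates).
master = list(dict.fromkeys(sh for row in shingles_per_row for sh in row))
print(master)
# ['Hel', 'ell', 'llo', 'low', 'owo', 'wor', 'orl', 'rld']
```

For large frames, timing this against alternatives on your own data is the only reliable way to pick the fastest option.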
3 votes, 1 answer
ElasticSearch: shingles - match phrase if field contains exact shingle token
I am a newbie with Elasticsearch and have trouble with the following scenario.
Let's say I have 2 documents, each containing only one field, "text":
"text" : "token1 token4"
"text" : "token2 token3"
"text" : "token4 token5"
And by following…

Vitaly
3 votes, 1 answer
Solr Shingle Is Not Visible In Debug Query
I am trying to use Solr to find exact matches on categories in a user search (e.g. "skinny jeans" in "blue skinny jeans"). I am using the following type definition:

mils
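The exact-category match described in this question amounts to generating every shingle of the user's query and intersecting the result with the known category names. The plain-Python sketch below only illustrates that idea. It is not how Solr implements it, and the category set and maximum shingle size are hypothetical, since the question's type definition is not shown.

```python
def all_shingles(tokens: list[str], max_size: int) -> set[str]:
    """All contiguous word shingles of 1..max_size tokens."""
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_size + 1)
            for i in range(len(tokens) - n + 1)}


categories = {"skinny jeans", "boots"}  # hypothetical category list
query = "blue skinny jeans"

matches = all_shingles(query.split(), 3) & categories
print(matches)
# {'skinny jeans'}
```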
2 votes, 0 answers
Generating shingles with synonyms in Elasticsearch
I have a file of alternate spellings for the terms in my index. I want to produce bigrams containing those alternate spellings for particular terms. For example, I have biriyani, biryani, briyani in my alternate spellings csv file and my field…

Yawan Gupta
2 votes, 0 answers
Elasticsearch : Search text by skipping words between in shingles
When I search for a text "submarine sinks ships" I want the search ranking to prioritize the matches for "submarine ships".
But in my index of size-2 shingles, the text will be indexed as {'submarine sinks', 'sinks ships'} but will not be indexing…

Raja Rajan
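Size-2 shingles only pair adjacent words, so "submarine ships" never becomes a term. One way to get it is to also emit word pairs that skip over intervening words, known as skip-bigrams. The plain-Python sketch below illustrates that idea. It does not cover how you would produce such tokens inside an Elasticsearch analyzer, and the `max_skip` parameter is a hypothetical name.

```python
def skip_bigrams(text: str, max_skip: int) -> list[str]:
    """Ordered word pairs separated by at most `max_skip` intervening words."""
    tokens = text.split()
    pairs = []
    for i, left in enumerate(tokens):
        # j covers the next (max_skip + 1) tokens after position i, if they exist
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append(f"{left} {tokens[j]}")
    return pairs


print(skip_bigrams("submarine sinks ships", 0))
# ['submarine sinks', 'sinks ships']
print(skip_bigrams("submarine sinks ships", 1))
# ['submarine sinks', 'submarine ships', 'sinks ships']
```

With `max_skip=0` this reduces to ordinary size-2 shingles. Larger values grow the number of indexed terms roughly linearly per token.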
2 votes, 0 answers
Can Elasticsearch return a 'successful' fuzzed shingle?
TL;DR
Is it possible to have Elasticsearch return the matched input-shingle alongside the matched document in a fuzzed query?
Example:
Let's say I have a shingle:
"fulltext_shingle_filter":{
"type": "shingle",
"min_shingle_size": 2,
…

MoorzTech
1 vote, 0 answers
ShingleFilter is not working for maxShingleSize=3
Environment: Solr 8.9.0, Java version "11.0.12" 2021-07-20 LTS
The following .csv file is indexed in Solr:
books_id,cat,name
0553573403,book,Game Thrones Clash
0553573404,book,GameThrones…

user595014
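When debugging a size-3 shingle setup, it helps to know which terms to expect for a row such as "Game Thrones Clash". The plain-Python approximation below is not the Lucene ShingleFilter. It assumes whitespace tokenization, and its keyword arguments only mirror the factory's minShingleSize, maxShingleSize and outputUnigrams settings. The real filter may emit the same terms in a different order.

```python
def shingles(tokens: list[str], min_size: int = 2, max_size: int = 3,
             output_unigrams: bool = True) -> list[str]:
    """Approximate the terms a shingle filter emits (order simplified)."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        # Shingles of every allowed size that start at position i
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


print(shingles("Game Thrones Clash".split()))
# ['Game', 'Game Thrones', 'Game Thrones Clash', 'Thrones', 'Thrones Clash', 'Clash']
```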
1 vote, 1 answer
How to do partial phrase matching with boosting in Solr
We applied boosting and phrase boosting as below:
https://localhost:8983/solr/app_index/select?bq=(Title:"userinput")^20+
+(Desc:"userinput")^10&pf=(Title:"userinput")^20+(Desc:"userinput")^10
…

Thamizh
1 vote, 2 answers
elasticsearch synonyms & shingle conflict
Let me jump straight to the code.
PUT /test_1
{
"settings": {
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"synonyms": [
"university of tokyo => university_of_tokyo, u_tokyo",
…

Kaushik J
1 vote, 0 answers
Stacking the bars based on attribute variable after equal.count function
My table sitereview has three variables: employee, PM and site. I imported the data into R.
sitereview<-read.csv(file.choose(),header=TRUE)
I classified the data into 6 equal intervals using the equal.count function from library(lattice).…
1 vote, 1 answer
Comparing Shingles for Near-Duplicate Detection
I'm working on shingling code to compare near-duplicates. I'm getting a little stuck on the compare code. This is my rough attempt so far.
//shingles are already hashed integers and I'm working on the evaluation to true via the float similar…

Joshua Hedges
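A common way to compare two documents' shingle sets for near-duplicate detection is Jaccard similarity: the size of the intersection divided by the size of the union, compared against a threshold. The sketch below uses Python rather than the asker's language. It assumes, as the question says, that shingles are already hashed to integers. The 0.8 threshold and the sample hash values are made up for illustration.

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Size of intersection / size of union; treated as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(a: set[int], b: set[int], threshold: float = 0.8) -> bool:
    # The 0.8 cut-off is an arbitrary example value, not taken from the question.
    return jaccard(a, b) >= threshold


doc1 = {101, 202, 303, 404, 505}  # made-up hashed shingles
doc2 = {101, 202, 303, 404, 606}
print(jaccard(doc1, doc2))            # 4 shared / 6 total = 0.6666666666666666
print(is_near_duplicate(doc1, doc2))  # False
```

For large collections, MinHash signatures are commonly used to estimate this value without comparing full sets.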
1 vote, 0 answers
solr shingleFilterFactory not working
I recently migrated from Solr 4 to Solr 6.
In Solr 4, ShingleFilterFactory works correctly. My configuration is:

amandeep singh
1 vote, 1 answer
Elasticsearch shingle token filter not working
I'm trying this on a local Elasticsearch 1.7.5 installation:
http://localhost:9200/_analyze?filter=shingle&tokenizer=keyword&text=alkis stack
I see this:
{
"tokens":[
{
"token":"alkis stack",
"start_offset":0,
…

Alkis Kalogeris
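The single token "alkis stack" in this output is consistent with the keyword tokenizer passing the whole input through as one token, which leaves a shingle filter nothing to pair. The plain-Python sketch below illustrates that point. It mimics a size-2 shingle filter with unigrams enabled, in simplified order, and compares it against a whitespace split. `bigram_shingles` is a hypothetical helper, not an Elasticsearch API.

```python
def bigram_shingles(tokens: list[str]) -> list[str]:
    """Unigrams plus adjacent-pair shingles (order simplified)."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


text = "alkis stack"
keyword_tokens = [text]           # keyword tokenizer: the whole input is one token
whitespace_tokens = text.split()  # whitespace tokenizer: split on spaces

print(bigram_shingles(keyword_tokens))     # ['alkis stack']
print(bigram_shingles(whitespace_tokens))  # ['alkis', 'stack', 'alkis stack']
```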
1 vote, 2 answers
How to concatenate multiple solr tokens into one
In Solr, when tokens are merged using solr.ShingleFilterFactory, it may generate multiple shingles depending on min/maxShingleSize and the number of tokens to be merged. Because of this, the search fails. How can I merge multiple tokens into one so that my search works?…

ConnectedCars