I am using Elasticsearch (ES) with the river plugin, since my data lives in CouchDB, and I am trying to use nGrams for my queries. Almost everything works, except that when someone enters a space the query doesn't behave properly. That is because ES tokenizes the query, splitting it on the space.
Here is what I need to do:
Query for part of a text in a string:
query: "Hello Wor"; should return: "Hello World", "Hello Word"; should exclude: "Hello", "World", "Word".
Sort the results by criteria I specify.
Match case-insensitively.
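To make the three requirements concrete, here is a minimal pure-Python sketch of the behaviour I am after. It does not involve ES at all; `search_titles` and the sample titles are hypothetical names used only for illustration.

```python
def search_titles(titles, query, sort_key=None):
    """Return the titles that contain `query` as a case-insensitive substring.

    The whole query, including its space, is treated as one unit.
    """
    needle = query.lower()
    hits = [t for t in titles if needle in t.lower()]
    # sort_key stands in for "criteria I specify"; None means plain string order.
    return sorted(hits, key=sort_key)


titles = ["Hello World", "Hello Word", "Hello", "World", "Word"]
print(search_titles(titles, "Hello Wor"))
```

"Hello", "World" and "Word" are excluded because none of them contains the full string "hello wor".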
Here is what I have done, following this question: How to search for a part of a word with ElasticSearch
curl -X PUT 'localhost:9200/_river/myDB/_meta' -d '
{
  "type" : "couchdb",
  "couchdb" : {
    "host" : "localhost",
    "port" : 5984,
    "db" : "myDB",
    "filter" : null
  },
  "index" : {
    "index" : "myDB",
    "type" : "myDB",
    "bulk_size" : "100",
    "bulk_timeout" : "10ms",
    "analysis" : {
      "index_analyzer" : {
        "my_index_analyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["lowercase", "mynGram"]
        }
      },
      "search_analyzer" : {
        "my_search_analyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["standard", "lowercase", "mynGram"]
        }
      },
      "filter" : {
        "mynGram" : {
          "type" : "nGram",
          "min_gram" : 2,
          "max_gram" : 50
        }
      }
    }
  }
}
'
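To show what an analyzer chain like the one above is meant to produce, here is a rough Python approximation: split into words, lowercase, then emit every substring between `min_gram` and `max_gram` characters. This is only a sketch. The regular expression is a stand-in for the "standard" tokenizer, the helper names are hypothetical, and the real filter may emit the grams in a different order.

```python
import re


def ngrams(token, min_gram=2, max_gram=50):
    """All substrings of `token` with length between min_gram and max_gram."""
    out = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            out.append(token[start:start + size])
    return out


def analyze(text, min_gram=2, max_gram=50):
    """Approximate tokenizer -> lowercase -> nGram, applied word by word."""
    grams = []
    for token in re.findall(r"\w+", text):
        grams.extend(ngrams(token.lower(), min_gram, max_gram))
    return grams


print(analyze("Hello Wor"))
```

Because the text is split into words before the grams are generated, no gram ever spans the space, so "Hello Wor" is never seen as a single unit.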
I will then add a mapping for the sorting:
curl -s -XGET 'localhost:9200/myDB/myDB/_mapping'
{
  "sorting": {
    "Title": {
      "fields": {
        "Title": {
          "type": "string"
        },
        "untouched": {
          "include_in_all": false,
          "index": "not_analyzed",
          "type": "string"
        }
      },
      "type": "multi_field"
    },
    "Year": {
      "fields": {
        "Year": {
          "type": "string"
        },
        "untouched": {
          "include_in_all": false,
          "index": "not_analyzed",
          "type": "string"
        }
      },
      "type": "multi_field"
    }
  }
}
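For context, this is roughly how I intend to use the `untouched` sub-fields for sorting. The sketch below only builds a search request body as a Python dict. It assumes the not_analyzed sub-field is addressed as `Title.untouched`, and `build_search_body` is a hypothetical helper, not part of any client library.

```python
import json


def build_search_body(query_text, sort_field="Title.untouched", order="asc"):
    """Build a search body that queries Title and sorts on a not_analyzed sub-field."""
    return {
        "query": {
            "query_string": {"default_field": "Title", "query": query_text}
        },
        # Assumption: the multi_field sub-field is reachable as "<field>.untouched".
        "sort": [{sort_field: {"order": order}}],
    }


print(json.dumps(build_search_body("Hello"), indent=2))
```

Sorting on the not_analyzed copy keeps whole titles in order, rather than sorting on individual tokens of the analyzed field.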
I have included all of my configuration for completeness. With this setup, which I assumed would work, the space is still used to split my query whenever I search. For example:
http://localhost:9200/myDB/myDB/_search?q=Title:(Hello%20Wor)&pretty=true
This returns anything that contains "Hello" and "Wor". (I don't normally use the parentheses, but I saw them in an example; the results seem very similar either way.)
Any help is truly appreciated as this is bugging me quite a lot.
UPDATE: In the end, I realized that I didn't need an nGram at all. A normal index is enough; simply replacing the whitespace in the query with ' AND ' does the job.
Example:
Query: "Hello World" ---> Replaced with "(*Hello AND World*)"
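The replacement can be sketched in a few lines of Python. This assumes the same wrapping as in the example (a leading `*` before the first term and a trailing `*` after the last one) and reuses the search URL from earlier in the question; `rewrite_query` and `search_url` are hypothetical helper names.

```python
from urllib.parse import quote


def rewrite_query(user_input):
    """Replace runs of whitespace with ' AND ' and wrap the result as (*...*)."""
    terms = user_input.split()
    if not terms:
        return ""
    return "(*" + " AND ".join(terms) + "*)"


def search_url(user_input, field="Title",
               base="http://localhost:9200/myDB/myDB/_search"):
    """Build the URI search request for the rewritten query."""
    q = f"{field}:{rewrite_query(user_input)}"
    # Percent-encode the query so spaces and parentheses survive in the URL.
    return f"{base}?q={quote(q)}&pretty=true"


print(rewrite_query("Hello World"))
print(search_url("Hello World"))
```

Note that the operator has to be written in upper case (`AND`); otherwise it is treated as just another search term.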