
So I have this object model:

string Name; // name of the person
int    Age; // age of the person
string CreatedBy; // operator who created person

My query sounds like this: all documents WHERE Age > 40 AND CreatedBy == 'callum' AND Name contains 'll'

CreatedBy is a necessary condition: it is the scope of control.

Age is also necessary (but isn't a security issue).

Name is where it can get fuzzy, because that is what the user is querying. It is akin to a sort of contains.

The query below works for the first two parts:

"query": {
    "bool": {
        "must": [
            {
                "range": {
                    "age": {
                        "gt": 40
                    }
                }
            },
            {
                "match": {
                    "createdBy": "Callum"
                }
            }
        ]
    }
}

I tried adding a multi_match because ultimately it may be a search across Name, Address and other bits of information. I couldn't make sense of where to fit it in.

In my mind, nested queries would be useful: first filter out all irrelevant users, then filter out irrelevant ages, then do some fuzzier matching on the relevant fields.
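
To pin down the intended semantics before translating them into a query, here is a minimal sketch in plain Python over in-memory records. The sample people and the `find_people` helper are invented for illustration; it compares the creator exactly and treats the name check as a case-insensitive substring test, which is an assumption rather than something stated above.

```python
# Hypothetical sample data, only for illustrating the three conditions.
people = [
    {"name": "Callum", "age": 45, "created_by": "callum"},
    {"name": "Holly", "age": 52, "created_by": "callum"},
    {"name": "Sam", "age": 61, "created_by": "callum"},
    {"name": "Molly", "age": 30, "created_by": "callum"},
    {"name": "Bill", "age": 48, "created_by": "someone_else"},
]


def find_people(records, created_by, min_age, name_fragment):
    """Apply the three conditions in the order described above."""
    # 1. Scope of control: exact match on the creator.
    scoped = [r for r in records if r["created_by"] == created_by]
    # 2. Exact numeric condition.
    old_enough = [r for r in scoped if r["age"] > min_age]
    # 3. Fuzzy part: "contains" on the name (case-insensitive by assumption).
    fragment = name_fragment.lower()
    return [r for r in old_enough if fragment in r["name"].lower()]


print([r["name"] for r in find_people(people, "callum", 40, "ll")])
# ['Callum', 'Holly']
```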

Callum Linington

2 Answers


So, the answer to this isn't straightforward.

First of all you need to create an Analyser for Compound Words.

So in the .NET client it looks like:

this.elasticClient.CreateIndex("customer", p => p
    .Settings(s => s
        .Analysis(a => a
            .TokenFilters(t => t
                .NGram("bigrams_filter", ng => ng
                    .MaxGram(2)
                    .MinGram(2)))
            .Analyzers(al => al
                .Custom("bigrams", l => l
                    .Tokenizer("standard")
                    .Filters("lowercase", "bigrams_filter"))))));

this.elasticClient.Map<Person>(m => m
    .Properties(props => props
        .String(s => s
            .Name(p => p.Name)
            .Index(FieldIndexOption.Analyzed)
            .Analyzer("bigrams"))
        .String(s => s
            .Name(p => p.CreatedBy)
            .NotAnalyzed())
        .Number(n => n
            .Name(p => p.Age))));

Which is a sort of direct translation of the first link provided. This now means that all names will be broken into their bigram representation:

Callum

  1. ca
  2. al
  3. ll
  4. lu
  5. um
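
As a rough illustration of what the `bigrams` analyser produces for a single word, here is a minimal sketch in plain Python. It is not Elasticsearch's implementation: the `ngrams` helper is made up for this example, and it assumes the input is one token that then goes through the `lowercase` filter.

```python
def ngrams(token, min_gram=2, max_gram=2):
    """Return the character n-grams of one token (hypothetical helper)."""
    token = token.lower()  # mirrors the "lowercase" filter in the analyser chain
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            # Only keep grams that fit entirely inside the token.
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


print(ngrams("Callum"))  # ['ca', 'al', 'll', 'lu', 'um']
```

The two-character search text `ll` is itself one of these grams, which is why it can match the indexed name.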

Then you need the actual query to take advantage of this. This is the bit I like: because we've set up that analyser on the name field, queries against it can contain partial words. Take this for example (Sense query):

GET customer/_search
{
    "query": {
        "filtered": {
            "query": {
                "multi_match": {
                    "query": "ll",
                    "fields": ["name"]
                }
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "age": {
                                    "gt": 40
                                }
                            }
                        },
                        {
                            "match": {
                                "createdBy": "Callum"
                            }
                        }
                    ]
                }
            }
        }
    }
}

Here, we have a filtered query. So the query is always the first to be run (can't find documentation yet to cite that, but I have read it), and this will be the partial terms match. Then we simply filter - which is done after the query - to get the subset of results we need.

Because the ngrams analyser is only set on name, that is the only field that will be partially matched against. CreatedBy won't be, and thus we keep our security around the results.
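
On Elasticsearch 2.0 and later, the same intent can also be written as a `bool` query whose `filter` clause carries the exact conditions. The sketch below builds that request body as a Python dict. The helper name `build_person_query` is hypothetical, and it assumes `createdBy` is mapped as not analysed, so that `term` compares against the stored value exactly (including case).

```python
import json


def build_person_query(created_by, min_age, name_fragment):
    """Build a search body: exact filters plus a scored partial match on name."""
    return {
        "query": {
            "bool": {
                # Scored part: matched against the bigram-analysed name field.
                "must": [
                    {"multi_match": {"query": name_fragment, "fields": ["name"]}}
                ],
                # Non-scored part: exact conditions that scope the results.
                "filter": [
                    {"range": {"age": {"gt": min_age}}},
                    # Assumes createdBy is not analysed, so the value is compared as-is.
                    {"term": {"createdBy": created_by}},
                ],
            }
        }
    }


print(json.dumps(build_person_query("Callum", 40, "ll"), indent=2))
```

The printed JSON can be pasted after `GET customer/_search` in Sense in place of the `filtered` body.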

Callum Linington
  • There is one issue: I don't think there is any `match` filter! It is only a query, and cannot be used within the filter context... You can use a `term` filter. – Anirudh Modi Jun 24 '16 at 11:13
  • this worked by the way, test and clarified. This is for version 2.x right, [docs](https://www.elastic.co/guide/en/elasticsearch/guide/2.x/match-query.html) – Callum Linington Jun 24 '16 at 11:22
  • and also `the query is always the first to be run (can't find documentation yet to cite that, but I have read it)`: it is just the reverse, it will first filter out the documents and then apply the query to them. https://www.elastic.co/blog/found-optimizing-elasticsearch-searches https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#_filter_strategy There won't be any advantage in using `filter` if Elasticsearch first performed the query on 1000k documents and then applied the `filter`... – Anirudh Modi Jun 24 '16 at 11:22
  • But in 2.0.0-beta1. `filtered` is deprecated! So you must not be using elastic 2.x! – Anirudh Modi Jun 24 '16 at 11:24
  • this is weird, because even the docs mention that you can use `match` within `filter`, yet `match` queries are high-level full-text queries! – Anirudh Modi Jun 24 '16 at 11:26
  • to be honest with you I can't make heads or tails of what is in the API, the docs aren't very good at all – Callum Linington Jun 24 '16 at 11:30
  • ya I know... The docs after 2.x are even weirder. But I am pretty sure that in 1.7 or earlier there was no `match` query in `filter`, and `match` was always supposed to be a `query` context thing, not a `filter`. I will probably find a link to clear the air. – Anirudh Modi Jun 24 '16 at 11:32
  • The deprecation stuff isn't a problem; the fact they marked it deprecated means very little until the next major release, and the likelihood of this project being bumped to the next release is slim... – Callum Linington Jun 24 '16 at 13:20

Basically what you can do is put the query into two blocks:

"query": {
    "filter": {
        "bool": {
            "must": [
                {
                    "range": {
                        "age": {
                            "gt": 40
                        }
                    }
                }
            ]
        }
    },
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": "ll",
                        "fields": ["createdBy", "Address", "Name"],
                        "fuzziness": 2
                    }
                }
            ]
        }
    }
}

In the filter you can use conditions to filter things out, and then you can apply your multi_match query to the filtered data. The main reason I included age in the filter is that you don't need to perform any kind of free-text search on it; you just need to compare it with a static value. You can include more conditions within the must block of the filter.

You can also look into this article, which might give you some overview.

https://www.elastic.co/blog/found-optimizing-elasticsearch-searches

Hope it helps!

Anirudh Modi
  • I think you missed the fact that the `multi_match` should search for a string containing `ll` – Callum Linington Jun 23 '16 at 15:31
  • So, did it solve what you asked for? You can see more about `multi_match` at this link: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html – Anirudh Modi Jun 23 '16 at 15:35
  • no, it didn't, but I am a lot closer, I used your prompt about splitting into two blocks, and looked at filter and query – Callum Linington Jun 23 '16 at 15:36
  • `wildcard` has gotten me closer, however, I'm not sure how to perform that on multiple fields – Callum Linington Jun 23 '16 at 15:37
  • You want a search-while-typing kind of thing? – Anirudh Modi Jun 23 '16 at 15:40
  • Looking at more of the documentation, I need to look at index-time analysis so I can do search-while-typing, because that seems to be the route I'm heading down – Callum Linington Jun 23 '16 at 15:41
  • If yes, then go through this question; it will give you a proper idea of how to do a search while typing... http://stackoverflow.com/questions/9421358/filename-search-with-elasticsearch Also never use `phrase_prefix`; index your documents properly by using the right tokenizer – Anirudh Modi Jun 23 '16 at 15:42
  • Fuzziness is used to tolerate spelling mistakes; it can't be used to solve the search-while-typing problem. A quick solution can be `phrase_prefix`, but it makes searching very inefficient and inaccurate. Check this link for the issue with `phrase_prefix`: https://discuss.elastic.co/t/how-do-i-get-elasticsearch-max-clause-count-to-take-effect/6133 That is why using the right analyzer, with an `edgeNGram` or `nGram` filter/tokenizer when indexing your documents, is the most efficient way – Anirudh Modi Jun 23 '16 at 15:46
  • @CallumLinington any proper lead to what you want to achieve? – Anirudh Modi Jun 23 '16 at 15:57
  • Yeah I'm just setting up my indexer, not sure what to do about the answer to my question – Callum Linington Jun 23 '16 at 15:58
  • You asked about how to use `multi_match`; I would have given a proper answer had the question mentioned search while typing and other things. :) Anyway, good luck. – Anirudh Modi Jun 23 '16 at 15:59
  • I've managed to get to the exact solution I'm looking for, you can read my answer if it is of any interest – Callum Linington Jun 24 '16 at 09:13
  • I guess I misunderstood the question to some extent... Were the links helpful in understanding Elastic better? – Anirudh Modi Jun 24 '16 at 11:14
  • They did, but I had already found them before you posted them – Callum Linington Jun 24 '16 at 11:22
  • I get a value error "Q() can only accept dict with a single query ({"match": {...}})." – NMO May 08 '19 at 16:22
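
The comments above recommend an `edgeNGram` or `nGram` filter for search-while-typing. As a rough illustration of how edge n-grams differ from the plain n-grams used in the first answer, here is a plain-Python sketch. The `edge_ngrams` helper and its default gram sizes are hypothetical, and this is not how Elasticsearch implements the filter.

```python
def edge_ngrams(token, min_gram=1, max_gram=5):
    """Return n-grams anchored at the start of the token (hypothetical helper)."""
    token = token.lower()
    longest = min(max_gram, len(token))
    # Every gram is a prefix of the token, from min_gram up to `longest` characters.
    return [token[:size] for size in range(min_gram, longest + 1)]


print(edge_ngrams("Callum"))  # ['c', 'ca', 'cal', 'call', 'callu']
```

Since every gram starts at the first character, a typed prefix such as `cal` matches but a mid-word fragment such as `ll` does not, so a contains-style search like the one in the question needs plain n-grams instead.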