multiple like query in elastic search

Question

I have a field, path, in my Elasticsearch documents, with entries like this:

/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_011007/stderr
/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_008874/stderr

Note: I want to select all the documents that have the line below in the path field:
/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_009257/stderr

I want to run a like query on this path field given the following (basically an AND condition on all 3):

I am given an application number, 1451299305289_0120
I am also given a task number, 009257
The path field should also contain stderr

Given the above criteria, the document whose path field is the 3rd line above should be selected.

This is what I have tried so far:

http://localhost:9200/logstash-*/_search?q=application_1451299305289_0120 AND path:stderr&size=50

This query fulfills the 3rd criterion and only partially the 1st: if I search for 1451299305289_0120 instead of application_1451299305289_0120, I get 0 results. (What I really need is a like search on 1451299305289_0120.)

When I tried this:

http://10.30.145.160:9200/logstash-*/_search?q=path:*_1451299305289_0120*008779 AND path:stderr&size=50

I got the result, but using * at the start of a term is a costly operation. Is there another way to achieve this efficiently (for example, using nGram or Elasticsearch's fuzzy search)?

Using nGram will be very costly. What you can do instead is use edgeNGram with a couple of filters while analyzing. I suggest looking at this question: http://stackoverflow.com/questions/9421358/filename-search-with-elasticsearch# It may only help a little, but it should give you some direction. - Anirudh Modi, Dec 30 '15 at 11:31

score 1 · Answer 1 · answered Dec 30 '15 at 15:47

This can be achieved with the Pattern Replace Char Filter: you extract only the important bits of information with a regex. This is my setup:

POST log_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "app_analyzer": {
          "char_filter": [
            "app_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "path_analyzer": {
          "char_filter": [
            "path_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "task_analyzer": {
          "char_filter": [
            "task_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "app_extractor": {
          "type": "pattern_replace",
          "pattern": ".*application_(.*)/container.*",
          "replacement": "$1"
        },
        "path_extractor": {
          "type": "pattern_replace",
          "pattern": ".*/(.*)",
          "replacement": "$1"
        },
        "task_extractor": {
          "type": "pattern_replace",
          "pattern": ".*container.{27}(.*)/.*",
          "replacement": "$1"
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "keyword",
          "fields": {
            "application_number": {
              "type": "string",
              "analyzer": "app_analyzer"
            },
            "path": {
              "type": "string",
              "analyzer": "path_analyzer"
            },
            "task": {
              "type": "string",
              "analyzer": "task_analyzer"
            }
          }
        }
      }
    }
  }
}

I am extracting the application number, the task number and the path (its final segment) with regexes. You might want to adjust the task regex if you have other log patterns, since it assumes exactly 27 characters between "container" and the task number. We can then use filters to search. A big advantage of filters is that they are cached, which makes subsequent calls faster.
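To sanity-check the three patterns outside Elasticsearch, here is a minimal Python sketch that applies them to the sample path. It is only an approximation of the analyzers above: Elasticsearch uses Java regex with "$1" as the replacement, whereas Python's re module uses "\1", and the asciifolding filter is left out. The names CHAR_FILTERS and extract are made up for this illustration.

```python
import re

# Same patterns as the pattern_replace char filters in the settings above.
# Assumption: Python's re behaves like Java regex for these simple patterns.
CHAR_FILTERS = {
    "application_number": r".*application_(.*)/container.*",
    "path": r".*/(.*)",
    "task": r".*container.{27}(.*)/.*",
}


def extract(field, value):
    """Mimic char filter + keyword tokenizer + lowercase for one sub-field."""
    # "\1" here plays the role of "$1" in the Elasticsearch settings.
    return re.sub(CHAR_FILTERS[field], r"\1", value).lower()


sample = (
    "/logs/hadoop-yarn/container/application_1451299305289_0120/"
    "container_e18_1451299305289_0120_01_009257/stderr"
)

tokens = {field: extract(field, sample) for field in CHAR_FILTERS}
print(tokens)
```

Each sub-field ends up holding a single token, which is why exact term filters on name.application_number, name.task and name.path are enough and no leading wildcard is needed.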

I indexed a sample log like this:

PUT log_index/your_type/1
{
  "name" : "/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_009257/stderr"
}

This query will give you the desired results:

GET log_index/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "name.application_number": "1451299305289_0120"
              }
            },
            {
              "term": {
                "name.task": "009257"
              }
            },
            {
              "term": {
                "name.path": "stderr"
              }
            }
          ]
        }
      }
    }
  }
}

On a side note, the filtered query is deprecated in ES 2.x; use a bool query with a filter clause instead. Also, the path hierarchy tokenizer might be useful for some other use cases.
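For ES 2.x and later, the same three conditions can be written without the deprecated filtered query by putting the term clauses under bool/filter. Here is a rough sketch that builds the request body as a Python dict. The helper name build_query is made up for illustration, and the field names are the ones from the mapping above.

```python
import json


def build_query(application_number, task, file_name):
    """Build a bool/filter query body (ES 2.x style) for the three criteria."""
    return {
        "query": {
            "bool": {
                # Clauses under "filter" are ANDed and do not affect scoring.
                "filter": [
                    {"term": {"name.application_number": application_number}},
                    {"term": {"name.task": task}},
                    {"term": {"name.path": file_name}},
                ]
            }
        }
    }


body = build_query("1451299305289_0120", "009257", "stderr")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET log_index/_search in place of the filtered version.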

Hope this helps :)
