I've managed to populate my index with 4 documents using this bulk request:

POST localhost:9200/titles/movies/_bulk

{"index":{"_id":"1"}}
{"id": "1","level": "first","titles": [{"value": "The Bad and the Beautiful","type": "Catalogue","main": true},{"value": "The Bad and the Beautiful (1945)","type": "International","main": false}]}
{"index":{"_id":"2"}}
{"id": "2","level": "first","titles": [{"value": "Bad Day at Black Rock","type": "Drama","main": true}]}
{"index":{"_id":"3"}}
{"id": "3","level": "second","titles": [{"value": "Baker's Wife","type": "AnotherType","main": true},{"value": "Baker's Wife (1940)","type": "Trasmitted","main": false}]}
{"index":{"_id":"4"}}
{"id": "4","level": "second","titles": [{"value": "Bambi","type": "Educational","main": true},{"value": "The Baby Deer and the hunter (1942)","type": "Fantasy","main": false}]}

Now how can I perform searches with wildcards on all available titles?

Something like localhost:9200/titles/movies/_search?q=*&sort=level:asc but providing one or more wildcards. For instance, searching for "The % the %" and parsing the response from Elasticsearch to eventually return something like:

{
    "count":2,
    "results":[{
        "id":"1",
        "level":"first",
        "foundInTitleTypes":["Catalogue","International"]
    },{
        "id":"4",
        "level":"second",
        "foundInTitleTypes":["Fantasy"]
    }]
}

Thanks!

Gabe
  • This is a starting point: `localhost:9200/titles/movies/_search?q=titles.value:the.\*the.\*`, but it could perform poorly. – Gabe Jan 30 '17 at 03:30
  • Do you really need this to work with URI search? The reason I'm asking is because you should probably be using the `nested` type for the `titles` field and searching nested values with URI search is not yet possible. – Val Jan 30 '17 at 05:11
  • My url in the comment above works. Body Search would be fine for me too though. How would I filter the response to only get `id, level and types`? Thanks – Gabe Jan 30 '17 at 09:53
  • This helps for filtering: http://stackoverflow.com/a/9605566/4317945 – Gabe Jan 30 '17 at 13:32
  • Yep, adding `"_source": ["id", "level", "titles.type"],` before "query" in the request body solved my problem. Thanks Zoltan ;) – Gabe Jan 30 '17 at 13:40

1 Answer

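Elasticsearch lets you pass the pattern to a regular match query in the request body. Note that `match` analyzes its input rather than treating `*` as a wildcard: with the default standard analyzer the `*` characters are dropped, so the query effectively searches for the term `the`, which is enough to find the right documents in this data set.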

GET titles/movies/_search
{
    "query": {
        "match" : { "titles.value" : "The * the *" }
    }
}

Gives you this

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.6406528,
    "hits": [
      {
        "_index": "titles",
        "_type": "movies",
        "_id": "4",
        "_score": 1.6406528,
        "_source": {
          "id": "4",
          "level": "second",
          "titles": [
            {
              "value": "Bambi",
              "type": "Educational",
              "main": true
            },
            {
              "value": "The Baby Deer and the hunter (1942)",
              "type": "Fantasy",
              "main": false
            }
          ]
        }
      },
      {
        "_index": "titles",
        "_type": "movies",
        "_id": "1",
        "_score": 0.9026783,
        "_source": {
          "id": "1",
          "level": "first",
          "titles": [
            {
              "value": "The Bad and the Beautiful",
              "type": "Catalogue",
              "main": true
            },
            {
              "value": "The Bad and the Beautiful (1945)",
              "type": "International",
              "main": false
            }
          ]
        }
      }
    ]
  }
}
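
The response contains every title of a matching document, so working out which title types actually matched has to happen client-side. Below is a minimal Python sketch of that post-processing, assuming the response has already been parsed into a dict. The helper names `like_to_regex` and `summarize_hits` are made up for illustration, and the `%` placeholder is taken from the pattern in the question.

```python
import re


def like_to_regex(pattern):
    """Turn a pattern where '%' means 'any run of characters' into a regex."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(parts), re.IGNORECASE)


def summarize_hits(response, pattern):
    """Reduce a parsed search response to id, level and the matching title types."""
    matcher = like_to_regex(pattern)
    results = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        # Keep only the types of the titles whose whole value fits the pattern.
        types = [
            title["type"]
            for title in source.get("titles", [])
            if matcher.fullmatch(title["value"])
        ]
        if types:
            results.append({
                "id": source["id"],
                "level": source["level"],
                "foundInTitleTypes": types,
            })
    # Sort client-side by level so the order does not depend on the score.
    results.sort(key=lambda item: item["level"])
    return {"count": len(results), "results": results}
```

Calling `summarize_hits(response, "The % the %")` on the response above would drop "Bambi" from document 4 and keep only the titles that fit the pattern.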

As for URI search, I'm not sure it is possible. With curl, you just pass the Query DSL as the request body:

curl localhost:9200/titles/movies/_search -d '{"query":{"match":{"titles.value":"The * the *"}}}'

{"took":46,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.6406528,"hits":[{"_index":"titles","_type":"movies","_id":"4","_score":1.6406528,"_source":{"id": "4","level": "second","titles": [{"value": "Bambi","type": "Educational","main": true},{"value": "The Baby Deer and the hunter (1942)","type": "Fantasy","main": false}]}},{"_index":"titles","_type":"movies","_id":"1","_score":0.9026783,"_source":{"id": "1","level": "first","titles": [{"value": "The Bad and the Beautiful","type": "Catalogue","main": true},{"value": "The Bad and the Beautiful (1945)","type": "International","main": false}]}}]}}
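
The same request can be prepared from Python with only the standard library. This is a sketch, not a definitive client: `build_search_request` is a hypothetical helper, the host and index names are copied from the curl call, and the `Content-Type` header is an addition that newer Elasticsearch versions require.

```python
import json
import urllib.request


def build_search_request(base_url, index, doc_type, body):
    """Prepare (but do not send) a _search request carrying a Query DSL body."""
    url = "{}/{}/{}/_search".format(base_url.rstrip("/"), index, doc_type)
    data = json.dumps(body).encode("utf-8")
    # Supplying data makes urllib issue a POST, which _search accepts.
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )


request = build_search_request(
    "http://localhost:9200",
    "titles",
    "movies",
    {"query": {"match": {"titles.value": "The * the *"}}},
)

# Sending requires a running node, so it is left commented out here:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```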

Update for the latest question:

If you want to sort by level, you need to provide a mapping so that Elasticsearch indexes `level` as a `keyword` field. Here is what I did:

Delete index

DELETE titles

Add mapping

PUT titles
{
  "settings": {
    "number_of_shards": 1
  }, 
  "mappings": {
    "movies": {
      "properties": {
        "level": {
          "type": "keyword"
        }
      }
    }
  }
}

Re-index the documents, then refine the Query DSL

GET titles/movies/_search
{
  "_source": [
    "id",
    "level",
    "titles.value"
  ],
  "sort": [
    {
      "level": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "match": {
      "titles.value": "The * the *"
    }
  }
}
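
If the request body is assembled in code, the three parts (source filtering, sort, query) can be kept as parameters. A minimal sketch, assuming Python; `build_title_search` is a hypothetical helper, and its default `_source` list uses `titles.type` as requested in the comments rather than `titles.value` as in the request above.

```python
import json


def build_title_search(text, source_fields=("id", "level", "titles.type"),
                       sort_field="level", order="asc"):
    """Build a search body with source filtering, a sort and a match query."""
    return {
        # Only these fields are returned in each hit's _source.
        "_source": list(source_fields),
        # Sorting on a field needs a sortable mapping such as keyword.
        "sort": [{sort_field: {"order": order}}],
        "query": {"match": {"titles.value": text}},
    }


body = build_title_search("The * the *")
print(json.dumps(body, indent=2))
```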

That gives me

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "titles",
        "_type": "movies",
        "_id": "1",
        "_score": null,
        "_source": {
          "level": "first",
          "id": "1",
          "titles": [
            {
              "value": "The Bad and the Beautiful"
            },
            {
              "value": "The Bad and the Beautiful (1945)"
            }
          ]
        },
        "sort": [
          "first"
        ]
      },
      {
        "_index": "titles",
        "_type": "movies",
        "_id": "4",
        "_score": null,
        "_source": {
          "level": "second",
          "id": "4",
          "titles": [
            {
              "value": "Bambi"
            },
            {
              "value": "The Baby Deer and the hunter (1942)"
            }
          ]
        },
        "sort": [
          "second"
        ]
      }
    ]
  }
}
cinhtau
  • This works `localhost:9200/titles/movies/_search?q=titles.value:the.\*the.\*` but I'll use the Body Search approach as you suggested. Then how could I improve the query to only get **`id, level and types`** (to then re-structure the response into whatever I need)? Thanks – Gabe Jan 30 '17 at 12:16
  • Simply use `"_source":["id", "level", "types"]` in your query – Val Jan 30 '17 at 13:33
  • Awesome! Is the 4th entry correctly inserted for you ("**The** Baby Deer and **the** hunter")? It's not for me and no errors are returned. The count is always 3.. confused. – Gabe Jan 31 '17 at 03:02
  • Mystery solved: the bulk request body needs a newline at the end! :) – Gabe Jan 31 '17 at 03:08
  • Glad to hear that. Is your issue resolved then? Maybe closing the question would be best. – cinhtau Jan 31 '17 at 08:35
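
The missing fourth document discussed in the comments comes down to the bulk format: the body is newline-delimited JSON and its last line must also end with a newline. A small Python sketch that builds such a body, assuming each document carries its own `id` field as in the question; `build_bulk_body` is a made-up name.

```python
import json


def build_bulk_body(docs):
    """Build a newline-delimited bulk body of index actions and documents."""
    lines = []
    for doc in docs:
        # Action line first, then the document source on the next line.
        lines.append(json.dumps({"index": {"_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The trailing newline is what keeps the last document from being dropped.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body([
    {"id": "3", "level": "second", "titles": [{"value": "Baker's Wife", "type": "AnotherType", "main": True}]},
    {"id": "4", "level": "second", "titles": [{"value": "Bambi", "type": "Educational", "main": True}]},
])
```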