I would like to ask for help. I want to search for words inside the Title and Content fields. Here is the mapping:

'body' => array(
  'mappings' => array(
    'myindex' => array(
      '_source' => array(
        'enabled' => true
      ),
      'properties' => array(
        'Title' => array(
          'type'   => 'string',
          'fields' => array(
            'raw' => array(
              'type'  => 'string',
              'index' => 'not_analyzed'
            )
          )
        ),
        'Content' => array(
          'type' => 'string'
        ),
        'Image' => array(
          'type'     => 'string',
          'analyzer' => 'standard'
        )
      )
    )
  )
)

And the query looks like this, where I want to search for "15-g" inside a text like "15-game":

"query" : {
  "query_string": {
    "query": "*15-g*",
    "fields": [ "Title", "Content" ]
  }
}

Please accept my apologies if this is a duplicate question, but I cannot figure out what's going on and why the query does not return any results.

I've already had a look at:

ElasticSearch - Searching with hyphens

ElasticSearch - Searching with hyphens in name

But I can't make it work for me.

What is really interesting is that if I search for "15 - g" (15, space, hyphen, space, g), it does return the result.

Thank you so much in advance!

Daniel Widdis
Sensini

2 Answers

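Add a .raw field to your Content as well (not_analyzed, like Title.raw) and run the search on the .raw fields. A not_analyzed field is indexed as one single term, so the wildcard *15-g* can match "15-game"; the analyzed fields only contain the separate tokens 15 and game, which is why the original query returns nothing: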

{
  "query": {
    "query_string": {
      "query": "*15-g*",
      "fields": [
        "Title.raw",
        "Content.raw"
      ]
    }
  }
}
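A minimal Python sketch of the idea above, assuming a deliberately simplified tokenizer (split on anything that is not a letter or digit, then lowercase) in place of the real standard analyzer, and Python's fnmatchcase as a stand-in for Elasticsearch wildcard matching. The content_mapping dict shows what the suggested Content.raw sub-field could look like in the same pre-5.x mapping syntax the question uses; the function names are hypothetical.

```python
import json
import re
from fnmatch import fnmatchcase

# Suggested Content mapping with a "raw" sub-field, mirroring Title in the
# question (pre-5.x syntax: "string" plus "index": "not_analyzed").
content_mapping = {
    "Content": {
        "type": "string",
        "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
    }
}

def analyzed_terms(text: str) -> list[str]:
    # Hypothetical, simplified stand-in for the standard analyzer (not the real
    # Unicode tokenizer): split on non-alphanumerics and lowercase each token.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def raw_terms(text: str) -> list[str]:
    # not_analyzed: the whole value is indexed as a single exact term.
    return [text]

def wildcard_matches(pattern: str, terms: list[str]) -> bool:
    # A wildcard query is checked term by term; it matches if any term fits.
    return any(fnmatchcase(term, pattern) for term in terms)

print(json.dumps(content_mapping, indent=2))
doc = "15-game"
print(analyzed_terms(doc))                              # ['15', 'game']
print(wildcard_matches("*15-g*", analyzed_terms(doc)))  # False
print(wildcard_matches("*15-g*", raw_terms(doc)))       # True
```

fnmatch's * behaves like the Elasticsearch wildcard *, but fnmatch also gives [...] a special meaning, so this is only an approximation of what happens inside the index.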

Anywhere the text you want to search contains a space that must match your fields, that space needs to be escaped with a backslash (written as \\ inside the JSON string). Also, whenever you combine uppercase letters with wildcards and want them to match the .raw fields as typed, you need to set lowercase_expanded_terms to false: by default that setting is true, and it lowercases the search string (the query below would then search for laptop - black):

{
  "query": {
    "query_string": {
      "query": "*Laptop\\ -\\ Black*",
      "lowercase_expanded_terms": false, 
      "fields": [
        "Title.raw",
        "Content.raw"
      ]
    }
  }
}
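The sketch below builds the same request body in Python and then roughly simulates the lowercasing problem. escape_spaces and build_raw_wildcard_query are hypothetical helpers, and fnmatchcase only stands in for matching against the not_analyzed value; it is not how Elasticsearch evaluates the query internally.

```python
import json
from fnmatch import fnmatchcase

def escape_spaces(text: str) -> str:
    # Hypothetical helper: backslash-escape every space so query_string keeps
    # the phrase as one wildcard term. Other reserved characters are untouched.
    return text.replace(" ", "\\ ")

def build_raw_wildcard_query(phrase: str) -> dict:
    # Mirrors the JSON in this answer: wildcards on both sides, escaped spaces,
    # lowercasing of expanded terms switched off, search on the .raw fields.
    return {
        "query": {
            "query_string": {
                "query": "*" + escape_spaces(phrase) + "*",
                "lowercase_expanded_terms": False,
                "fields": ["Title.raw", "Content.raw"],
            }
        }
    }

body = build_raw_wildcard_query("Laptop - Black")
print(json.dumps(body, indent=2))  # JSON shows "*Laptop\\ -\\ Black*"

# Rough simulation against the not_analyzed value (escapes removed):
raw_value = "Windows Laptop - Black"
pattern = "*Laptop - Black*"
print(fnmatchcase(raw_value, pattern))          # True
print(fnmatchcase(raw_value, pattern.lower()))  # False: lowercased pattern misses
```

Because a not_analyzed field keeps the original casing, the lowercased pattern "*laptop - black*" no longer matches, which is what lowercase_expanded_terms: false avoids.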
Andrei Stefan
  • Thank you @Andrei, it works! What if I want to search for two or more words, e.g. "Laptop - Black"? Let's say I have "Windows Laptop - Black" and I want to find it with "Laptop - Black". Thank you so much again. Regards. – Sensini Jul 09 '15 at 06:24
  • Anywhere you have a *space* in the text you want to search and you **want** to match your fields, it needs to be escaped. Also, anytime you have uppercase letters and wildcards and you want to match like that with the `.raw` fields, you need to set `lowercase_expanded_terms` to `false`, because otherwise it will lowercase the search string. Updated my response. – Andrei Stefan Jul 09 '15 at 07:36
  • Thank you so much @Andrei. I appreciate your effort. Everything works as you explained. :) – Sensini Jul 09 '15 at 08:33
  • Could you please give me an idea of how I can make the search case-insensitive? Regards – Sensini Jul 09 '15 at 13:32

In Elasticsearch 5, you can define a custom analyzer with a token filter setting. Here is an example:

PUT test1
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "myAnalyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : [ "dont_split_on_numerics" ]
        }
      },
      "filter" : {
        "dont_split_on_numerics" : {
          "type" : "word_delimiter",
          "preserve_original": true,
          "generate_number_parts" : false
        }
      }
    }
  },
  "mappings": {
    "type_one": {
      "properties": {
        "title": { 
          "type": "text",
          "analyzer": "standard"
        }
      }
    },
    "type_two": {
      "properties": {
        "raw": { 
          "type": "text",
          "analyzer": "myAnalyzer"
        }
      }
    }
  }
}

Note that I set

"preserve_original": true and "generate_number_parts": false

so that the string "2-345-6789" is kept as it is. The hyphen acts as a word separator (and is also a reserved character in query_string syntax): with the standard analyzer, "2-345-6789" would be indexed as the separate tokens "2", "345" and "6789". With the settings above, you can now use a "wildcard" query for a fragment such as

"5-67"

to get the result.

POST test1/type_two/1
{
  "raw": "2-345-6789"
}

GET test1/type_two/_search
{
  "query": {
    "wildcard": {
      "raw": "*5-67*"
    }
  }
}
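To see why the custom analyzer helps, here is a rough Python simulation. Both functions are hypothetical simplifications: standard_like_terms only mimics the standard analyzer's behaviour on this kind of input, and my_analyzer_like_terms only covers all-digit values such as "2-345-6789" (for text containing letters, the real word_delimiter filter would also emit word parts).

```python
import re
from fnmatch import fnmatchcase

def standard_like_terms(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer: the hyphen (like any other
    # non-alphanumeric character) acts as a separator; tokens are lowercased.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def my_analyzer_like_terms(text: str) -> list[str]:
    # Simplified stand-in for "myAnalyzer" on all-digit values: the whitespace
    # tokenizer keeps "2-345-6789" whole, preserve_original keeps that token,
    # and generate_number_parts=false suppresses "2", "345" and "6789".
    return text.split()

value = "2-345-6789"
for name, terms in [("standard", standard_like_terms(value)),
                    ("myAnalyzer", my_analyzer_like_terms(value))]:
    hit = any(fnmatchcase(t, "*5-67*") for t in terms)
    print(name, terms, hit)
# standard ['2', '345', '6789'] False
# myAnalyzer ['2-345-6789'] True
```

The wildcard query is matched against individual indexed terms, so "*5-67*" can only succeed when the hyphenated value survives analysis as one term.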

More details can be found in the Elasticsearch token filter documentation.

Yang Young