
I'm new to Elasticsearch and I have a problem with querying.

I indexed strings like these:

my-super-string
my-other-string
my-little-string

These strings are slugs, so they contain no spaces, only alphanumeric characters and hyphens. The mapping for the related field is just "type": "string".

I'm using a query like this:

{
  "query": {
    "query_string": {
      "query": "*"+<MY_QUERY>+"*",
      "rewrite": "top_terms_10"
    }
  }
}

Where "MY_QUERY" is also a slug, "my-super" for example.

When searching for "my", I get results.

When searching for "my-super", I get no results, but I'd like it to return "my-super-string".

Can someone help me with this? Thanks!

Vinc

1 Answer


I would suggest using match_phrase instead of a query_string query with leading and trailing wildcards. Even the standard analyzer can split a slug into tokens correctly, so there is no need for wildcards.
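As a rough stand-alone illustration (plain Python, not Elasticsearch's actual implementation), the standard analyzer's handling of a slug amounts to lowercasing and splitting on non-alphanumeric characters:

```python
import re

def standard_like_tokens(text):
    # Rough approximation of the standard analyzer:
    # lowercase, then split on any non-alphanumeric character.
    return re.findall(r"[a-z0-9]+", text.lower())

print(standard_like_tokens("my-super-string"))  # ['my', 'super', 'string']
```

So "my-super-string" is already indexed as the tokens my, super, string, and a match_phrase query for "my-super" (analyzed the same way into my, super) matches the start of that token sequence.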

curl -XPUT "localhost:9200/slugs/doc/1" -d '{"slug": "my-super-string"}'
echo
curl -XPUT "localhost:9200/slugs/doc/2" -d '{"slug": "my-other-string"}'
echo
curl -XPUT "localhost:9200/slugs/doc/3" -d '{"slug": "my-little-string"}'
echo
curl -XPOST "localhost:9200/slugs/_refresh"
echo
echo "Searching for my"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my"} } }'
echo
echo "Searching for my-super"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my-super"} } }'
echo
echo "Searching for my-other"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "my-other"} } }'
echo
echo "Searching for string"
curl "localhost:9200/slugs/doc/_search?pretty=true&fields=slug" -d '{"query" : { "match_phrase": {"slug": "string"} } }'

Alternatively, you can create your own analyzer that splits slugs into tokens only on "-":

curl -XDELETE localhost:9200/slugs
curl -XPUT localhost:9200/slugs -d '{
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "analysis": {
                "analyzer" : {
                    "slug_analyzer" : {
                        "tokenizer": "slug_tokenizer",
                        "filter" : ["lowercase"]
                    }
                },
                "tokenizer" :{
                    "slug_tokenizer" : {
                        "type": "pattern",
                        "pattern": "-"
                    }
                }
            }
        }
    },
    "mappings" :{
        "doc" : {
            "properties" : {
                "slug" : {"type": "string", "analyzer" : "slug_analyzer"}
            }
        }
    }
}'
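A quick sketch, in plain Python rather than Elasticsearch itself, of what the slug_analyzer above produces. Unlike the standard analyzer, the pattern tokenizer splits only on "-", so other characters such as "_" stay inside a token:

```python
def slug_tokens(slug):
    # Approximation of slug_analyzer: the pattern tokenizer splits
    # on "-" only, then the lowercase filter normalizes each token.
    return [t.lower() for t in slug.split("-") if t]

print(slug_tokens("My-Super-String"))   # ['my', 'super', 'string']
print(slug_tokens("my_super-string"))   # ['my_super', 'string']
```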
imotov
  • With match_phrase I must have an exact match to get results. So I tried match_phrase_prefix, which works well, but I'd also need a "match_phrase_suffix": if I search "super-s" I'd like to get "my-super-string". In fact, I'd like a simple wildcard like `*-str*` that matches any slug containing "-str". It's the "-" character I have problems with: any time I add one to my query I get no results. – Vinc Nov 16 '12 at 09:46
  • Oh, I see. Then it's this: http://stackoverflow.com/questions/6467067/how-to-search-for-a-part-of-a-word-with-elasticsearch/6471449#6471449 – imotov Nov 16 '12 at 10:36
  • Thanks again... I'm near the final result, but I still have problems with the "-" character. I really want exact matches. I need something like a real exact wildcard (but you said in the other post not to use it ^^). I don't know how to index or how to search. If I type "my-super" I want ALL docs containing "my-super". The "-" seems to break everything. – Vinc Nov 16 '12 at 23:12
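For readers following the thread: the linked answer relies on ngram indexing. A rough Python sketch (an illustration of the idea, not Elasticsearch's nGram tokenizer itself) shows why it helps with substring matching — every substring within the configured length range becomes an indexed term, so fragments like "-str" or "super-s" are directly searchable, hyphens included:

```python
def ngrams(text, min_gram=2, max_gram=20):
    # Emit every substring of text whose length is between
    # min_gram and max_gram, the way an ngram tokenizer would.
    out = []
    for i in range(len(text)):
        for n in range(min_gram, max_gram + 1):
            if i + n <= len(text):
                out.append(text[i:i + n])
    return out

grams = ngrams("my-super-string")
print("-str" in grams)     # True
print("super-s" in grams)  # True
```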