
Let's assume I have books with titles indexed in Elasticsearch as follows:

curl -XPUT "http://localhost:9200/_river/books/_meta" -d '
{
  "type": "jdbc",
  "jdbc": {
    "driver": "org.postgresql.Driver",
    "url": "jdbc:postgresql://localhost:5432/...",
    "user": "...",
    "password": "...",
    "index": "books",
    "type": "books",
    "sql": "SELECT * FROM books"
  }
}'

For instance, I have a book called "Afoo barb".

The following code (searching for '.*foo.*') correctly returns the book:

client.search({
  index: 'books',
  from: 0,
  size: 10,
  body: {
    query: {
      filtered: {
        filter: {
          bool: {
            must: {
              regexp: { title: '.*foo.*' }
            }
          }
        }
      }
    }
  }
});

But the following code (searching for '.*foo bar.*') does not:

client.search({
  index: 'books',
  from: 0,
  size: 10,
  body: {
    query: {
      filtered: {
        filter: {
          bool: {
            must: {
              regexp: { title: '.*foo bar.*' }
            }
          }
        }
      }
    }
  }
});

I tried replacing the space with '\s' or '.*', but that does not work either.

I think the title is split into terms (['Afoo', 'barb']), so the regexp '.*foo bar.*' cannot match any single term.

How can I ask Elasticsearch to match the regexp against the complete title?


1 Answer

Elasticsearch will apply the regexp to the terms produced by the tokenizer for that field, and not to the original text of the field.
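
This can be checked with the `_analyze` API (a sketch in ES 1.x syntax, matching the rivers era of the question; host and analyzer are assumptions):

```
curl -XGET "http://localhost:9200/_analyze?analyzer=standard" -d 'Afoo barb'
```

With the default `standard` analyzer this should return the tokens `afoo` and `barb` (lowercased and split on whitespace), which is why a regexp like `.*foo bar.*` can never match a single term.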

You can use a different tokenizer when indexing your field, or define the regex in such a way that it returns the required documents.

Example with keyword tokenizer:

'regexp': { title: '.*foo bar.*' }
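
A minimal sketch of creating the index with a keyword-based analyzer on `title` (ES 1.x settings syntax; the analyzer name is an assumption, index and type names are taken from the question):

```
curl -XPUT "http://localhost:9200/books" -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_analyzer": { "type": "custom", "tokenizer": "keyword" }
      }
    }
  },
  "mappings": {
    "books": {
      "properties": {
        "title": { "type": "string", "analyzer": "keyword_analyzer" }
      }
    }
  }
}'
```

With this mapping the whole title `Afoo barb` is indexed as a single term, so a regexp can match across the space.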
  • `'.*(foo|bar).*'` does not work since it does a union between the results of `'.*foo.*'` and `'.*bar.*'`. I want an intersection instead, because I don't want the title `'Foo baz'`... – pidupuis May 22 '15 at 08:25
  • What tokenizer should I use to search against the entire original text ? – pidupuis May 22 '15 at 08:29
  • @pidupuis you can use [keyword tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-tokenizer.html) for your purpose.. – karthik manchala May 22 '15 at 08:31
  • Thanks, but I can't manage to use it... Based on things I read on forums, I tried adding it as a setting when putting the index, and also using it directly as a param in a `query_string`, but neither works and I can't find any explicit documentation... – pidupuis May 22 '15 at 08:54
  • @pidupuis have you read [this](http://stackoverflow.com/questions/15079064/how-to-setup-a-tokenizer-in-elasticsearch)? – karthik manchala May 22 '15 at 09:15
  • How can I combine this with the river syntax ? – pidupuis May 22 '15 at 11:59
  • What do you mean by river syntax?.. Also.. rivers are [deprecated](https://www.elastic.co/blog/deprecating_rivers) in 1.5.0. – karthik manchala May 22 '15 at 12:50
  • Ok so I am not using rivers anymore. I create my index through javascript using the `keyword` tokenizer. `'filter': {bool: {must: { regexp: { title: '.*foo bar.*' }}}}` does not work but `'query' : { 'query_string' : { 'fields' : ['title'], 'query' : '*foo bar*'}}` does so it's good :) – pidupuis May 26 '15 at 11:03