
The results returned by Haystack with an Elasticsearch backend seem wrong to me. My search index is as follows:

```python
from haystack import indexes
from .models import IosVideo


class VideoIndex(indexes.SearchIndex, indexes.Indexable):
    # Main document field, rendered from the text template shown below.
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    absolute_url = indexes.CharField(model_attr='get_absolute_url')
    # content_auto = indexes.EdgeNgramField(model_attr='title')
    description = indexes.CharField(model_attr='description')
    # thumbnail = indexes.CharField(model_attr='thumbnail_url', null=True)

    def get_model(self):
        return IosVideo

    def index_queryset(self, using=None):
        # Only public videos are indexed.
        return self.get_model().objects.filter(private=False)
```

My text template looks like this:

```
{{ object.title }}
{{ object.text }}
{{ object.description }}
```

My query is:

```python
SearchQuerySet().models(IosVideo).filter(content="darby")[0]
```

The result that makes me think this isn't working is a video object with the following fields:

```
title: u'Cindy Daniels'
description: u''
text: u'Cindy Daniels\n\n\n'
absolute_url: u'/videos/testimonial/cindy-daniels/'
```

Why in the world would the query return such a result? I'm very confused.

My current theory is that it tokenizes every substring of the query's characters and uses those as partial matches. Is there a way to reduce this tolerance so that only closer matches are returned?
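To make that theory concrete, here is a simplified pure-Python model of character n-gram analysis. It is not Elasticsearch's analyzer; the `ngrams` and `shared_grams` helpers and the `min_gram`/`max_gram` values are hypothetical, chosen only to show how very short grams let unrelated words overlap.

```python
def ngrams(text, min_gram=1, max_gram=3):
    """Return the set of character n-grams of each lowercased word in text.

    Simplified model only: real Elasticsearch analyzers differ in their
    tokenizer, token filters and gram sizes.
    """
    grams = set()
    for word in text.lower().split():
        for size in range(min_gram, max_gram + 1):
            for start in range(len(word) - size + 1):
                grams.add(word[start:start + size])
    return grams


def shared_grams(query, document, min_gram=1, max_gram=3):
    """Return the grams that the query and the document have in common."""
    return ngrams(query, min_gram, max_gram) & ngrams(document, min_gram, max_gram)


# With 1-character grams allowed, "darby" and "Cindy Daniels" overlap.
print(sorted(shared_grams("darby", "Cindy Daniels")))  # ['a', 'd', 'da', 'y']
# With 5-character grams only, nothing is shared.
print(sorted(shared_grams("darby", "Cindy Daniels", min_gram=5, max_gram=5)))  # []
```

Raising the minimum gram size (or not n-gramming the query at all) is what removes these accidental overlaps in this model.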

My pip packages are `elasticsearch==1.2.0` and `django-haystack==2.3.1`, and the Elasticsearch server version is 1.3.1.

Additionally, when I hit the local server with `http://localhost:9200/haystack/_search/?q=darby&pretty`, it returns 10 results, while

```python
SearchQuerySet().filter(content="darby")
```

returns 4k results.

Does anyone know what would cause this behavior?

user133688
  • Are you, by any chance, using elasticstack, or a custom analyzer? That could possibly explain the results that you're seeing. I'm sure you saw, but the default lookup in filter as of Haystack 2.X is `contains`, rather than `exact`. That, plus an analyzer which looks at partial words, could potentially match that document. – Joey Wilhelm Apr 22 '15 at 23:08
  • No custom analyzer :( my pip looks like this elasticsearch==1.2, django-haystack==2.3.1. The elasticsearch version is 1.3.1 – user133688 Apr 22 '15 at 23:57
  • Have you tried directly querying elasticsearch to compare the results? For example `http://localhost:9200/_search/?q=darby` where `search` is your index name. – Lucas Moeskops Apr 25 '15 at 15:47
  • Did you inspect what the indexed documents in elasticsearch contain, in this case e.g. the document for Cindy Daniels? – sthzg Apr 26 '15 at 17:25
  • @LucasMoeskops http://localhost:9200/haystack/_search/?q=darby returns 10 results and none of those results are the Cindy Daniels object. So something is very amiss with Haystack then, correct? – user133688 Apr 27 '15 at 19:24
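Following the suggestions in these comments, a small helper can query Elasticsearch directly and ask it to explain each match. This is a sketch assuming the server and index from the question (`localhost:9200`, index `haystack`) and Python 3's standard library; `search_url`, `analyze_url` and `fetch_json` are hypothetical helper names. Note that URI search returns 10 hits per page by default, so the 10 results above may only be the first page; compare `hits.total` instead. The `analyzer` value `standard` is a placeholder: `GET /haystack/_mapping` shows which analyzer the field really uses. Elasticsearch 1.x accepts `_analyze` parameters in the query string, while newer releases expect them in a JSON request body.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed from the question: local server and Haystack's default index name.
ES_URL = "http://localhost:9200"
INDEX = "haystack"


def search_url(query, explain=True, size=10):
    """Build a URI-search URL; explain=true makes Elasticsearch report
    how each hit's score was computed."""
    params = {"q": query, "size": size}
    if explain:
        params["explain"] = "true"
    return "%s/%s/_search?%s" % (ES_URL, INDEX, urlencode(params))


def analyze_url(text, analyzer="standard"):
    """Build an _analyze URL showing the tokens an analyzer produces
    (query-string form accepted by Elasticsearch 1.x)."""
    params = {"analyzer": analyzer, "text": text}
    return "%s/%s/_analyze?%s" % (ES_URL, INDEX, urlencode(params))


def fetch_json(url):
    """GET a URL and decode the JSON body (needs a running server)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, on Elasticsearch 1.x `fetch_json(search_url("darby"))["hits"]["total"]` gives the total number of matches, which can be compared with the count Haystack reports.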

1 Answer

There is a problem with the `filter()` method on `CharField` indexes in django-haystack 2.1.0. You can change them to `NgramField` instead, for example `text = indexes.NgramField(document=True, use_template=True)`.

The problem is that with this combination only the first character of the query is used, so it returns every document that has a 'd' in its `text` index field.
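The behavior described above can be modelled in a few lines of plain Python to show why it inflates the hit count. This only illustrates the answer's description, not Haystack's code; the helper names and every title except "Cindy Daniels" are hypothetical.

```python
# Hypothetical sample titles; only "Cindy Daniels" comes from the question.
TITLES = ["Cindy Daniels", "Darby Smith", "Alan Brown", "Kim Lee"]


def first_char_filter(query, titles):
    """Model of the described bug: only the query's first character is used."""
    needle = query[:1].lower()
    return [t for t in titles if needle and needle in t.lower()]


def whole_word_filter(query, titles):
    """Match only titles that contain the query as a whole word."""
    needle = query.lower()
    return [t for t in titles if needle in t.lower().split()]


print(first_char_filter("darby", TITLES))  # ['Cindy Daniels', 'Darby Smith']
print(whole_word_filter("darby", TITLES))  # ['Darby Smith']
```

Any title containing a 'd' passes the first filter, which is consistent with thousands of hits from Haystack versus a handful from a direct Elasticsearch query.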

Ricardo Burillo