I have a Django project that uses Solr for indexing.

I'm trying to do a substring search using Haystack's SearchQuerySet class.

For example, when a user searches for the term "ear", the search should return the entry that has a field with the value "Search", because "ear" is a substring of "Search".

In other words, in a perfect Django world I would like something like:

SearchQuerySet().all().filter(some_field__contains_substring='ear')

The Haystack documentation for SearchQuerySet (https://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html#field-lookups) says that only the following field lookup types are supported:

  • contains
  • exact
  • gt, gte, lt, lte
  • in
  • startswith
  • range

I tried using __contains, but it behaves exactly like __exact, which looks up the exact word (the whole word) in a sentence, not a substring of a word.

I am confused, because such functionality is pretty basic, and I'm not sure whether I'm missing something or whether there is another way to approach this problem (using regex or something?).

Thanks

Nahn

2 Answers

That can be done using an EdgeNgramField:

some_field = indexes.EdgeNgramField() # also prepare value for this field or use model_attr

Then for partial match:

SearchQuerySet().all().filter(some_field='ear')
Aamir Rind
  • Thank you! Your answer is not 100% correct, but it led me in the right direction. The solution was to use the **NgramField**, not the **EdgeNgramField**, like this: `some_field = indexes.NgramField(model_attr='some_field')`. The **EdgeNgramField** can only do _"starts with"_ and _"ends with"_ type of filtering. – Nahn Dec 19 '13 at 07:29
  • I wasn't working with Solr, but using **NgramField** worked for me with Elasticsearch. – Jonatas CD Sep 02 '14 at 20:22
  • EdgeNgramField does not work like `__contains`; it works by stemming and finds other matches based on the stems, so it yields a much fuzzier result set than contains. – shredding Oct 21 '15 at 13:24
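
To make the NgramField versus EdgeNgramField distinction from the comments concrete, here is a minimal pure-Python sketch of the tokens each kind of field would roughly index for one word. It does not call Haystack or Solr. The helper names `edge_ngrams` and `ngrams` and the gram sizes (3 to 15) are assumptions for illustration only; the real sizes and the anchoring side depend on the backend schema.

```python
def edge_ngrams(word, min_size=3, max_size=15):
    """Prefixes of `word`, roughly what a front-anchored edge n-gram filter emits."""
    word = word.lower()
    upper = min(max_size, len(word))
    return {word[:size] for size in range(min_size, upper + 1)}


def ngrams(word, min_size=3, max_size=15):
    """Every substring of `word` whose length is within the size bounds."""
    word = word.lower()
    upper = min(max_size, len(word))
    return {
        word[start:start + size]
        for size in range(min_size, upper + 1)
        for start in range(len(word) - size + 1)
    }


print("ear" in edge_ngrams("Search"))  # False: "ear" is not a prefix
print("ear" in ngrams("Search"))       # True: "ear" occurs inside the word
print("sea" in edge_ngrams("Search"))  # True: "sea" is a prefix
```

Under these assumptions, a query for "ear" can only hit an indexed token when every interior substring is indexed, which is why the plain n-gram field was the one that worked for the substring case.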

It's a bug in haystack.

As you said, __contains is implemented exactly like __exact, and therefore this functionality does not exist out of the box in haystack.

The fix is awaiting merge here: https://github.com/django-haystack/django-haystack/issues/1041

Until a fixed release is available, you can work around it like this:

from haystack.inputs import BaseInput, Clean


class CustomContain(BaseInput):
    """
    An input type for making wildcard matches.
    """
    input_type_name = 'custom_contain'

    def prepare(self, query_obj):
        query_string = super(CustomContain, self).prepare(query_obj)
        query_string = query_obj.clean(query_string)

        exact_bits = [Clean(bit).prepare(query_obj) for bit in query_string.split(' ') if bit]
        query_string = u' '.join(exact_bits)

        return u'*{}*'.format(query_string)

# Usage:
from haystack.query import SearchQuerySet

SearchQuerySet().filter(content=CustomContain('searchcontentgoeshere'))
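
As a rough illustration of what the wildcard wrapping buys you, the sketch below rebuilds the `*term*` string in plain Python and checks it against sample words with `fnmatch`. This is only an approximation: `wrap_wildcards` is a hypothetical stand-in rather than part of Haystack, it skips the escaping that `Clean` performs, and `fnmatch` is not Solr's query parser.

```python
from fnmatch import fnmatchcase


def wrap_wildcards(term):
    """Hypothetical stand-in for the string CustomContain.prepare returns."""
    bits = [bit for bit in term.split(" ") if bit]
    return "*{}*".format(" ".join(bits))


pattern = wrap_wildcards("ear")
print(pattern)                          # *ear*
print(fnmatchcase("search", pattern))   # True
print(fnmatchcase("sea", pattern))      # False
```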
shredding