7

This isn't really a complex problem (to my knowledge).

I know in MongoDB you can feed in a string and it automatically tokenizes and performs full-text search using that string as a query.

However, in Django, I have yet to find similar functionality, and all of the examples I've seen have done something along the lines of:

from django.contrib.postgres.search import SearchQuery
query = SearchQuery('foo')

Is the reason people only use one word because SearchQuery can only use one word?

What I want to know is how to perform full-text search with multiple words. Is it as easy as doing

from django.contrib.postgres.search import SearchQuery
query = SearchQuery('foo and also bar')

? Or does it need to be more complicated than that?

Quontas

3 Answers

3

To perform full-text search with Django, you have to combine a GIN index with a SearchVector.
Here is a full working example that I used elsewhere. It also works with two or more words in a query and searches for them in 3 fields.

Chiefir
  • The issue I encounter when searching is that the querying does not function as intended when I have a vector of "foo and bar" and a query of "foo but baz". In my mind, the document containing the vector should be found when searching on the query, but it does not come up. – Quontas Jun 17 '18 at 17:55
  • No, it should not: your vector does not contain the word 'but'. – Chiefir Jun 17 '18 at 18:00
  • I see. Is there any way to have my querying behave so that the results are ranked based on the number of words that match in both the vector and the query, even if the query has extraneous words in it? – Quontas Jun 17 '18 at 18:03
  • I don't think so with pure Django; at least I did not manage to solve it. Maybe have a look at search engines such as Haystack + Whoosh (for a start), Solr, or Elasticsearch (far more complicated). – Chiefir Jun 17 '18 at 18:16
  • Yep. But if you someday find a way to solve this problem, let me know; I have the same problem in one of my projects :) – Chiefir Jun 17 '18 at 18:43
  • Something that just occurred to me would be splitting the string up into individual terms, creating a searchquery for each of those, and then ORing those together – Quontas Jun 17 '18 at 18:45
  • Yep, I thought about something similar, but that looks so low quality... Decided not to do that. – Chiefir Jun 17 '18 at 18:52
  • Also, with that you will get a lot of items that contain only 1 word out of 3, for example, but still appear in your results. Some kind of "white noise" that you cannot get rid of. – Chiefir Jun 17 '18 at 19:03
  • For the purposes of what I'm doing (this specific use of full text search is not user facing, it's used to link tables together), I only need the first one so that might work for me. – Quontas Jun 17 '18 at 19:04
1

By default, SearchQuery uses search_type="plain", which, when passed multiple words, finds results that contain all of the words, in any position. This is the same as using search_type="raw" with the & (and) operator. In summary, all of these queries do the same thing:

SearchQuery("foo and also bar")
SearchQuery("foo and also bar", search_type="plain")
SearchQuery("foo & and & also & bar", search_type="raw")

Note that search_type="raw" is picky about its syntax, so you might want to remove special characters or put words between quotes, especially when taken from user input:

SearchQuery("'foo' & 'and' & 'also' & 'bar'", search_type="raw")
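
One way to do that cleanup, as a minimal sketch: the helper below keeps only runs of word characters from the input, quotes each term, and joins the terms with the chosen operator. The name `to_raw_tsquery` is made up for this example and is not part of Django, and the sketch assumes that dropping punctuation is acceptable for your data.

```python
import re


def to_raw_tsquery(text, operator="&"):
    """Build a to_tsquery()-style string from free-form user input.

    Hypothetical helper, not part of Django. Only runs of word characters
    are kept, so quotes, parentheses, and operators typed by the user
    cannot break the raw query syntax.
    """
    if operator not in ("&", "|", "<->"):
        raise ValueError(f"unsupported operator: {operator!r}")
    terms = re.findall(r"\w+", text)
    # Quote every term and join with the operator, e.g. 'foo' & 'bar'.
    return f" {operator} ".join(f"'{term}'" for term in terms)


print(to_raw_tsquery("foo and also bar"))                 # 'foo' & 'and' & 'also' & 'bar'
print(to_raw_tsquery("foo (bar) & baz!", operator="|"))   # 'foo' | 'bar' | 'baz'
```

The result can then be passed as `SearchQuery(to_raw_tsquery(user_input), search_type="raw")`. If the input contains no word characters, the helper returns an empty string, which the caller should probably treat as "no query" rather than send to the database.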

If you want to find all results that contain at least one of the words, use the | (or) operator:

SearchQuery("foo | and | also | bar", search_type="raw")

If you combine this with a SearchRank, results with multiple words get a higher rank than results with only a single word.
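
As a sketch of how that combination might look: the function below splits the input into terms, builds one SearchQuery per term, ORs them together with `|`, and orders the matches by SearchRank. The names `split_terms` and `ranked_or_search` and the annotation names `search` and `rank` are made up for this example. The Django import sits inside the function only so that the term-splitting part can be run without a configured project.

```python
import re
from functools import reduce
from operator import or_


def split_terms(text):
    """Return the lower-cased word tokens in text, de-duplicated, in order."""
    return list(dict.fromkeys(re.findall(r"\w+", text.lower())))


def ranked_or_search(queryset, field_name, text):
    """Rows where field_name matches at least one term, best match first.

    Hypothetical helper; it needs a configured Django project backed by
    PostgreSQL, and assumes the model has no fields named search or rank.
    """
    from django.contrib.postgres.search import (
        SearchQuery,
        SearchRank,
        SearchVector,
    )

    terms = split_terms(text)
    if not terms:
        return queryset.none()
    # SearchQuery objects can be combined with | (or) and & (and).
    query = reduce(or_, (SearchQuery(term) for term in terms))
    vector = SearchVector(field_name)
    return (
        queryset.annotate(search=vector, rank=SearchRank(vector, query))
        .filter(search=query)
        .order_by("-rank")
    )


print(split_terms("foo but baz, foo"))  # ['foo', 'but', 'baz']
```

Assuming a model with a text field, a call such as `ranked_or_search(MyModel.objects.all(), "body_text", "foo but baz")` (model and field names are placeholders) would still return a row containing "foo and bar", because only one term has to match.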

To find the words as a phrase, use the <-> (followed by) operator:

SearchQuery("foo <-> and <-> also <-> bar", search_type="raw")

This will find results containing the phrase "foo and also bar" but not, for example, "foo and bar".

In the background, the search_type="raw" parser uses PostgreSQL's to_tsquery() full text search query parser, while search_type="plain" uses plainto_tsquery().

roskakori
  • «If you combine this with a SearchRank, results with multiple words get a higher rank than results with only a single word» How would you do that? – Luiz Dec 15 '22 at 03:33
  • @Luiz Take a look at the first code example given by the [SearchRank documentation](https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/search/#searchrank). It uses `annotate` to add a computed field with the rank, and then `order_by` to show the best matches first. – roskakori Dec 16 '22 at 10:15
0

You can use more than one word in SearchQuery. You can even get rid of SearchQuery if you don't need to search by logical combination of terms:

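from django.contrib.postgres.search import SearchVector

# Filtering the annotated vector with a plain string treats it as a plain query.
Entry.objects.annotate(
    search=SearchVector('body_text'),
).filter(search="Multiple words query")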
Daniil Ryzhkov
  • That's what I originally thought, but for some reason the terms that I include in a query are not being used. For example, if I have a document with a field whose value is "Blah building, college", and use a query of "Blah building", there is no match. – Quontas Jun 17 '18 at 03:54
  • @Quontas try to do the same directly in postgresql https://www.postgresql.org/docs/9.5/static/textsearch.html – Daniil Ryzhkov Jun 17 '18 at 04:00