822

I'm trying to build the search for a Django site I am building, and in that search, I am searching across three different models. And to get pagination on the search result list, I would like to use a generic object_list view to display the results. But to do that, I have to merge three QuerySets into one.

How can I do that? I've tried this:

result_list = []
page_list = Page.objects.filter(
    Q(title__icontains=cleaned_search_term) |
    Q(body__icontains=cleaned_search_term))
article_list = Article.objects.filter(
    Q(title__icontains=cleaned_search_term) |
    Q(body__icontains=cleaned_search_term) |
    Q(tags__icontains=cleaned_search_term))
post_list = Post.objects.filter(
    Q(title__icontains=cleaned_search_term) |
    Q(body__icontains=cleaned_search_term) |
    Q(tags__icontains=cleaned_search_term))

for x in page_list:
    result_list.append(x)
for x in article_list:
    result_list.append(x)
for x in post_list:
    result_list.append(x)

return object_list(
    request,
    queryset=result_list,
    template_object_name='result',
    paginate_by=10,
    extra_context={
        'search_term': search_term},
    template_name="search/result_list.html")

But this doesn't work. I get an error when I try to use that list in the generic view. The list is missing the clone attribute.

How can I merge the three lists, page_list, article_list and post_list?

Dave Mackey
  • 4,306
  • 21
  • 78
  • 136
espenhogbakk
  • 11,508
  • 9
  • 39
  • 38
  • 1
    Django users 1.11 and abv, see this answer - https://stackoverflow.com/a/42186970/6003362 – Sahil Agarwal Feb 22 '18 at 09:24
  • 1
    For searching it's better to use dedicated solutions like [Haystack](http://haystacksearch.org/) - it's very flexible. – minder Apr 09 '10 at 08:52
  • 1
    Looks like t_rybik has created a comprehensive solution at http://www.djangosnippets.org/snippets/1933/ – akaihola Mar 21 '10 at 18:17
  • *note*: the question is limited to the very rare case when after merging 3 different models together you don't need extracting models again on the listing to distinguish data on types. For most cases - if distinction is expected - it will wrong interface. For the same models: see answers about `union`. – Sławomir Lenart Jul 15 '19 at 17:18

16 Answers16

1231

Concatenating the querysets into a list is the simplest approach. If the database will be hit for all querysets anyway (e.g. because the result needs to be sorted), this won't add further cost.

from itertools import chain
result_list = list(chain(page_list, article_list, post_list))

Using itertools.chain is faster than looping each list and appending elements one by one, since itertools is implemented in C. It also consumes less memory than converting each queryset into a list before concatenating.

Now it's possible to sort the resulting list e.g. by date (as requested in hasen j's comment to another answer). The sorted() function conveniently accepts a generator and returns a list:

from operator import attrgetter
result_list = sorted(
    chain(page_list, article_list, post_list),
    key=attrgetter('date_created')
)

You can reverse the sort order:

result_list = sorted(
    chain(page_list, article_list, post_list),
    key=attrgetter('date_created'),
    reverse=True,
)

attrgetter is equivalet to the following lambda (this was the way it had to be done before Python 2.4):

result_list = sorted(
    chain(page_list, article_list, post_list),
    key=lambda instance: instance.date_created,
)
akaihola
  • 26,309
  • 7
  • 59
  • 69
  • 20
    If merging querysets from the same table to perform an OR query, and have duplicated rows you can eliminate them with the groupby function: `from itertools import groupby` `unique_results = [rows.next() for (key, rows) in groupby(result_list, key=lambda obj: obj.id)]` – Josh Russo Sep 18 '11 at 22:21
  • 2
    Ok, so nm about the groupby function in this context. With the the Q function you should be able to perform any OR query you need: [https://docs.djangoproject.com/en/1.3/topics/db/queries/#complex-lookups-with-q-objects](https://docs.djangoproject.com/en/1.3/topics/db/queries/#complex-lookups-with-q-objects) – Josh Russo Sep 18 '11 at 22:41
  • is it possible to use `list.extend` instead of `chain`? – apelliciari Aug 13 '12 at 10:04
  • What happens if there are duplicates? Will it remove them or will I need to add some extra code to do so? – JDavies Dec 04 '12 at 11:40
  • I have two lists like this x = [{ "id":"value", "name":"value", "date":"value"},..,..,...] and the other list also like this. I am trying to arrange it according to date. Will this work? – Saransh Mohapatra May 22 '13 at 14:50
  • I had to sort by a field whose type was the same across 3 different data models, but were named differently. I worked around this by writing a model method within each model to normalize access... Article.title, Event.name and News.topic all got a ``get_object_name`` method. Worked like a charm! (and also works for ImageFields) – Ben Keating Feb 11 '15 at 20:17
  • 3
    @apelliciari Chain uses significantly less memory than list.extend, because it doesn't need to load both lists fully into memory. – Dan Gayle Apr 15 '15 at 17:17
  • @JoshRusso Your link above is dead – AWrightIV Sep 10 '15 at 22:01
  • 2
    @AWrightIV Here's the new version of that link: https://docs.djangoproject.com/en/1.8/topics/db/queries/#complex-lookups-with-q-objects – Josh Russo Sep 12 '15 at 16:27
  • 1
    trying this approacg but have `'list' object has no attribute 'complex_filter'` – grillazz Oct 05 '16 at 10:30
  • Since you're fetching a lot of model instances, but only require a few fields for each, I do suggest adding [only](https://docs.djangoproject.com/en/1.10/ref/models/querysets/#only) to the queryset for optimization. – WhyNotHugo Feb 06 '17 at 00:27
  • @akaihola so I'm still uncertain here about if the sort() call is done here as above, is it going to load the entire queryset into memory to perform the sort? – Purrell Sep 16 '17 at 17:09
  • @Purrell do you mean the `sorted()` call? That's going to evaluate any iterator you hand to it for sure. So yes, the queryset is going to be in memory. – akaihola Sep 16 '17 at 19:26
  • 1
    @akaihola Yeah figured. So... I wonder how to combine querysets for pagination, with order by date, in a way that actually scales ... Seems like a Paginator feature request. Or perhaps your QuerySetChain answer https://djangosnippets.org/snippets/1933/ – Purrell Sep 17 '17 at 13:04
  • Well @Purrell if you think about it, it's not possible to do pagination on the Django side optimally. Even if each of query results is sorted and you merge them intelligently, you can't know the optimal offsets and limits to query for a given page. You can make a good guess, but you'll always end up either querying lots of extra rows, or having to do extra queries, or both. Dropping down to raw SQL and chaining+sorting on the database side will give you better performance. – akaihola Sep 17 '17 at 13:44
  • 1
    Thanks, this was massively helpful. As a note, this returns a list containing model class objects that can be operated on. Initially I thought it returned strings! – MarMat Jan 30 '19 at 14:19
  • 1
    This answer made me start to be active in SO, editing and answering questions, in order to get enough reputation so i could upvote it. – gdef_ Feb 20 '19 at 20:25
  • @JoshRusso I got an AttributeError: 'itertools._grouper' object has no attribute 'next' – gabn88 May 11 '21 at 15:04
  • How can `sorted` with `key=attrgetter('date_created')` in reverse order? – parmer_110 Mar 27 '23 at 07:48
  • 1
    @parmer_110, the `sorted()` function supports a `reversed=` keyword argument. See [documentation for sorted()](https://docs.python.org/3/library/functions.html#sorted). – akaihola Apr 14 '23 at 11:48
578

Try this:

matches = pages | articles | posts

It retains all the functions of the querysets which is nice if you want to order_by or similar.

Please note: this doesn't work on querysets from two different models.

Daniel Holmes
  • 1,952
  • 2
  • 17
  • 28
  • 14
    Doesn't work on sliced querysets, though. Or am I missing something? – sthzg Apr 20 '14 at 22:30
  • | seems to be more like list + than set |, i.e., it can create duplicates. – user2183078 Sep 01 '14 at 01:02
  • 2
    I used to join the querysets using "|" but not always works fine. It's better to use "Q": https://docs.djangoproject.com/en/dev/topics/db/queries/#complex-lookups-with-q – Ignacio Pérez Sep 12 '14 at 08:59
  • 1
    It does not seem to create duplicates, using Django 1.6. – Teekin Nov 01 '14 at 14:55
  • 25
    Here `|` is the set union operator, not bitwise OR. – e100 Mar 26 '15 at 18:53
  • 15
    @e100 no, it's not the set union operator. django overloads the bitwise OR operator: https://github.com/django/django/blob/master/django/db/models/query.py#L308 – shangxiao Aug 24 '16 at 15:31
  • 1
    This doesn't guarantee that the order will be preserved, correct? – Flimm Dec 23 '16 at 14:09
  • @IgnacioPérez Django provides an OR on both QuerySet and Q so depending on the scenario either may be appropriate. – Jmills Jan 15 '17 at 23:39
  • @sthzg It is because you need to be conscious of whether your slicing operation is returning a [QuerySet or a list](https://docs.djangoproject.com/en/1.10/ref/models/querysets/#when-querysets-are-evaluated) - if it is a QuerySet then the OR operation is implemented but this is not the case with a list. – Jmills Jan 15 '17 at 23:43
  • 1
    `Cannot combine queries once a slice has been taken.` So, not useful if queryset is sliced. – Rishabh Agrahari Sep 23 '17 at 17:33
  • 7
    Note that this solution does *not* preserve ordering, so a set `{x,y,x}` and a set `{a,b,c}` may end up `{a,b,c,x,y,z}` regardless of whether you use `s1 | s2` or `s2 | s1` and that makes `|` somewhat useless in many cases. – Mike 'Pomax' Kamermans Oct 25 '17 at 01:07
  • 1
    Does not work for objects containing geodjango geometries. – MarMat Jan 30 '19 at 14:16
  • i used over my first project over django and it's beauty of django. i really love it. – Devang Hingu Feb 03 '20 at 12:01
166

Related, for mixing querysets from the same model, or for similar fields from a few models, starting with Django 1.11 a QuerySet.union() method is also available:

union()

union(*other_qs, all=False)

New in Django 1.11. Uses SQL’s UNION operator to combine the results of two or more QuerySets. For example:

>>> qs1.union(qs2, qs3)

The UNION operator selects only distinct values by default. To allow duplicate values, use the all=True argument.

union(), intersection(), and difference() return model instances of the type of the first QuerySet even if the arguments are QuerySets of other models. Passing different models works as long as the SELECT list is the same in all QuerySets (at least the types, the names don’t matter as long as the types in the same order).

In addition, only LIMIT, OFFSET, and ORDER BY (i.e. slicing and order_by()) are allowed on the resulting QuerySet. Further, databases place restrictions on what operations are allowed in the combined queries. For example, most databases don’t allow LIMIT or OFFSET in the combined queries.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Udi
  • 29,222
  • 9
  • 96
  • 129
83

You can use the QuerySetChain class below. When using it with Django's paginator, it should only hit the database with COUNT(*) queries for all querysets and SELECT() queries only for those querysets whose records are displayed on the current page.

Note that you need to specify template_name= if using a QuerySetChain with generic views, even if the chained querysets all use the same model.

from itertools import islice, chain

class QuerySetChain(object):
    """
    Chains multiple subquerysets (possibly of different models) and behaves as
    one queryset.  Supports minimal methods needed for use with
    django.core.paginator.
    """

    def __init__(self, *subquerysets):
        self.querysets = subquerysets

    def count(self):
        """
        Performs a .count() for all subquerysets and returns the number of
        records as an integer.
        """
        return sum(qs.count() for qs in self.querysets)

    def _clone(self):
        "Returns a clone of this queryset chain"
        return self.__class__(*self.querysets)

    def _all(self):
        "Iterates records in all subquerysets"
        return chain(*self.querysets)

    def __getitem__(self, ndx):
        """
        Retrieves an item or slice from the chained set of results from all
        subquerysets.
        """
        if type(ndx) is slice:
            return list(islice(self._all(), ndx.start, ndx.stop, ndx.step or 1))
        else:
            return islice(self._all(), ndx, ndx+1).next()

In your example, the usage would be:

pages = Page.objects.filter(Q(title__icontains=cleaned_search_term) |
                            Q(body__icontains=cleaned_search_term))
articles = Article.objects.filter(Q(title__icontains=cleaned_search_term) |
                                  Q(body__icontains=cleaned_search_term) |
                                  Q(tags__icontains=cleaned_search_term))
posts = Post.objects.filter(Q(title__icontains=cleaned_search_term) |
                            Q(body__icontains=cleaned_search_term) | 
                            Q(tags__icontains=cleaned_search_term))
matches = QuerySetChain(pages, articles, posts)

Then use matches with the paginator like you used result_list in your example.

The itertools module was introduced in Python 2.3, so it should be available in all Python versions Django runs on.

akaihola
  • 26,309
  • 7
  • 59
  • 69
  • 7
    Nice approach, but one problem I see here is that the query sets are appended "head-to-tail". What if each queryset is ordered by date and one needs the combined-set to also be ordered by date? – hasen Jan 11 '09 at 10:02
  • This certaintly looks promising, great, I'll have to try that, but i dont have time today. I'll get back to you if it solves my problem. Great work. – espenhogbakk Jan 11 '09 at 11:04
  • Ok, I had to try today, but it didnt work, first it complained that it didnt have to _clone attribute so i added that one, just copied the _all and that worked, but it seems that the paginator has some problem with this queryset. I get this paginator error: "len() of unsized object" – espenhogbakk Jan 11 '09 at 12:48
  • @Espen It seems that since revision 8121 of Django the paginator always first tries to call the count() method. If your Django is older than that, try renaming count() to __len__(). – akaihola Jan 12 '09 at 07:28
  • @hasen If sorting is needed, a simple list concatenation works best (if you don't want to do tricks with the database schema). See the "Concatenating the querysets into a list..." answer. – akaihola Jan 12 '09 at 08:02
  • @akaihola Hmmm, still having problem with this, I've tried to add a __len__() and a len() function, but it still complains about "len() of unsized object". You can se the problem in action here: http://drommestipendet.hyperinteraktiv.no/sok/?q=lorem&content-type=text – espenhogbakk Jan 12 '09 at 09:31
  • @akaihola Sorry, the right url is: http://drommestipendet.hyperinteraktiv.no/sok/?q=lorem&content-type=text – espenhogbakk Jan 12 '09 at 09:31
  • @Espen I added a proper _clone() method to the class. It should now work with the object_list generic view, if template_name= is specified. – akaihola Jan 13 '09 at 10:00
  • @akaihola Hey, now its getting there, but not yet. Iam really sorry for the hazzle. It without the pagination. But with it i get an "integer required error" in the __get_item__ function. You can see the error here: http://drommestipendet.hyperinteraktiv.no/sok/?q=lorem&content-type=text Sorry... – espenhogbakk Jan 13 '09 at 12:26
  • @Espen We hit a difference between Python 2.4 and 2.5. In 2.5 the islice function accepts a step value of None and interprets it as 1. The fix is trivial. By the way, I sense that learning Python debugging techniques plus some Django specific tricks would benefit you a lot. Amazing stuff out there. – akaihola Jan 14 '09 at 12:56
  • @akaihola Thanks! You've been most helpful! And yes, i should learn more, definitly, ive just started using Django and Python so there is a lot to learn, but im getting there slowly. Any tips on where to get good resources on debugging? Thanks again! – espenhogbakk Jan 14 '09 at 19:58
  • 1
    @Espen Python library: pdb, logging. External: IPython, ipdb, django-logging, django-debug-toolbar, django-command-extensions, werkzeug. Use print statements in code or use the logging module. Above all, learn to introspect in the shell. Google for blog posts about debugging Django. Glad to help! – akaihola Jan 14 '09 at 21:19
  • 4
    @patrick see http://djangosnippets.org/snippets/1103/ and http://djangosnippets.org/snippets/1933/ – epecially the latter is a very comprehensive solution – akaihola Apr 10 '11 at 11:16
43

In case you want to chain a lot of querysets, try this:

from itertools import chain
result = list(chain(*docs))

where: docs is a list of querysets

vutran
  • 2,145
  • 21
  • 34
32

The big downside of your current approach is its inefficiency with large search result sets, as you have to pull down the entire result set from the database each time, even though you only intend to display one page of results.

In order to only pull down the objects you actually need from the database, you have to use pagination on a QuerySet, not a list. If you do this, Django actually slices the QuerySet before the query is executed, so the SQL query will use OFFSET and LIMIT to only get the records you will actually display. But you can't do this unless you can cram your search into a single query somehow.

Given that all three of your models have title and body fields, why not use model inheritance? Just have all three models inherit from a common ancestor that has title and body, and perform the search as a single query on the ancestor model.

Carl Meyer
  • 122,012
  • 20
  • 106
  • 116
30

This can be achieved by two ways either.

1st way to do this

Use union operator for QuerySet | to take union of two QuerySets. If both QuerySets belongs to the same model / a single model then it is possible to combine QuerySets by using the union operator.

For an instance

pagelist1 = Page.objects.filter(
    Q(title__icontains=cleaned_search_term) | 
    Q(body__icontains=cleaned_search_term))
pagelist2 = Page.objects.filter(
    Q(title__icontains=cleaned_search_term) | 
    Q(body__icontains=cleaned_search_term))
combined_list = pagelist1 | pagelist2 # this would take union of two querysets

2nd way to do this

One other way to achieve combine operations between two QuerySets is to use the itertools chain function.

from itertools import chain
combined_results = list(chain(pagelist1, pagelist2))
Dave Mackey
  • 4,306
  • 21
  • 78
  • 136
Devang Padhiyar
  • 3,427
  • 2
  • 22
  • 42
  • 3
    Instead of `itertools.chain` (which runs each query separately), `functools.reduce(operator.or_, [pagelist1, pagelist2])` can be used to programmatically apply your first approach. This results in a single query. – Cornflex Aug 19 '21 at 17:18
21

You can use Union:

qs = qs1.union(qs2, qs3)

But if you want to apply order_by on the foreign models of the combined queryset... then you need to Select them beforehand this way... otherwise it won't work.

Example

qs = qs1.union(qs2.select_related("foreignModel"), qs3.select_related("foreignModel"))
qs.order_by("foreignModel__prop1")

where prop1 is a property in the foreign model.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Vignesh Sk
  • 443
  • 5
  • 9
16
DATE_FIELD_MAPPING = {
    Model1: 'date',
    Model2: 'pubdate',
}

def my_key_func(obj):
    return getattr(obj, DATE_FIELD_MAPPING[type(obj)])

And then sorted(chain(Model1.objects.all(), Model2.objects.all()), key=my_key_func)

Quoted from https://groups.google.com/forum/#!topic/django-users/6wUNuJa4jVw. See Alex Gaynor

ray6080
  • 873
  • 2
  • 10
  • 25
9

Requirements: Django==2.0.2, django-querysetsequence==0.8

In case you want to combine querysets and still come out with a QuerySet, you might want to check out django-queryset-sequence.

But one note about it. It only takes two querysets as it's argument. But with python reduce you can always apply it to multiple querysets.

from functools import reduce
from queryset_sequence import QuerySetSequence

combined_queryset = reduce(QuerySetSequence, list_of_queryset)

And that's it. Below is a situation I ran into and how I employed list comprehension, reduce and django-queryset-sequence

from functools import reduce
from django.shortcuts import render    
from queryset_sequence import QuerySetSequence

class People(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    mentor = models.ForeignKey('self', null=True, on_delete=models.SET_NULL, related_name='my_mentees')

class Book(models.Model):
    name = models.CharField(max_length=20)
    owner = models.ForeignKey(Student, on_delete=models.CASCADE)

# as a mentor, I want to see all the books owned by all my mentees in one view.
def mentee_books(request):
    template = "my_mentee_books.html"
    mentor = People.objects.get(user=request.user)
    my_mentees = mentor.my_mentees.all() # returns QuerySet of all my mentees
    mentee_books = reduce(QuerySetSequence, [each.book_set.all() for each in my_mentees])

    return render(request, template, {'mentee_books' : mentee_books})
Stephen Rauch
  • 47,830
  • 31
  • 106
  • 135
chidimo
  • 2,684
  • 3
  • 32
  • 47
  • 2
    Does `Book.objects.filter(owner__mentor=mentor)` not do the same thing? I'm not sure this is a valid use-case. I think a `Book` might need to have multiple `owner`s before you needed to start doing anything like this. – Will S Mar 06 '18 at 09:50
  • 1
    Yeah it does the same thing. I tried it. Anyway, perhaps this could be useful in some other situation. Thanks for pointing that out. You don't exactly start out knowing all the shortcuts as a beginner. Sometimes you gotta travel the load winding road to appreciate the crow's fly – chidimo Apr 25 '18 at 17:06
6

Here's an idea... just pull down one full page of results from each of the three and then throw out the 20 least useful ones... this eliminates the large querysets and that way you only sacrifice a little performance instead of a lot.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Jiaaro
  • 74,485
  • 42
  • 169
  • 190
6

The best option is to use the Django built-in methods:

# Union method
result_list = page_list.union(article_list, post_list)

That will return the union of all the objects in those querysets.

If you want to get just the objects that are in the three querysets, you will love the built-in method of querysets, intersection.

# intersection method
result_list = page_list.intersection(article_list, post_list)
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Daniel Diaz
  • 424
  • 6
  • 12
5

This will do the work without using any other libraries:

result_list = page_list | article_list | post_list
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
2

You can use "|"(bitwise or) to combine the querysets of the same model as shown below:

# "store/views.py"

from .models import Food
from django.http import HttpResponse
                                                
def test(request):
                                             # ↓ Bitwise or
    result = Food.objects.filter(name='Apple') | Food.objects.filter(name='Orange')
    print(result)
    return HttpResponse("Test")

Output on console:

<QuerySet [<Food: Apple>, <Food: Orange>]>
[22/Jan/2023 12:51:44] "GET /store/test/ HTTP/1.1" 200 9

And, you can use |= to add the queryset of the same model as shown below:

# "store/views.py"

from .models import Food
from django.http import HttpResponse
                                                
def test(request):
    result = Food.objects.filter(name='Apple')
         # ↓↓ Here
    result |= Food.objects.filter(name='Orange')
    print(result)
    return HttpResponse("Test")

Output on console:

<QuerySet [<Food: Apple>, <Food: Orange>]>
[22/Jan/2023 12:51:44] "GET /store/test/ HTTP/1.1" 200 9

Be careful, if adding the queryset of a different model as shown below:

# "store/views.py"

from .models import Food, Drink
from django.http import HttpResponse
                                                
def test(request):
          # "Food" model                      # "Drink" model
    result = Food.objects.filter(name='Apple') | Drink.objects.filter(name='Milk')
    print(result)
    return HttpResponse("Test")

There is an error below:

AssertionError: Cannot combine queries on two different base models.
[22/Jan/2023 13:40:54] "GET /store/test/ HTTP/1.1" 500 96025

But, if adding the empty queryset of a different model as shown below:

# "store/views.py"

from .models import Food, Drink
from django.http import HttpResponse
                                                
def test(request):
          # "Food" model                       # Empty queryset of "Drink" model 
    result = Food.objects.filter(name='Apple') | Drink.objects.none()
    print(result)
    return HttpResponse("Test")

There is no error below:

<QuerySet [<Food: Apple>]>
[22/Jan/2023 13:51:09] "GET /store/test/ HTTP/1.1" 200 9

Again be careful, if adding the object by get() as shown below:

# "store/views.py"

from .models import Food
from django.http import HttpResponse
                                                
def test(request):
    result = Food.objects.filter(name='Apple')
                         # ↓↓ Object
    result |= Food.objects.get(name='Orange')
    print(result)
    return HttpResponse("Test")

There is an error below:

AttributeError: 'Food' object has no attribute '_known_related_objects'
[22/Jan/2023 13:55:57] "GET /store/test/ HTTP/1.1" 500 95748
Super Kai - Kazuya Ito
  • 22,221
  • 10
  • 124
  • 129
0

To get the intersection of both querysets:

result = first_queryset.intersection(second_queryset)
Sunderam Dubey
  • 1
  • 11
  • 20
  • 40
-3

This recursive function concatenates array of querysets into one queryset.

def merge_query(ar):
    if len(ar) ==0:
        return [ar]
    while len(ar)>1:
        tmp=ar[0] | ar[1]
        ar[0]=tmp
        ar.pop(1)
        return ar