109

What is the difference between filter with multiple arguments and chain filter in django?

testmobile
  • 1,091
  • 2
  • 8
  • 3

9 Answers9

66

As you can see in the generated SQL statements the difference is not the "OR" as some may suspect. It is how the WHERE and JOIN is placed.

Example1 (same joined table): from https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

Blog.objects.filter(
       entry__headline__contains='Lennon', 
       entry__pub_date__year=2008)

This will give you all the Blogs that have one entry with both (entry__headline__contains='Lennon') AND (entry__pub_date__year=2008), which is what you would expect from this query.

Result:

Blog with {entry.headline: 'Life of Lennon', entry.pub_date: '2008'}

Example 2 (chained)

Blog.objects.filter(
       entry__headline__contains='Lennon'
           ).filter(
       entry__pub_date__year=2008)

This will cover all the results from Example 1, but it will generate slightly more result. Because it first filters all the blogs with (entry__headline__contains='Lennon') and then from the result filters (entry__pub_date__year=2008).

The difference is that it will also give you results like:

A single Blog with multiple entries

{entry.headline: '**Lennon**', entry.pub_date: 2000}, 
{entry.headline: 'Bill', entry.pub_date: **2008**}

When the first filter was evaluated the book is included because of the first entry (even though it has other entries that don't match). When the second filter is evaluated the book is included because of the second entry.

One table: But if the query doesn't involve joined tables like the example from Yuji and DTing. The result is same.

rbp
  • 43,594
  • 3
  • 38
  • 31
Johnny Tsang
  • 1,039
  • 10
  • 8
  • 33
    I assume I'm just dense this morning, but this sentence confuses me: "Because it first filters all the blogs with `(entry__headline__contains='Lennon')` and then from the result filters `(entry__pub_date__year=2008)`" If "then from the result" is accurate, why will it include something with `entry.headline == 'Bill'`...wouldn't `entry__headline__contains='Lennon'` filter out the `Bill` instance? – Dustin Wyatt Aug 09 '16 at 15:37
  • 19
    I'm also confused. It seems like this answer is just wrong, but it has 37 upvotes... – Personman Apr 12 '18 at 17:14
  • 2
    This answer is misleading and confusing, note the above is only correct when filtering using M2M relationships as noted in Yuji's answer. The key point is the example is filtering the Blog items with each filter statement, not the Entry items. – theannouncer May 03 '18 at 17:05
  • 1
    Because there are possibly multiple entries per blog. The language is correct. The concept can be confusing if you don't keep all the moving pieces in mind. – DylanYoung May 14 '18 at 13:30
  • @DustinWyatt I too had the same questions as you but I have finally got it! Please see the Employee and Dependent example written by Grijesh Chauhan below on this page and you will get it too. – theQuestionMan Oct 14 '20 at 02:46
45

The case in which results of "multiple arguments filter-query" is different than "chained-filter-query", following:

Selecting referenced objects on the basis of referencing objects and relationship is one-to-many (or many-to-many).

Multiple filters:

    Referenced.filter(referencing1_a=x, referencing1_b=y)
    #  same referencing model   ^^                ^^

Chained filters:

    Referenced.filter(referencing1_a=x).filter(referencing1_b=y)

Both queries can output different result:
If more then one rows in referencing-modelReferencing1can refer to same row in referenced-modelReferenced. This can be the case in Referenced: Referencing1 have either 1:N (one to many) or N:M (many to many) relation-ship.

Example:

Consider my application my_company has two models Employee and Dependent. An employee in my_company can have more than dependents(in other-words a dependent can be son/daughter of a single employee, while a employee can have more than one son/daughter).
Ehh, assuming like husband-wife both can't work in a my_company. I took 1:m example

So, Employee is referenced-model that can be referenced by more then Dependent that is referencing-model. Now consider relation-state as follows:

Employee:        Dependent:
+------+        +------+--------+-------------+--------------+
| name |        | name | E-name | school_mark | college_mark |
+------+        +------+--------+-------------+--------------+
| A    |        | a1   |   A    |          79 |           81 |
| B    |        | b1   |   B    |          80 |           60 |
+------+        | b2   |   B    |          68 |           86 |
                +------+--------+-------------+--------------+  

Dependenta1refers to employeeA, and dependentb1, b2references to employeeB.

Now my query is:

Find all employees those having son/daughter has distinction marks (say >= 75%) in both college and school?

>>> Employee.objects.filter(dependent__school_mark__gte=75,
...                         dependent__college_mark__gte=75)

[<Employee: A>]

Output is 'A' dependent 'a1' has distinction marks in both college and school is dependent on employee 'A'. Note 'B' is not selected because nether of 'B''s child has distinction marks in both college and school. Relational algebra:

Employee (school_mark >=75 AND college_mark>=75)Dependent

In Second, case I need a query:

Find all employees whose some of dependents has distinction marks in college and school?

>>> Employee.objects.filter(
...             dependent__school_mark__gte=75
...                ).filter(
...             dependent__college_mark__gte=75)

[<Employee: A>, <Employee: B>]

This time 'B' also selected because 'B' has two children (more than one!), one has distinction mark in school 'b1' and other is has distinction mark in college 'b2'.
Order of filter doesn't matter we can also write above query as:

>>> Employee.objects.filter(
...             dependent__college_mark__gte=75
...                ).filter(
...             dependent__school_mark__gte=75)

[<Employee: A>, <Employee: B>]

result is same! Relational algebra can be:

(Employee (school_mark >=75)Dependent) (college_mark>=75)Dependent

Note following:

dq1 = Dependent.objects.filter(college_mark__gte=75, school_mark__gte=75)
dq2 = Dependent.objects.filter(college_mark__gte=75).filter(school_mark__gte=75)

Outputs same result: [<Dependent: a1>]

I check target SQL query generated by Django using print qd1.query and print qd2.query both are same(Django 1.6).

But semantically both are different to me. first looks like simple section σ[school_mark >= 75 AND college_mark >= 75](Dependent) and second like slow nested query: σ[school_mark >= 75][college_mark >= 75](Dependent)).

If one need Code @codepad

btw, it is given in documentation @Spanning multi-valued relationships I have just added an example, I think it will be helpful for someone new.

mgalgs
  • 15,671
  • 11
  • 61
  • 74
Grijesh Chauhan
  • 57,103
  • 20
  • 141
  • 208
  • 7
    Thank you for this helpful explanation, it is better than the one in documentation which isn't clear at all. – wim Mar 12 '14 at 20:52
  • 3
    The last mark about filtering the Dependents directly is super helpful. It shows that the change in results definitively only happens when you go through a many-to-many relationship. If you query a table directly, chaining filters is just like combing twice. – Chris Nov 22 '17 at 14:47
  • What if `b2` record was Not there?. Would the chain filter print **B** as output in this case too? – ajinzrathod Jul 07 '21 at 07:53
  • @ajinzrathod No it wouldn't. An easy way to think of this is this: the first filter gives you 2 results. Which contains A, and B (The Employees). Now the second filter asks, 'Among A and B, give me Employees whose dependents have college_mars > 75. Since both have dependents (atleast 1) whose college marks > 75. It returns both again. – Mazhar Ali Dec 01 '21 at 06:58
  • Also, if someone can aswer me. Does this mean, in a reverse look up, the chaining is just OR condition? – Mazhar Ali Dec 01 '21 at 06:59
  • @MazharAli not OR but AND – Grijesh Chauhan Dec 01 '21 at 07:50
  • 1
    @GrijeshChauhan So a ',' inside .filter would mean "Give me all the rows where a single dependent completes BOTH the conditions. While chaining would mean. "Give me all the rows where (there exists) a dependent that completes the first condition AND (there exists) a dependent that completes second condition" – Mazhar Ali Dec 01 '21 at 08:23
23

Most of the time, there is only one possible set of results for a query.

The use for chaining filters comes when you are dealing with m2m:

Consider this:

# will return all Model with m2m field 1
Model.objects.filter(m2m_field=1) 

# will return Model with both 1 AND 2    
Model.objects.filter(m2m_field=1).filter(m2m_field=2) 

# this will NOT work
Model.objects.filter(Q(m2m_field=1) & Q(m2m_field=2))

Other examples are welcome.

Yuji 'Tomita' Tomita
  • 115,817
  • 29
  • 282
  • 245
  • 6
    Another example: It's not just limited to m2m, this can also happen with one-to-many - with the reverse lookup e.g. using the related_name on a ForeignKey – wim Mar 13 '14 at 11:14
  • Thanks for your explanation! Before that, I thought that last and 2nd examples are equal, so last example wasn't work for me (wrong query results), and I spent a lot of time in searches. 2nd example very helpful for me. Also as Wim said, this is usable with reverse one-to-many relations as in my case. – Vladimir Chub Aug 19 '16 at 06:21
19

This answer is based on Django 3.1.

Environment

Models

class Blog(models.Model):
    blog_id = models.CharField()

class Post(models.Model):
    blog_id  = models.ForeignKeyField(Blog)
    title    = models.CharField()
    pub_year = models.CharField() # Don't use CharField for date in production =]

Database tables

enter image description here

Filters call

Blog.objects.filter(post__title="Title A", post__pub_year="2020")
# Result: <QuerySet [<Blog: 1>]>

Blog.objects.filter(post__title="Title A").filter(post_pub_date="2020")
# Result: <QuerySet [<Blog: 1>, [<Blog: 2>]>

Explanation

Before I start anything further, I have to notice that this answer is based on the situation that uses "ManyToManyField" or a reverse "ForeignKey" to filter objects.

If you are using the same table or an "OneToOneField" to filter objects, then there will be no difference between using a "Multiple Arguments Filter" or "Filter-chain". They both will work like a "AND" condition filter.

The straightforward way to understand how to use "Multiple Arguments Filter" and "Filter-chain" is to remember in a "ManyToManyField" or a reverse "ForeignKey" filter, "Multiple Arguments Filter" is an "AND" condition and "Filter-chain" is an "OR" condition.

The reason that makes "Multiple Arguments Filter" and "Filter-chain" so different is that they fetch results from different join tables and use different conditions in the query statement.

"Multiple Arguments Filter" use "Post"."Public_Year" = '2020' to identify the public year

SELECT *
FROM "Book" 
INNER JOIN ("Post" ON "Book"."id" = "Post"."book_id")
WHERE "Post"."Title" = 'Title A'
AND "Post"."Public_Year" = '2020'

"Filter-chain" database query use "T1"."Public_Year" = '2020' to identify the public year

SELECT *
FROM "Book" 
INNER JOIN "Post" ON ("Book"."id" = "Post"."book_id")
INNER JOIN "Post" T1 ON ("Book"."id" = "T1"."book_id")
WHERE "Post"."Title" = 'Title A'  
AND "T1"."Public_Year" = '2020'

But why do different conditions impact the result?

I believe most of us who come to this page, including me =], have the same assumption while using "Multiple Arguments Filter" and "Filter-chain" at first.

Which we believe the result should be fetched from a table like following one which is correct for "Multiple Arguments Filter". So if you are using the "Multiple Arguments Filter", you will get a result as your expectation.

enter image description here

But while dealing with the "Filter-chain", Django creates a different query statement which changes the above table to the following one. Also, the "Public Year" is identified under the "T1" section instead of the "Post" section because of the query statement change.

enter image description here

But where does this weird "Filter-chain" join table diagram come from?

I'm not a database expert. The explanation below is what I understand so far after I created the exact structure of the database and made a test with the same query statement.

The following diagram will show where this weird "Filter-chain" join table diagram comes from.

enter image description here

enter image description here

The database will first create a join table by matching the row of the "Blog" and "Post" tables one by one.

After that, the database now does the same matching process again but uses the step 1 result table to match the "T1" table which is just the same "Post" table.

And this is where this weird "Filter-chain" join table diagram comes from.

Conclusion

So two things make "Multiple Arguments Filter" and "Filter-chain" different.

  1. Django create different query statements for "Multiple Arguments Filter" and "Filter-chain" which make "Multiple Arguments Filter" and "Filter-chain" result come from other tables.
  2. "Filter-chain" query statement identifies a condition from a different place than "Multiple Arguments Filter".

The dirty way to remember how to use it is "Multiple Arguments Filter" is an "AND" condition and "Filter-chain" is an "OR" condition while in a "ManyToManyField" or a reverse "ForeignKey" filter.

GooDeeJAY
  • 1,681
  • 2
  • 20
  • 27
LearnerAndLearn
  • 383
  • 4
  • 9
  • 1
    Great explanation, especially with "table join matching" visuals. This should be the accepted answer! – GooDeeJAY Nov 21 '22 at 11:08
14

The performance difference is huge. Try it and see.

Model.objects.filter(condition_a).filter(condition_b).filter(condition_c)

is surprisingly slow compared to

Model.objects.filter(condition_a, condition_b, condition_c)

As mentioned in Effective Django ORM,

  • QuerySets maintain state in memory
  • Chaining triggers cloning, duplicating that state
  • Unfortunately, QuerySets maintain a lot of state
  • If possible, don’t chain more than one filter
tar
  • 1,538
  • 1
  • 20
  • 33
8

You can use the connection module to see the raw sql queries to compare. As explained by Yuji's, for the most part they are equivalent as shown here:

>>> from django.db import connection
>>> samples1 = Unit.objects.filter(color="orange", volume=None)
>>> samples2 = Unit.objects.filter(color="orange").filter(volume=None)
>>> list(samples1)
[]
>>> list(samples2)
[]
>>> for q in connection.queries:
...     print q['sql']
... 
SELECT `samples_unit`.`id`, `samples_unit`.`color`, `samples_unit`.`volume` FROM `samples_unit` WHERE (`samples_unit`.`color` = orange  AND `samples_unit`.`volume` IS NULL)
SELECT `samples_unit`.`id`, `samples_unit`.`color`, `samples_unit`.`volume` FROM `samples_unit` WHERE (`samples_unit`.`color` = orange  AND `samples_unit`.`volume` IS NULL)
>>> 
dting
  • 38,604
  • 10
  • 95
  • 114
3

If you end up on this page looking for how to dynamically build up a django queryset with multiple chaining filters, but you need the filters to be of the AND type instead of OR, consider using Q objects.

An example:

# First filter by type.
filters = None
if param in CARS:
  objects = app.models.Car.objects
  filters = Q(tire=param)
elif param in PLANES:
  objects = app.models.Plane.objects
  filters = Q(wing=param)

# Now filter by location.
if location == 'France':
  filters = filters & Q(quay=location)
elif location == 'England':
  filters = filters & Q(harbor=location)

# Finally, generate the actual queryset
queryset = objects.filter(filters)
Matt
  • 4,815
  • 5
  • 39
  • 40
  • In case the if or elif is not passed, the filters variable will be None and then you receive a TypeError: unsupported operand type(s) for &: 'NoneType' and 'Q'. I initiated the filters with filters = Q() – cwhisperer Apr 17 '19 at 07:02
0

If requires a and b then

and_query_set = Model.objects.filter(a=a, b=b)

if requires a as well as b then

chaied_query_set = Model.objects.filter(a=a).filter(b=b)

Official Documents: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

Related Post: Chaining multiple filter() in Django, is this a bug?

Tony Aziz
  • 899
  • 6
  • 4
-4

There is a difference when you have request to your related object, for example

class Book(models.Model):
    author = models.ForeignKey(Author)
    name = models.ForeignKey(Region)

class Author(models.Model):
    name = models.ForeignKey(Region)

request

Author.objects.filter(book_name='name1',book_name='name2')

returns empty set

and request

Author.objects.filter(book_name='name1').filter(book_name='name2')

returns authors that have books with both 'name1' and 'name2'

for details look at https://docs.djangoproject.com/en/dev/topics/db/queries/#s-spanning-multi-valued-relationships

Maiiku
  • 305
  • 1
  • 9
Marten
  • 17
  • 1
  • 5
    `Author.objects.filter(book_name='name1',book_name='name2')` is not even valid python, it would be `SyntaxError: keyword argument repeated` – wim Mar 12 '14 at 17:03
  • 1
    Where is book_name defined exactly? Do you mean book_set__name? – DylanYoung Mar 10 '17 at 13:34