4

I want to build a complicated filter:

queryset.filter(
  (Q(k__contains=“1”) & Q(k__contains=“2”) & (~Q(k__contains=“3”))) | 
  (Q(k1__contains=“2”) & (~Q(k4__contains=“3”)))
)

The structure is fixed, but the query is dynamic and depends on a case specified by given input.

Tthe input could be for example :

(k=1&k=2&~k=3) | (k1=1&~k4=3)

or

(k=1&~k=3) | (k1=1&~k4=3) | (k=4&~k=3)

How to add parentheses to build this query to make it run as expected?

hynekcer
  • 14,942
  • 6
  • 61
  • 99
rongdong.bai
  • 471
  • 1
  • 6
  • 16
  • The question is about parentheses, but the examples are so simple that they would work also without parentheses considering only the priority of operators and you wrote about "fixed" structure and "complicated" filter. Can you please specify more also by examples, how complex or fixed this structure is? Nested parentheses? Constants on the right side of equal sign are numeric only? – hynekcer Jun 23 '20 at 05:55
  • Similar question here: https://stackoverflow.com/questions/70196305/how-to-create-dynamic-queries-with-django-q-objects-from-parenthesis-inside-a-st?noredirect=1&lq=1 – caram Apr 18 '22 at 08:30

3 Answers3

3

Two answers are presented: one simple and one perfect

1) A simple answer for similarly simple expressions:

It your expressions are simple and it seems better to read it by a short Python code. The most useful simplification is that parentheses are not nested. Less important simplification is that the operator OR ("|") is never in parentheses. The operator NOT is used only inside parentheses. The operator NOT is never double repeated ("~~").

Syntax: I express these simplifications in a form of EBNF syntax rules that could be useful later in a discussion about Python code.

expression = term, [ "|", term ];
term       = "(", factor, { "&", factor }, ")";
factor     = [ "~" ], variable, "=", constant;

variable   = "a..z_0..9";           # anything except "(", ")", "|", "&", "~", "="
constant   = "0-9_a-z... ,'\"";     # anything except "(", ")", "|", "&", "~", "="

Optional white spaces are handled easily by .strip() method to can freely accept expressions like in mathematics. White spaces inside constants are supported.

Solution:

def compile_q(input_expression):
    q_expression = ~Q()  # selected empty
    for term in input_expression.split('|'):
        q_term = Q()     # selected all
        for factor in term.strip().lstrip('(').rstrip(')').split('&'):
            left, right = factor.strip().split('=', 1)
            negated, left = left.startswith('~'), left.lstrip('~')
            q_factor = Q(**{left.strip() + '__contains': right.strip()})
            if negated:
                q_factor = ~q_factor
            q_term &= q_factor
        q_expression |= q_term
    return q_expression

The superfluous empty and full Q expressions ~Q() and Q() are finally optimized and eliminated by Django.

Example:

>>> expression = "(k=1&k=2&~k=3) | ( k1 = 1 & ~ k4 = 3 )"
>>> qs = queryset.filter(compile_q(expression))

Check:

>>> print(str(qs.values('pk').query))        # a simplified more readable sql print
SELECT id FROM ... WHERE
((k LIKE %1% AND k LIKE %2% AND NOT (k LIKE %3%)) OR (k1 LIKE %1% AND NOT (k4 LIKE %3%)))
>>> sql, params = qs.values('pk').query.get_compiler('default').as_sql()
>>> print(sql); print(params)                # a complete parametrized escaped print
SELECT... k LIKE %s ...
[... '%2%', ...]

The first "print" is a Django command for simplified more readable SQL without apostrophes and escaping, because it is actually delegated to the driver. The second print is a more complicated parametrized SQL command with all possible safe escaping.


2) Perfect, but longer solution

This answer can compile any combination of boolean operators "|", "&", "~", any level of nested parentheses and a comparision operator "=" to a Q() expression:

Solution: (not much more complicated)

import ast    # builtin Python parser
from django.contrib.auth.models import User
from django.db.models import Q


def q_combine(node: ast.AST) -> Q:
    if isinstance(node, ast.Module):
        assert len(node.body) == 1 and isinstance(node.body[0], ast.Expr)
        return q_combine(node.body[0].value)
    if isinstance(node, ast.BoolOp):
        if isinstance(node.op, ast.And):
            q = Q()
            for val in node.values:
                q &= q_combine(val)
            return q
        if isinstance(node.op, ast.Or):
            q = ~Q()
            for val in node.values:
                q |= q_combine(val)
            return q
    if isinstance(node, ast.UnaryOp):
        assert isinstance(node.op, ast.Not)
        return ~q_combine(node.operand)
    if isinstance(node, ast.Compare):
        assert isinstance(node.left, ast.Name)
        assert len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq)
        assert len(node.comparators) == 1 and isinstance(node.comparators[0], ast.Constant)
        return Q(**{node.left.id + '__contains': str(node.comparators[0].value)})
    raise ValueError('unexpected node {}'.format(type(node).__name__))

def compile_q(expression: str) -> Q:
    std_expr = (expression.replace('=', '==').replace('~', ' not ')
                .replace('&', ' and ').replace('|', ' or '))
    return q_combine(ast.parse(std_expr))

Example: the same as in my previous answer a more complex following:

>>> expression = "~(~k=1&(k1=2|k1=3|(k=5 & k4=3))"
>>> qs = queryset.filter(compile_q(expression))

The same example gives the same result, a more nested example gives a correct more nested result.

EBNF syntax rules are not important in this case, because no parser is implemented in this solution and the standard Python parser AST is used. It is little different with recursion.

expression = term, [ "|", term ];
term       = factor, { "&", factor };
factor     = [ "~" ], variable, "=", constant | [ "~" ],  "(", expression, ")";

variable   = "a..z_0..9";  # any identifier acceptable by Python, e.g. not a Python keyword
constant   = "0-9_a-z... ,'\"";    # any numeric or string literal acceptable by Python
hynekcer
  • 14,942
  • 6
  • 61
  • 99
  • Not exact and maybe too simplified because it is not recursive. The variable and constant is only by example. I would prefer a completely different solution if a really complex expression should be parsed. – hynekcer Jun 23 '20 at 07:08
1

This answer can compile any combination of boolean operators "|", "&", "~", any level of nested parentheses and a comparision operator "=" to a Q() expression:

Solution: (not much more complicated)

import ast    # builtin Python parser
from django.contrib.auth.models import User
from django.db.models import Q


def q_combine(node: ast.AST) -> Q:
    if isinstance(node, ast.Module):
        assert len(node.body) == 1 and isinstance(node.body[0], ast.Expr)
        return q_combine(node.body[0].value)
    if isinstance(node, ast.BoolOp):
        if isinstance(node.op, ast.And):
            q = Q()
            for val in node.values:
                q &= q_combine(val)
            return q
        if isinstance(node.op, ast.Or):
            q = ~Q()
            for val in node.values:
                q |= q_combine(val)
            return q
    if isinstance(node, ast.UnaryOp):
        assert isinstance(node.op, ast.Not)
        return ~q_combine(node.operand)
    if isinstance(node, ast.Compare):
        assert isinstance(node.left, ast.Name)
        assert len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq)
        assert len(node.comparators) == 1 and isinstance(node.comparators[0], ast.Constant)
        return Q(**{node.left.id + '__contains': str(node.comparators[0].value)})
    raise ValueError('unexpected node {}'.format(type(node).__name__))

def compile_q(expression: str) -> Q:
    std_expr = (expression.replace('=', '==').replace('~', ' not ')
                .replace('&', ' and ').replace('|', ' or '))
    return q_combine(ast.parse(std_expr))

Example: the same as in my previous answer a more complex following:

>>> expression = "~(~k=1&(k1=2|k1=3|(k=5 & k4=3))"
>>> qs = queryset.filter(compile_q(expression))

The same example gives the same result, a more nested example gives a correct more nested result.

EBNF syntax rules are not important in this case, because no parser is implemented in this solution and the standard Python parser AST is used. It is little different with recursion.

expression = term, [ "|", term ];
term       = factor, { "&", factor };
factor     = [ "~" ], variable, "=", constant | [ "~" ],  "(", expression, ")";

variable   = "a..z_0..9";  # any identifier acceptable by Python, e.g. not a Python keyword
constant   = "0-9_a-z... ,'\"";    # any numeric or string literal acceptable by Python
hynekcer
  • 14,942
  • 6
  • 61
  • 99
0

Finally I failed to make this work using django Q.

Bug use extra to build code like

queryset.extra(where = ["(k1 like '%1%' and k2 like '%2%' and (k3 not like '%3%')) or (k1 like '%4%' and (k3 not like '%3%'))"] ) and it works.

rongdong.bai
  • 471
  • 1
  • 6
  • 16
  • It is strange that it did not work with Q expressions because your solution is exactly how the original with Q expressions would be compiled. You can verify it by `print(str(queryset.query))` (It prints a simplified SQL without escaping and and without apostrophes.) The solution with `extra(where=...)` is unsafe e.g. if the right side of "like" could contain something with an apostrophe. It is fortune and an accident that one solution worked and an equally simple solutions did not. – hynekcer Jun 22 '20 at 08:17
  • @hynekcer I want to dynamicially build the Q query with parentheses from the changing input, not the fixed query I made by hand. – rongdong.bai Jun 22 '20 at 08:29
  • @hynekcer the input query can be simple like k=1&~k=3 or more complicated like (k=1&~k=3) | (k1=1&~k4=3) | (k=4&~k=3) – rongdong.bai Jun 22 '20 at 08:31
  • I understand. You didn't parsed the expression and that SQL is created by string replacement. – hynekcer Jun 22 '20 at 12:21