0

I have a table called UnitOfWork which has 3 columns cases_identified, cases_completed and cases_double_check, all of which are Postgresql arrays of integers.

Is it possible to write a query (or a hybrid property) that finds the cases that have been identified, but are not in the completed or double-check columns?

This is what I came up with, but the SQL expression doesn't work:

@hybrid_property
def todo(self):
    return [x for x in self.cases_identified
            if x not in self.cases_completed and
            x not in self.cases_double_check]

@todo.expression
def todo(cls):
    return [x for x in cls.cases_identified
            if x not in  cls.cases_completed and
            x not in cls.cases_double_check]

The error I get on a test query is:

test = Session.query(UnitOfWork.todo).first()
NotImplementedError: Operator 'contains' is not supported on this expression
bard
  • 2,762
  • 9
  • 35
  • 49

1 Answers1

2

For this answer I'll assume that cls.cases_identified, cls.cases_completed, and cls.cases_double_check are of postgresql.ARRAY(Integer) type in Python side.

Your @todo.expression should return just that: an SQL expression. Currently it is trying to return a python list. The exception is raised since a postgresql.ARRAY does not support the in operator, though it does have a method contains, which maps to the @> operator in Postgresql and tests "if elements are a superset of the elements of the argument array expression". This on the other hand is not what you want. You're lucky that you had the if x not in ... in there, as plain

[x for x in cls.cases_identified]

seems to result in an infinite loop instead of an exception.

Getting the difference between arrays in Postgresql has been covered here extensively, but here's how you would apply that using SQLAlchemy, first using an array constructor:

from sqlalchemy import func

...

class UnitOfWork(...):

    @todo.expression
    def todo(cls):
        # Force correlation of `cls` from outer FROM objects. Use
        # the `select()` shorthand method of `FunctionElement`.
        identified = func.unnest(cls.cases_identified).select().correlate(cls)
        completed = func.unnest(cls.cases_completed).select().correlate(cls)
        double_check = func.unnest(cls.cases_double_check).select().correlate(cls)
        # Create the subquery statement
        stmt = identified.except_(completed.union(double_check))
        # Uses the `func` generic for ARRAY constructor from
        # subquery syntax. The subquery statement has to be
        # converted to a "scalar subquery" in the eyes of SQLA
        # (really just a subquery returning 1 col) in order
        # to remove it from the FROM clause hierarchy.
        return func.array(stmt.as_scalar())

This has a downside of not providing any FROM objects (as it correlates everything from the enclosing query), so you'd have to issue the original query like this:

test = Session.query(UnitOfWork.todo).select_from(UnitOfWork).first()

You could also use the Postgresql intarray module that provides special functions and operators for null-free arrays of integers:

class UnitOfWork(...):

    @todo.expression
    def todo(cls):
        return (cls.cases_identified - cls.cases_completed - 
                cls.cases_double_check)

Note that you have to install the extension first in Postgresql:

CREATE EXTENSION intarray;
Community
  • 1
  • 1
Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127