2

I am querying (via sqlalchemy) my_table with a conditional on a column and then retrieve distinct values in another column. Quite simply

selection_1 = session.query(func.distinct(my_table.col2)).\
    filter(my_table.col1 == value1)

I need to do this repeatedly to get distinct values from different columns from my_table.

selection_2 = session.query(func.distinct(my_table.col3)).\
    filter(my_table.col1 == value1).\
    filter(my_table.col2 == value2)

selection_3 = session.query(func.distinct(my_table.col4)).\
    filter(my_table.col1 == value1).\
    filter(my_table.col2 == value2).\
    filter(my_table.col3 == value3)

The above code works, but as I need to have 6 successive calls it's getting a bit out of hand. I have created a class to handle the method chaining:

class QueryProcessor:
    def add_filter(self, my_table_col, value):
        filter(my_table_col == value)
        return self

    def set_distinct_col(self, my_other_table_col):
        self.my_other_table_col = my_other_table_col
        return session.query(func.distinct(self.my_other_table_col))

Ideally I'd be able to use the class like

selection_1 = QueryProcessor().set_distinct_col(my_table.col2).add_filter(my_table.col1, value1)
selection_2 = selection_1.set_distinct_col(my_table.col3).add_filter(my_table.col2, value2)
selection_3 = selection_2.set_distinct_col(my_table.col4).add_filter(my_table.col3, value3)

but when I run

selection_1 = QueryProcessor().set_distinct_col(my_table.col2).add_filter(my_table.col1, value1)

I get the following error:

Traceback (most recent call last):
File " ... "
exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-789b26eccbc5>", line 10, in <module>
    selection_1 = QueryProcessor().set_distinct_col(my_table.col2).add_filter(my_table.col1, value1)
AttributeError: 'Query' object has no attribute 'add_filter'

Any help will be much welcomed.

montoisky
  • 133
  • 1
  • 6

2 Answers2

1

You don't really need a special class for this. Your existing code

selection_2 = session.query(func.distinct(my_table.col3)).\
    filter(my_table.col1 == value1).\
    filter(my_table.col2 == value2)

works because filter is returning a new query based on the original query, but with an additional filter added to it. You can just iterate over the columns and their corresponding values, replacing each old query with its successor.

selection2 = session.query(func.distinct(my_table.col3))
for col, val in zip([my_table.col1, my_table.col2], [value1, value2]):
    selection2 = selection2.filter(col == val)

selection_3 = session.query(func.distinct(my_table.col4))
for col, val in zip([mytable.col1, mytable.col2, mytable.col3],
                    [value1, value2, value3]):
    selection_3 = selection_3.filter(col == val)

That said, the problem with your code is that add_filter doesn't actually call the query's filter method, or update the wrapped query.

class QueryProcessor:
    def set_distinct_col(self, my_other_table_col):
        self.query = session.query(func.distinct(self.my_other_table_col))
        return self

    def add_filter(self, my_table_col, value):
        self.query = self.query.filter(my_table_col == value)
        return self

This poses a problem, though: set_distinct_col creates a new query, so it doesn't really make sense in the following

selection_1 = QueryProcessor().set_distinct_col(my_table.col2).add_filter(my_table.col1, value1)
selection_2 = selection_1.set_distinct_col(my_table.col3).add_filter(my_table.col2, value2)
selection_3 = selection_2.set_distinct_col(my_table.col4).add_filter(my_table.col3, value3)

to call set_distinct_col on an existing instance. It can return either a new query or the existing one, but not both (at least, not if you want to do chaining).

Also, note that selection_1 itself is not the query, but selection_1.query.

chepner
  • 497,756
  • 71
  • 530
  • 681
  • Thanks. I’ve abandoned my class based approach and gone for your list based approach. Tested and working. “ … it can return either a new query or the existing one, but not both …” was the crux of it. Your suggestion has given me an idea to append to the lists (which would in the end work like a chaining in the iteration) in subsequent queries. Many thanks once again. – montoisky May 21 '20 at 20:34
0

For your add_filter() function to work as intended, you need your set_distinct_col() function to return a reference to itself (an instance of QueryProcessor).
session.query() returns a Query object which doesn't have an add_filter() method. Query could have an add_filter method if you did something like Query.add_filter = add_filter, but that's a bad practice because it modifies the Query class, so I don't recommend doing it.

What you're doing is a better option. In order to have access to the query you create with the set_distinct_col() method, you need to store it as an instance variable.
Below, I have done this by storing the query in the instance variable query with
self.query = session.query(func.distinct(self.my_other_table_col))

Then, I changed the add_filter() method to return itself to allow for chaining more add_filter() methods.

class QueryProcessor:
    def add_filter(self, my_table_col, value):
        self.query = self.query.filter(my_table_col == value)
        return self

    def set_distinct_col(self, my_other_table_col):
        self.my_other_table_col = my_other_table_col
        self.query = session.query(func.distinct(self.my_other_table_col))
        return self

You should also know that you can use multiple filter conditions at a time, so you don't actually need to chain multiple filters together.

session.query(db.users).filter(or_(db.users.name=='Ryan', db.users.country=='England'))

or

session.query(db.users).filter((db.users.name=='Ryan') | (db.users.country=='England'))

Difference between filter and filter_by in SQLAlchemy

P.S. This code has not been tested

Evan Wunder
  • 131
  • 3
  • Your multiple condition syntax works and results in: Qry_2 = session.query(db.users).filter((db.users.name=='Ty') | (db.users.country==’UK’)) Qry_3 = session.query(db.users).filter((db.users.name=='Ty') | (db.users.country==’UK’) | (db.users.age==3)) And the I wanted to avoid the code repetition between subsequent queries. I used your code and it worked on the very first call (selection_1 = QueryProcessor()…) but on the second call (selection_2 = selection_1… ) it returned AttributeError: 'Query' object has no attribute 'set_distinct_col'. Tx 4 help. Wont spend more time on elegance :) – montoisky May 21 '20 at 19:12
  • Hey, glad I could help, @montoisky! I changed the `add_filter()` method to return an instance of itself (`QueryProcessor`). This is needed to chain methods from your `QueryProcessor` class because `filter()` returns an instance of `Query` not `QueryProcessor` and `Query` doesn't have an `add_filter()`. If that helps, please accept my answer – Evan Wunder May 21 '20 at 19:42
  • Tx again. With your new class: `sel_1 = QueryProcessor().set_distinct_col(my_table.col2).add_filter(my_table.col1, value1)` results in `SELECT distinct(my_table.col2) AS distinct_1 FROM my_table WHERE my_table.col1 = %(col1_1)s`. Good so far. However the next call `sel_2 = sel_1.set_distinct_col(my_table.col3).add_filter(my_table.col2, value2)` results in `SELECT distinct(my_table.col3) AS distinct_1 FROM my_table WHERE my_table.col2 = %(col2_1)s`. The `set_distinct_col()` works, but not `add_filter()` as it did not add col2 as an `AND` (but rather just replaced col1). – montoisky May 22 '20 at 00:37