7

How can I count easily the number of rows where a particular column is true and the number where it is false ?

I can't (or can I ?) run the query with count() because I'm embedding this count in a having() clause, like :

.having(func.count(Question.accepted) >
        func.count(not_(Question.accepted)))

but with the above way, the function counts every line on both sides of the inequality.

I tried something like this

.having(func.count(func.if_(Question.accepted, 1, 0)) >
        func.count(func.if_(Question.accepted, 0, 1)))

But I get an error

function if(boolean, integer, integer) does not exist

(seems it doesn't exist in postgresql).

How can I count easily the number of rows where column is true and false ?

Eino Gourdin
  • 4,169
  • 3
  • 39
  • 67

3 Answers3

10

Using aggregate functions in a HAVING clause is very much legal, since HAVING eliminates group rows. Conditional counting can be achieved either by using the property that NULLs don't count:

count(expression) ... number of input rows for which the value of expression is not null

or if using PostgreSQL 9.4 or later, with the aggregate FILTER clause:

count(*) FILTER (WHERE something > 0)

You could also use a sum of ones (and zeros).

PostgreSQL >= 9.4 and SQLAlchemy >= 1.0.0

Using a filtered aggregate function:

.having(func.count(1).filter(Question.accepted) >
        func.count(1).filter(not_(Question.accepted)))

Older PostgreSQL and/or SQLAlchemy

The SQL analog for "if" is either CASE expression or in this case nullif() function. Both of them can be used together with the fact that NULLs don't count:

from sqlalchemy import case

...

.having(func.count(case([(Question.accepted, 1)])) >
        func.count(case([(not_(Question.accepted), 1)])))

or:

.having(func.count(func.nullif(Question.accepted, False)) >
        func.count(func.nullif(Question.accepted, True)))

Using nullif() can be a bit confusing as the "condition" is what you don't want to count. You could device an expression that would make the condition more natural, but that's left for the reader. These 2 are more portable solutions, but on the other hand the FILTER clause is standard, though not widely available.

Community
  • 1
  • 1
Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
  • Good detailed information. SQlite supports filtered aggregates as well, so you could use the same approach. Note that SQLAlchemy 2.0 takes cases as positional arguments instead of a list. – Mattwmaster58 Jul 08 '23 at 18:11
4

You can use this query:

Session.query(func.sum(case([(Question.accepted == True, 1)], else_=0).label('accepted_number'))

And the same column will be for False value, but with False in condition

Or, you can use window function:

Session.query(func.count(Question.id).over(partition_by=Question.accepted), Question.accepted).all()

The result will contain two rows (if there are only two possible values in Question.accepted), where the first column is the number of values, and the second is the values of 'accepted' column.

antonio_antuan
  • 1,293
  • 9
  • 22
2

Most flavors of SQL store booleans as 1-bit integers, and calling SUM on them properly gives you the number of positive booleans/True rows. In your case, you can simply do:

func.sum(Question.accepted)

If your SQL engine returns a SUM of booleans as a boolean (not sure there is any), you can use expression function cast() to force it into an integer instead of using func.count(case([(Question.accepted, 1)])):

func.sum(sqlalchemy.cast(Question.accepted, sqlalchemy.Integer))

Finally, if you want the sum as an integer on the Python side the expression function type_coerce() does the trick:

sqlalchemy.type_coerce(func.sum(Question.accepted), sqlalchemy.Integer)
  • What SQL "overhead"? That seems like it should be explained a bit more. Note that the `SUM()` trick is mentioned and linked to in "You could also use a sum of ones (and zeros)." The `SUM()` trick has one edge though: it produces a different result for 0 rows. – Ilja Everilä Apr 20 '20 at 15:49
  • My bad, I would have thought the `case` statement would have added some processing time in SQL, but after a few tests in MySQL it seems like the impact is negligible. Despite that, I still think it’s cleaner to not clutter up the code with a useless `case` statement when sums on booleans work just well and can be gotten thanks to `type_coerce`. – Gary Mialaret Apr 21 '20 at 13:13
  • Agreed, the whole `COUNT(CASE ... )` thing is not easy on the eyes. Personally I do quite like the ` FILTER (WHERE )` though, it's rather flexible, though not broadly supported. – Ilja Everilä Apr 21 '20 at 15:47