0

I have a Flask app which displays some simple statistics about the number of books a user entered and how many they've marked as read each month. When testing the query locally on a SQLite3 database everything works fine. However, when I deploy to Heroku, which uses a PostgreSQL database, the statistics come out wrong. The query looks like:

books_group = db.session.query(extract("month", Book.timestamp), func.count(Book.id), func.count(Book.read)).filter(Book.user_id == current_user.id).group_by(extract("month", Book.timestamp)).all()

When running locally the output looks like: [(4, 27, 21)] where the first number is the month, the second the amount of books entered and the third the amount of books read. When running on Heroku the output is: [(4.0, 10, 10)] No matter what I do that amount of books entered and the amount of books marked as read is always equal, eventhough the output should be [(4.0, 10, 5)]

My model looks like:

class Book(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(140))
    timestamp = db.Column(db.DateTime, index=True, default=datetime.utcnow)
    read = db.Column(db.Boolean, default=False)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))

And the data looks like:

+----+--------+-----------------------+------+---------+
| id | title  |       timestamp       | read | user_id |
+----+--------+-----------------------+------+---------+
|  1 | Foo    | 17 apr. 2020 13:14:47 |    1 |       1 |
|  2 | Bar    | 17 apr. 2020 13:14:47 |    1 |       1 |
|  3 | Baz    | 18 apr. 2020 13:14:47 |    0 |       1 |
|  4 | Foobar | 18 apr. 2020 13:14:47 |    1 |       2 |
|  5 | BarBaz | 18 apr. 2020 13:14:47 |    0 |       2 |
|    | FooBaz | 19 apr. 2020 13:14:47 |    0 |       1 |
+----+--------+-----------------------+------+---------+

What is going wrong here?

EDIT:

The solution that worked both for SQLite3 and PostgreSQL was:

books_group = db.session.query(extract("month", Book.timestamp), func.count(Book.id), func.sum(cast(Book.read, Integer))).filter(Book.user_id == current_user.id).group_by(extract("month", Book.timestamp)).all()
Stefan
  • 61
  • 1
  • 1
  • 9
  • @IljaEverilä I've edited my question! – Stefan Apr 19 '20 at 21:46
  • 1
    `func.count(Book.read)` counts how many times `Book.read` is not NULL. If your data contains no NULLs, every row in a group is counted. – Ilja Everilä Apr 20 '20 at 05:26
  • Changing the code to `func.count(Book.read == True)` produces the same inconsistent results. It works on SQLite3 but not PostgreSQL – Stefan Apr 20 '20 at 07:55
  • 1
    `func.count(Book.read == True)` makes no difference. If `Book.read` is a boolean column, then `Book.read` and `Book.read == True` are equivalent. And if `Book.read` is NULL, then the result of `Book.read == True` is also NULL. Since `COUNT()` counts how many times *expression* evaluates to not NULL, there is no inconsistency. Your data in PostgreSQL seems to have **no** NULLs, while the data in SQLite seems to have NULLs. – Ilja Everilä Apr 20 '20 at 08:04
  • I see. Yes the PostgreSQL indeed has no NULL values for the read column. So, how would I count the number of True's in a way that works for both SQLite3 and PostgreSQL? – Stefan Apr 20 '20 at 08:25
  • 1
    Here are both PostgreSQL specific and portable ways to count true: https://stackoverflow.com/questions/37328779/sqlalchemy-func-count-on-boolean-column – Ilja Everilä Apr 20 '20 at 13:06
  • Turns out that the sum and cast to integer solution worked for me! Thanks! – Stefan Apr 20 '20 at 13:54

0 Answers0