12

I have a requirement to display spend estimation for last 30 days. SpendEstimation is calculated multiple times a day. This can be achieved using simple SQL query:

SELECT DISTINCT ON (date) date(time) AS date, resource_id , time
FROM spend_estimation
WHERE
  resource_id = '<id>'
  and time > now() - interval '30 days'
ORDER BY date DESC, time DESC;

Unfortunately I can't seem to be able to do the same using SQLAlchemy. It always creates select distinct on all columns. Generated query does not contain distinct on.

query = session.query(
    func.date(SpendEstimation.time).label('date'),
    SpendEstimation.resource_id,
    SpendEstimation.time
).distinct(
    'date'
).order_by(
    'date',
    SpendEstimation.time
)
SELECT DISTINCT
    date(time) AS date,
    resource_id,
    time 
FROM spend
ORDER BY date, time

It is missing ON (date) bit. If I user query.group_by - then SQLAlchemy adds distinct on. Though I can't think of solution for given problem using group by.


Tried using function in distinct part and order by part as well.

query = session.query(
    func.date(SpendEstimation.time).label('date'),
    SpendEstimation.resource_id,
    SpendEstimation.time
).distinct(
    func.date(SpendEstimation.time).label('date')
).order_by(
    func.date(SpendEstimation.time).label('date'),
    SpendEstimation.time
)

Which resulted in this SQL:

SELECT DISTINCT
       date(time) AS date,
       resource_id,
       time,
       date(time) AS date # only difference
FROM spend
ORDER BY date, time

Which is still missing DISTINCT ON.

aisbaa
  • 9,867
  • 6
  • 33
  • 48

2 Answers2

1

Your SqlAlchemy version might be the culprit.

Sqlalchemy with postgres. Try to get 'DISTINCT ON' instead of 'DISTINCT'

Links to this bug report: https://bitbucket.org/zzzeek/sqlalchemy/issues/2142

A fix wasn't backported to 0.6, looks like it was fixed in 0.7.

Community
  • 1
  • 1
Robert Rodkey
  • 423
  • 3
  • 9
0

Stupid question: have you tried distinct on SpendEstimation.date instead of 'date'?

EDIT: It just struck me that you're trying to use the named column from the SELECT. SQLAlchemy is not that smart. Try passing in the func expression into the distinct() call.

jennykwan
  • 2,631
  • 1
  • 22
  • 33
  • Hi, thank you for suggestion, `SpendEstimation.date` column does not exist, it is calculated from spend_time. Maybe it is possible to add "virtual_column" to the model, not sure. I've tried passing `func` to distinct and it didn't change query in expected way. – aisbaa Nov 11 '15 at 11:40
  • OK. Another stupid question: what version of SQLAlchemy are you using? – jennykwan Nov 11 '15 at 16:38
  • Also, what is the updated Python code using the `func` approach? Have you tried wrapping `'date'` in a `text()`? – jennykwan Nov 11 '15 at 16:40