0

I have the following SQL statement:

 SELECT DISTINCT ON (watch.url)
        watch.url, watch.name, user_url.frequency, watch.selector, price.date 
   FROM watch 
        JOIN user_url on user_url.url = watch.url 
        LEFT JOIN price on price.url = watch.url 
  WHERE (1400575234 > price.date + user_url.frequency)
     OR price.date is null
  ORDER BY watch.url, price.date DESC

Where 1400575234 is the current unix timestamp. The table price holds the price of an item at different time intervals, one per row.

Currently this query will return me all the items because at some point they have had an entry which is less than the current date. Whereas I am only looking at grabbing the URL's whose price.date is less than the current time but I only want to match against the highest date for the url in the price table.

I think I need to run the order by before the select distinct works. Does anyone have any ideas?

(Hope this makes sense)

Kevin Hogg
  • 1,771
  • 25
  • 34
dtsn
  • 1,077
  • 1
  • 10
  • 17
  • You need to include a GROUP BY command. See more [here]( http://stackoverflow.com/questions/5391564/how-to-use-distinct-and-order-by-in-same-select-statement) – Jonathan Wheeler May 20 '14 at 09:07
  • `ORDER BY` always *executes* before `DISTINCT` http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT , what you need to do is to filter with `WHERE` after `DISTINCT` (possibly with a sub-query) or use `GROUP BY` & `HAVING` – pozs May 20 '14 at 09:12
  • @pozs @jonathan-wheeler unfortunately a GROUP BY doesn't work adding `GROUP BY watch.url, price.date, user_url.frequency` doesn't make any difference to the outcome – dtsn May 20 '14 at 09:18
  • Assuming that your columns other than date have the same values in the query, you need to also add a `MAX(...)` around `price.date` in your `SELECT` when using `GROUP BY`. – MicSim May 20 '14 at 09:28
  • @MicSim If i exclude the where statement putting a `MAX(...)` around price.date the query does return the latest dates. However with the where it does't also tried with using the `GROUP BY` and `HAVING` – dtsn May 20 '14 at 09:41

2 Answers2

0

You could try using a window function instead:

SELECT url, name, frequency, selector, date
FROM (
 SELECT watch.url, watch.name, user_url.frequency, watch.selector, price.date,
        row_number() over (partition by watch.url order by price.date desc) as rn
   FROM watch 
        JOIN user_url on user_url.url = watch.url 
        LEFT JOIN price on price.url = watch.url 
  WHERE (1400575234 > price.date + user_url.frequency)
     OR price.date is null
) t
WHERE rn = 1
ORDER BY url, date;

Btw: date is a horrible name for a column

0

I've seemed to of solved it using the following

SELECT DISTINCT ON (watch.url)
watch.url, watch.name, user_url.frequency, watch.selector, p.date 
FROM watch 
JOIN user_url on user_url.url = watch.url
LEFT JOIN (
    SELECT max(price.date) as date, url FROM price GROUP BY url
) as p ON p.url = watch.url
WHERE 1400575234 > p.date + user_url.frequency
ORDER BY watch.url

Thanks for everyones help. If you can see anyway to make the query more efficient please let me know.

dtsn
  • 1,077
  • 1
  • 10
  • 17