0

I need to calculate the daily price difference in percentage. The query I have works but is getting slower every day. The main idea is to calculate the delta with the previous row. The previous row is normally the previous day, but there might sometimes be a day missing. When that happens it needs to take the last day available.

I'm looking for a way to limit the set that I retrieve in the inner query. There are about 20.000 records added per day.

update
        price_watches pw
      set
        min_percent_changed = calc.delta
      from
        (select
          id,
          product_id,
          calculation_date,
          (1 - (price_min / lag(price_min) over (order by product_id, calculation_date))) * 100 as delta
        from
          price_watches
        where
          price_min > 0) calc
      where
        calc.id = pw.id;
Rudi
  • 1,577
  • 3
  • 16
  • 42
  • do you consider the last price on any given day if there are multiple records? – Vamsi Prabhala Jul 09 '15 at 20:33
  • The usual suspects are missing: table definition, Postgres version, relevant indexes. Also, your explanation doesn't mention an `UPDATE`. Tell us what you are doing *exactly*. I can make an educated guess but it should be you telling us not us making guesses. – Erwin Brandstetter Jul 10 '15 at 03:16

1 Answers1

0

This is wrong on many levels.

1.) It looks like you are updating all rows, including old rows that already have their min_percent_changed set and probably shouldn't be updated again.

2.) You are updating even if the new min_percent_changed is the same as the old.

3.) You are updating rows to store a redundant value that could be calculated on the fly rather cheaply (if done right), thereby making the row bigger and more error prone and producing lots of dead row versions, which means a lot of work for vacuum and slowing down everything else.

You shouldn't be doing any of this.

If you need to materialize the daily delta for read performance optimization, I suggest a small additional 1:1 table that can be updated cheaply without messing with the main table. Especially, if you recalc the value for every row every time. But better calculate new data.

If you really want to recalculate for every row (like your current UPDATE seems to do), make that a MATERIALIZED VIEW to automate the process.

If the new query I am going to demonstrate is fast enough, don't store any redundant data and calculate deltas on the fly.

For your current setup, this query should be much faster, when combined with this matching index:

CREATE INDEX price_watches_product_id_calculation_date_idx
  ON price_watches(product_id, calculation_date DESC NULLS LAST);

Query:

UPDATE price_watches pw
SET    min_percent_changed = calc.delta
FROM   price_watches p1  
     , LATERAL (
   SELECT (1 - p1.price_min / p2.price_min) * 100 AS delta
   FROM   price_watches p2
   WHERE  p2.product_id = p1.product_id
   AND    p2.calculation_date < p1.calculation_date
   ORDER  BY p2.calculation_date DESC NULLS LAST
   LIMIT  1
   ) calc
WHERE  p1.price_min > 0
AND    p1.calculation_date = current_date - 1  -- only update new rows!
AND    pw.id = p1.id
AND    pw.min_percent_changed IS DISTINCT FROM calc.delta; 

I am restricting the update to rows from "yesterday": current_date - 1. This is a wild guess at what you actually need.

Explanation for the added last line of the query:

Similar to this answer on dba.SE from just a few hours ago:

Proper information in the question would allow me to adapt the query and give more explanation.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228