1

In SQL table person_rate we have stored a rate float value which changes in time. Columns:

id (serial, PK)
person_id (int)
date_from (date)
rate (float)

(person_id, date_from) is unique, because at most one change per day is allowed (maybe it could be a PK, but it's not important)

Rate value for given person_id is valid at time domain from date_from date to next record with succesive date_from, or to the infty future if there is no such record. Example:

id   person_id  date_from  rate
101  1          2011-01-01 100.0
145  1          2012-01-01 180.0
193  1          2012-05-01 140.0

Now I need a SELECT query which for each person_id returns a rate valid for some given $date. Let say for 2012-03-01 it's 180; for 2012-05-02 it's 140 and so.

Solutions I have tested:

1) condition date_from <= $date + use window function rank() OVER (PARTITION BY person_id ORDER BY date_from DESC + in superselect WHERE rank = 1

2) similar to 1) but use SELECT DISTINCT ON (person_id) instead of limiting rank to 1

Both 1) and 2) does not perform well, EXPLAIN show that db needs to sort all records for each person_id and then limit to 1 first. Probably this type of query can't fully utilise index on date_from?

IDEA - add a date_to column, which will be a bit redundant, because the value will be "date_from of succesive record, minus 1 day" (or +infty if no succesive record). But the query then could be with date_from <= $date AND date_to >= $date - which probably would have a good performance with indexes on date_from and date_to.

But I'm a bit afraid how to manage the data integrity in this case - how to set up constraint that [ date_from .. date_to ] intervals for one person_id shoud not be overlapped?

What is the best solution for postgresql for this type of query? Load is most read, not much writes to the person_rate table. Typical query would internaly need to get a rate for each day in month...

Maybe this SQL query for index/primary key ordinal with the new indexes on pg 9.2 can help somehow?

Community
  • 1
  • 1
vlastik
  • 13
  • 2

4 Answers4

0

Use this simple query:

SELECT person_id, date_from, rate FROM person_rate where date_from in (SELECT MAX(date_from) FROM person_rate WHERE date_from <= 'provided Date' and person_id = provided_id) 

To get all Person rates use.

SELECT a.person_id, a.date_from, a.rate FROM person_rate a JOIN (SELECT person_id, MAX(date_from) as date_from FROM person_rate where date_from <= 'provided Date') b ON(a.date_from = b.date_from and a.person_id = b.person_id)
sharafjaffri
  • 2,134
  • 3
  • 30
  • 47
0

you can use LEAD to generate the to_date column

with scd_table (
    select  a.*  , 
            lead(from_date,1,to_date('31/12/9999','dd/mm/yyyy')) over (partition by a order by from_date asc) as to_date
    from    YOUR_TABLE a
)
select  * 
from    scd_table
where   :d >= from_date
and     :d < to_date

(this is oracle syntax but lead is ANSI standard)

i think you should reconsider your design - look for slowly change dimension tables. there allot of articles about other designs. the way you did it - you would always fetch the next from_date in order to get just one particular instance of a client so that a bit heavy on selects.

haki
  • 9,389
  • 15
  • 62
  • 110
  • Thanks for that "slowly change dimension tables" it seems to be what I need, similar to IDEA I explained. Nonoverlapping constraint probably could be implemented using insert trigger. – vlastik Apr 09 '13 at 12:27
  • triggers are poor design. you need to ask your self one important question - is your system making more DML or selects ? if your system makes more DML you might want to give up `to_date` because each update/insert will require two updates (current row , and the to_date of the previous one). if your system does more selects - you might want to consider paying the penalty during insert in order to ease select's. i'll give you a rule of thumb - *if you have triggers, synonym or too many views - you are doing some thing wrong*. – haki Apr 10 '13 at 07:44
  • There will be almost no update/insert in compare to selects. So the two updates is definitely OK. I think about triggers probably only for checking consistency constraints, e.g. not to allow insert [from, to] interval which overlaps another one. – vlastik Apr 12 '13 at 17:06
  • i would do it in the application during the insert process. triggers should really be used only for patching and maybe a log-on trigger (but this is only me). you got an undocumented function called `overlaps`. i'm not suggesting you use it but read about it because finding overlapping can be trickier then you might think (talking from experience). – haki Apr 13 '13 at 06:39
0

SQL Fiddle

select distinct on (person_id) person_id, date_from, rate
from person_rate
where date_from <= '2012-03-01'
order by person_id, date_from desc

If (person_id, date_from) is unique then create that index:

create table person_rate (
    id serial primary key,
    person_id int,
    date_from date,
    rate float,
    unique (person_id, date_from)
);

If that is already a production table alter it:

alter table person_rate add 
constraint constraint_name unique (person_id, date_from);

Do not forget to run analyze person_rate after that. It will only use the index if the right conditions are met. That includes a big enough table.

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
0
SELECT person_id, rate FROM person_rate WHERE date_from <= '2012-05-02' ORDER BY date_from DESC LIMIT 0,1;

Where second digit of LIMIT is max person_id you want to use (for example LIMIT 0,5 for first 5 person_id) and date goes here: date_from <= '2012-05-02'

bardo
  • 106
  • 1
  • 1