3

In had a rather large error in my previous question

select earliest date from multiple rows

The answer by horse_with_no_name returns a perfect result, and I am hugely appreciative, however I got my own initial question wrong so I really apologise; if you look at the table below;

circuit_uid  |customer_name     |rack_location  |reading_date   | reading_time | amps | volts  |  kw  | kwh | kva  |  pf  |  key 
--------------------------------------------------------------------------------------------------------------------------------------
cu1.cb1.r1    | Customer 1       | 12.01.a1      | 2012-01-02   | 00:01:01     | 4.51 | 229.32 | 1.03 |  87 | 1.03 | 0.85 |    15
cu1.cb1.r1    | Customer 1       | 12.01.a1      | 2012-01-02   | 01:01:01     | 4.18 | 230.3  | 0.96 |  90 | 0.96 | 0.84 |    16
cu1.cb1.r2    | Customer 1       | 12.01.a1      | 2012-01-02   | 00:01:01     | 4.51 | 229.32 | 1.03 |  21 | 1.03 | 0.85 |    15
cu1.cb1.r2    | Customer 1       | 12.01.a1      | 2012-01-02   | 01:01:01     | 4.18 | 230.3  | 0.96 |  23 | 0.96 | 0.84 |    16
cu1.cb1.s2    | Customer 2       | 10.01.a1      | 2012-01-02   | 00:01:01     | 7.34 | 228.14 | 1.67 | 179 | 1.67 | 0.88 | 24009
cu1.cb1.s2    | Customer 2       | 10.01.a1      | 2012-01-02   | 01:01:01     | 9.07 |  228.4 | 2.07 | 182 | 2.07 | 0.85 | 24010
cu1.cb1.s3    | Customer 2       | 10.01.a1      | 2012-01-02   | 00:01:01     | 7.34 | 228.14 | 1.67 | 121 | 1.67 | 0.88 | 24009
cu1.cb1.s3    | Customer 2       | 10.01.a1      | 2012-01-02   | 01:01:01     | 9.07 |  228.4 | 2.07 | 124 | 2.07 | 0.85 | 24010
cu1.cb1.r1    | Customer 3       | 01.01.a1      | 2012-01-02   | 00:01:01     | 7.32 | 229.01 | 1.68 | 223 | 1.68 | 0.89 | 48003 
cu1.cb1.r1    | Customer 3       | 01.01.a1      | 2012-01-02   | 01:01:01     | 6.61 | 228.29 | 1.51 | 226 | 1.51 | 0.88 | 48004
cu1.cb1.r4    | Customer 3       | 01.01.a1      | 2012-01-02   | 00:01:01     | 7.32 | 229.01 | 1.68 | 215 | 1.68 | 0.89 | 48003 
cu1.cb1.r4    | Customer 3       | 01.01.a1      | 2012-01-02   | 01:01:01     | 6.61 | 228.29 | 1.51 | 217 | 1.51 | 0.88 | 48004

As you can see each customer now has multiple circuits. So the result would now be the sum of each of the earliest kwh readings for each circuit per customer, so the result in this table would be;

customer_name | kwh(sum)
--------------+-----------
customer 1    | 108      (the result of 87 + 21)  
customer 2    | 300      (the result of 179 + 121)  
customer 3    | 438      (the result of 223 + 215)   

There will be more than 2 circuits per customer and the readings can happen at varying times, hence the need for the 'earliest' reading.

Would anybody have any suggestions for the revised question?

PostgreSQL 8.4 on CentOs/Redhat.

Community
  • 1
  • 1
Alan Ennis
  • 145
  • 1
  • 3
  • 10

2 Answers2

2
SELECT customer_name, sum(kwh) AS kwh_total
FROM  (
    SELECT DISTINCT ON (customer_name, circuit_uid)
           customer_name, circuit_uid, kwh
    FROM   readings
    WHERE  reading_date = '2012-01-02'::date
    ORDER  BY customer_name, circuit_uid, reading_time
    ) x
GROUP  BY 1

Same as before, just pick the earliest per (customer_name, circuit_uid).
Then sum per customer_name.

Index

A multi-column index like the following will make this very fast:

CREATE INDEX readings_multi_idx
ON readings(reading_date, customer_name, circuit_uid, reading_time);
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • @AlanEnnis: Can you run a quick test with [`EXPLAIN ANALYZE`](http://www.postgresql.org/docs/current/interactive/sql-explain.html) which is faster? Would be interesting. – Erwin Brandstetter Nov 17 '12 at 19:14
  • erwin your result in Total runtime: 21.058 ms (8 rows) horse's result in Total runtime: 20.623 ms (10 rows) Total number of rows for the date in question is 432 rows. You guys are both awesome, thank you. – Alan Ennis Nov 17 '12 at 19:17
  • @AlanEnnis: Thanks for the feedback. Probably, the `DISTINCT ON` version has a bit more sort overhead that's not needed in this case. For a simple case where ordered output is needed this variant is regularly faster. But for a small set like in your case it is really irrelevant - except if you call that lots of times. Also: with the index I added to my answer this will be faster than anything. – Erwin Brandstetter Nov 17 '12 at 19:24
  • Erwin, with index created results are Total runtime: 9.325 ms (10 rows) for your query and Total runtime: 9.523 ms (12 rows) for horse's query. In production there will be 20 or 30 times more rows per single date so it will make a difference. Thank you for the index also, i will file that away for the production database. – Alan Ennis Nov 17 '12 at 20:05
1

This is an extension to your original question:

select customer_name,
       sum(kwh)
from (
   select customer_name,
          kwh,
          reading_time,
          reading_date,
          row_number() over (partition by customer_name, circuit_uid order by reading_time) as rn
   from readings
   where reading_date = date '2012-01-02'
) t
where rn = 1
group by customer_name

Note the new sum() in the outer query and the changed partition by definition in the inner query (compared to your previous question) which calculates the first reading for each circuit_uid now (instead of the first for each customer).