-1

Here is a dummy table to describe what I am trying to do:

ID_1      | ID_2     | ID_3       | Day   | Energy_Costs  |
----------+----------+------------+-------+---------------+
State_1   | County_1 | Building_1 |  1    | 48.8          |
State_1   | County_1 | Building_1 |  2    | 31.3          |
State_1   | County_1 | Building_2 |  1    | 20.5          |
State_1   | County_2 | Building_1 |  1    |  1.9          |
State_2   | County_1 | Building_1 |  1    |  6.6          |
State_2   | County_2 | Building_2 |  1    | 38.2          |
State_2   | County_2 | Building_2 |  2    | 12.0          |

In the table above, a unique record (a Building in this case), requires 3 columns (ID_1, ID_2, ID_3). I would like to return a table with the first row of a given day for a building.

Here's how the query looks in my head:

SELECT FIRST(ID_1), FIRST(ID_2), FIRST(ID_3), FIRST(Energy_Costs), FIRST(DAY)
FROM buildings_db
GROUP BY ID_1, ID_2, ID_3
ORDER BY DAY

This would return:

ID_1      | ID_2     | ID_3       | Day   | Energy_Costs  |
----------+----------+------------+-------+---------------+
State_1   | County_1 | Building_1 |  1    | 48.8          |
State_1   | County_1 | Building_2 |  1    | 20.5          |
State_1   | County_2 | Building_1 |  1    |  1.9          |
State_2   | County_1 | Building_1 |  1    |  6.6          |
State_2   | County_2 | Building_1 |  1    | 38.2          |

I've seen other questions asking something similar but they typically don't have multiple columns defining a group. I'm very new to SQL so translating them to my example is proving unsuccessful; if any of you can provide an explanation for why your solution works that would be very helpful.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
MattM
  • 317
  • 4
  • 12

2 Answers2

1

You can use DISTINCT ON (). It works for any number of columns to define a group:

SELECT DISTINCT ON (ID_1, ID_2, ID_3)
       ID_1, ID_2, ID_3, DAY, Energy_Costs
FROM   buildings_db
ORDER  BY ID_1, ID_2, ID_3, DAY, Energy_Costs;

This returns the first row for each distinct combination of (ID_1, ID_2, ID_3), first being defined by additional ORDER BY expressions.

To get ...

the first row of a given day for a building:

SELECT DISTINCT ON (ID_1, ID_2, ID_3)
       ID_1, ID_2, ID_3, DAY, Energy_Costs
FROM   buildings_db
WHERE  DAY = 1  -- given day
ORDER  BY ID_1, ID_2, ID_3, Energy_Costs

Detailed explanation:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Erwin, Your response is extremely helpful! You've correctly answered the question, but I'm realizing I didn't create an accurate enough representation. One more flavor: instead of "Day" my data is time series. I want to return the earliest instance of time, which I cannot reliably state as one. Is there a dynamic way of doing what you did in your example? Something like: "Where Day = minimum(Day)" – MattM Feb 12 '18 at 18:46
  • @MattM: If you realize you have another, different question, just start a new question (with clear details). Comments are not the place. – Erwin Brandstetter Feb 12 '18 at 20:35
0

You may use a subquery and JOIN for this

select b.ID_1, b.ID_2, b.ID_3, b.Energy_Costs, b.DAY
from buildings_db b
join
(
  select ID_1, ID_2, ID_3, min(day) min_day
  from buildings_db 
  group by ID_1, ID_2, ID_3
) t on b.id_1 = t.id_1 and
       b.id_2 = t.id_2 and
       b.id_2 = t.id_2 and
       b.day = t.min_day
Radim Bača
  • 10,646
  • 1
  • 19
  • 33