I have a table with some statistical values from a process. The table has the following format:
CREATE TABLE data (
process integer NOT NULL,
time timestamp NOT NULL
first double precision,
last double precision,
first_time timestamp,
last_time timestamp
)
The data in this table is inserted every minute, and contains the aggregate value of the last minute. For example, for a process 1, we can have the following data:
+---------------------------------------------------------------------------------+
| process | time | first | last | first_time | last_time |
+---------------------------------------------------------------------------------+
| 1 | 2014-09-22 12:00:00 | 100 | 200 | 2014-09-22 12:00:00 | 2014-09-22 12:00:59 |
| 1 | 2014-09-22 12:01:00 | 104 | 152 | 2014-09-22 12:01:00 | 2014-09-22 12:01:59 |
| 1 | 2014-09-22 12:02:00 | 141 | 155 | 2014-09-22 12:02:10 | 2014-09-22 12:02:59 |
| 1 | 2014-09-22 12:03:00 | 122 | 147 | 2014-09-22 12:03:00 | 2014-09-22 12:02:45 |
+---------------------------------------------------------------------------------+
As you can see in the third row, there are times when the first value is not the second 0 of the minute. In the last time this also happen (fourth row).
Using the first and last functions from this page, and the date_round function from this page, I want to select the first value of a given process in a 30 minute interval.
When I try the following two queries, both the result are correct.
SELECT
date_round(time, '30 min'::interval) AS "time",
first(first)
FROM
data
WHERE
process = 1
AND
time > '2014-09-20 00:00:00'
AND
time < '2014-09-22 18:00:00'
GROUP BY 1
ORDER BY 1
and
SELECT
date_round(time, '30 min'::interval) AS "time",
first(first) AS "value"
FROM (
SELECT
time,
first
FROM
data
WHERE
process = 1
AND
time > '2014-09-20 00:00:00'
AND
time < '2014-09-22 18:00:00'
ORDER BY 1
) A
GROUP BY 1
ORDER BY 1
My question is: Why the first query works in this case? Is postgres sort the data before apply the group by clause?
My doubt is because the "first" function returns the first value of the two passed arguments. If the data is not sorted, then the first call in the "first" aggregate function can contain a value that is not in the correct order, resulting in an invalid value, right?
Can I use the first query safely or should I use the second query?