1

I have a temporal database and want to compute hourly averages for time series data.

Sample data:

1     -1.64 2007-09-29 00:01:09
2     -1.76 2007-09-29 00:03:09
3     -1.83 2007-09-29 00:05:09
4     -1.86 2007-09-29 00:07:09
5     -1.94 2007-09-29 00:09:09
6     -1.87 2007-09-29 00:11:09
7     -1.87 2007-09-29 00:13:09
8     -1.80 2007-09-29 00:15:09
9     -1.64 2007-09-29 00:17:09
10    -1.60 2007-09-29 00:19:09
11    -1.90 2007-09-29 00:21:09
12    -2.08 2007-09-29 00:23:09
13    -1.94 2007-09-29 00:25:09
14    -2.12 2007-09-29 00:27:09
15    -1.87 2007-09-29 00:29:09
16    -2.18 2007-09-29 00:31:09
17    -1.98 2007-09-29 00:33:09
18    -1.73 2007-09-29 00:35:09
19    -1.84 2007-09-29 00:37:09
20    -2.04 2007-09-29 00:39:09
21    -1.86 2007-09-29 00:41:09
22    -1.94 2007-09-29 00:43:09
23    -1.77 2007-09-29 00:45:09
24    -1.78 2007-09-29 00:47:09
25    -1.50 2007-09-29 00:49:09
26    -1.46 2007-09-29 00:51:09
27    -1.72 2007-09-29 00:53:09
28    -1.67 2007-09-29 00:55:09
29    -1.56 2007-09-29 00:57:09
30    -1.69 2007-09-29 00:59:09
31    -1.97 2007-09-29 01:01:09
32    -1.79 2007-09-29 01:03:09
33    -1.79 2007-09-29 01:05:09
34    -1.84 2007-09-29 01:07:09
35    -1.91 2007-09-29 01:09:09
36    -1.87 2007-09-29 01:11:09
37    -1.98 2007-09-29 01:13:09
38    -1.83 2007-09-29 01:15:09
39    -1.88 2007-09-29 01:17:09
40    -1.88 2007-09-29 01:19:09
41    -1.78 2007-09-29 01:21:09
42    -1.78 2007-09-29 01:23:09
43    -1.66 2007-09-29 01:25:09
44    -1.70 2007-09-29 01:27:09
45    -1.46 2007-09-29 01:29:09
46    -1.36 2007-09-29 01:31:09
47    -1.40 2007-09-29 01:33:09
48    -1.34 2007-09-29 01:35:09
49    -1.34 2007-09-29 01:37:09
50    -1.30 2007-09-29 01:39:09
51    -1.36 2007-09-29 01:41:09
52    -1.40 2007-09-29 01:43:09
53    -1.43 2007-09-29 01:45:09
54    -1.38 2007-09-29 01:47:09
55    -1.40 2007-09-29 01:49:09
56    -1.42 2007-09-29 01:51:09
57    -1.47 2007-09-29 01:53:09
58    -1.66 2007-09-29 01:55:09
59    -1.84 2007-09-29 01:57:09
60    -1.92 2007-09-29 01:59:09
61    -1.88 2007-09-29 02:01:09
62    -2.11 2007-09-29 02:03:09
63    -1.91 2007-09-29 02:05:09
64    -2.04 2007-09-29 02:07:09
65    -1.94 2007-09-29 02:09:09
66    -1.92 2007-09-29 02:11:09
67    -1.80 2007-09-29 02:13:09
68    -1.74 2007-09-29 02:15:09
69    -1.74 2007-09-29 02:17:09
70    -1.76 2007-09-29 02:19:09
71    -1.74 2007-09-29 02:21:09
72    -1.80 2007-09-29 02:23:09
73    -1.80 2007-09-29 02:25:09
74    -1.80 2007-09-29 02:27:09
75    -1.82 2007-09-29 02:29:09
76    -1.90 2007-09-29 02:31:09
77    -1.93 2007-09-29 02:33:09
78    -2.06 2007-09-29 02:35:09
79    -2.08 2007-09-29 02:37:09
80    -1.95 2007-09-29 02:39:09
81    -1.98 2007-09-29 02:41:09
82    -2.32 2007-09-29 02:43:09
83    -1.86 2007-09-29 02:45:09
84    -1.97 2007-09-29 02:47:09
85    -1.64 2007-09-29 02:49:09
86    -2.00 2007-09-29 02:51:09
87    -1.48 2007-09-29 02:53:09
88    -1.74 2007-09-29 02:55:09
89    -1.85 2007-09-29 02:57:09
90    -1.82 2007-09-29 02:59:09
91    -1.82 2007-09-29 03:01:09
92    -1.92 2007-09-29 03:03:09
93    -1.80 2007-09-29 03:05:09
94    -1.54 2007-09-29 03:07:09
95    -1.36 2007-09-29 03:09:09
96    -1.50 2007-09-29 03:11:09
97    -1.59 2007-09-29 03:13:09
98    -1.60 2007-09-29 03:15:09
99    -1.58 2007-09-29 03:17:09
100   -1.81 2007-09-29 03:19:09
101   -2.16 2007-09-29 03:21:09
102   -1.97 2007-09-29 03:23:09
103   -1.94 2007-09-29 03:25:09
104   -2.29 2007-09-29 03:27:09
105   -2.46 2007-09-29 03:29:09
106   -2.42 2007-09-29 03:31:09
107   -2.34 2007-09-29 03:33:09
108   -2.38 2007-09-29 03:35:09
109   -2.44 2007-09-29 03:37:09
110   -2.28 2007-09-29 03:39:09
111   -2.24 2007-09-29 03:41:09
112   -2.26 2007-09-29 03:43:09

Aggregation should be performed over the following time intervals: hour HH covers (HH-1):41 to HH:40. Example: hour 13 covers the observation period 12:41 to 13:40.

Erwin Brandstetter
A.Amidi
  • Is it a moving average or just a static average over each hour? :) – bonCodigo Jan 06 '13 at 21:25
  • @bonCodigo It is not a moving average, just a static one. The hourly mean for 9 o'clock is the average of all records between 8:41 and 9:40. Moreover, some intervals may have no record at minute 40 or 41, so the average could cover 59 or 61 minutes instead of exactly 1 hour. – A.Amidi Jan 06 '13 at 21:27

2 Answers

3

Should work like this:

SELECT  date_trunc('hour', ts + interval '20 min') AS h
       ,avg(val) as avg_val
FROM    t
GROUP   BY 1
ORDER   BY 1;

I add 20 minutes before truncating the timestamp to the hour with date_trunc(), then aggregate by the result.
Note that the border time 08:40 ends up in the average for 9 o'clock.
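
If readings taken during minute :40 should instead count toward hour HH, as the interval in the question (12:41 to 13:40) seems to ask, shifting by 19 minutes rather than 20 is one option. This is a sketch under that assumption, not something taken from the fiddle. The CTE t stands in for the real table, and the 00:40:00 row is made up to show the border case:

```sql
-- Compare the 20-minute shift with a 19-minute shift (assumed variant).
-- The CTE t mimics the table t(ts, val); the 00:40:00 row is hypothetical.
WITH t(ts, val) AS (
   VALUES
      (timestamp '2007-09-29 00:39:09', -2.04)
     ,(timestamp '2007-09-29 00:40:00', -1.90)
     ,(timestamp '2007-09-29 00:41:09', -1.86)
   )
SELECT  ts
       ,date_trunc('hour', ts + interval '20 min') AS h_plus_20
       ,date_trunc('hour', ts + interval '19 min') AS h_plus_19
FROM    t
ORDER   BY ts;
```

With the 20-minute shift the 00:40:00 row is truncated to 01:00; with the 19-minute shift it stays at 00:00. The other two rows land in the same hour either way.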

Regular hourly grid

... including hours without any rows in the base table:

SELECT *
FROM   generate_series('2007-09-28 22:00'  -- first hour
                      ,'2007-09-29 05:00'  -- last hour
                      ,interval '1h') AS h
LEFT   JOIN (
    SELECT  date_trunc('hour', ts + interval '20 min') AS h
           ,avg(val) as avg_val
    FROM    t
    GROUP   BY 1
    ) x USING (h)
ORDER  BY 1;  -- sort the final result; an ORDER BY inside the subquery does not guarantee it

Use generate_series() and a LEFT JOIN for that.
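
To see what the LEFT JOIN does for an hour without rows, here is a small self-contained sketch. The rows in the CTE t are chosen so that nothing falls into the 02:00 bucket. The ct column and the alias g are additions for illustration only:

```sql
-- Hypothetical rows with a gap: no row maps to the 02:00 bucket.
WITH t(ts, val) AS (
   VALUES
      (timestamp '2007-09-29 00:45:09', -1.77)
     ,(timestamp '2007-09-29 00:47:09', -1.78)
     ,(timestamp '2007-09-29 02:45:09', -1.86)
   )
SELECT  h, x.avg_val, COALESCE(x.ct, 0) AS ct
FROM    generate_series(timestamp '2007-09-29 01:00'  -- first hour
                       ,timestamp '2007-09-29 03:00'  -- last hour
                       ,interval '1h') AS g(h)
LEFT    JOIN (
    SELECT  date_trunc('hour', ts + interval '20 min') AS h
           ,avg(val) AS avg_val
           ,count(*) AS ct        -- rows per bucket
    FROM    t
    GROUP   BY 1
    ) x USING (h)
ORDER   BY h;
```

The 02:00 row comes back with avg_val NULL and ct 0, while 01:00 and 03:00 carry the averages of their rows.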

-> Updated the sqlfiddle.

Erwin Brandstetter
  • As I mentioned, for some time intervals there is no record at minute 40 or 41, for example around 8 o'clock in the morning. How does the code handle this? – A.Amidi Jan 06 '13 at 21:45
  • @Amidi: If you mean there are no rows at all for the time period, then there will be no row in the result. If you need a row for every hour, left join to a set of temporal data. I'll add an example with generate_series(). – Erwin Brandstetter Jan 06 '13 at 21:51
  • No, I meant this: for 8 o'clock, for example, the hourly average should cover all records between 7:41 and 8:40, but there is no data at 8:40 and instead there is data at 8:39 and 8:41. I wanted to know which of them will be chosen. – A.Amidi Jan 06 '13 at 21:56
  • `8:39` ends up in the 8 o'clock average, `8:41` ends up in the 9 o'clock average. Just try it. My query doesn't care if any particular timestamp exists or not. – Erwin Brandstetter Jan 06 '13 at 22:01
  • I have one more question. I want to create a subset of my temporal database with the records between '2007-09-28 22:00' and '2007-09-29 59:00'. I am using the code below, but it cannot address the aforementioned problem: *select ambtemp,dt from s_2 WHERE dt::timestamptz BETWEEN DATE '2007-09-29' AND DATE '2007-09-30' and extract(hour FROM dt::timestamptz) BETWEEN 0 AND 24* – A.Amidi Jan 06 '13 at 22:32
  • @Amidi: Please open a new question for this, add your exact table definition and explain why you cast to `timestamp with time zone`. Comments are **not** for new questions. – Erwin Brandstetter Jan 06 '13 at 22:53
-2

My system is very slow, so I can't give you an SQL fiddle. Try the following, assuming you just need a static hourly average.

SELECT datecol, DATEPART(hour,timescolumn) as hourcol, AVG(valuecol)
FROM Yourtable
GROUP BY hourcol, dateCol;

Edit:

Credit to @Erwin. I used his sqlfiddle to run my query. There were a couple of syntax errors to fix. And it only computes the plain hourly average.

Postgres query:

SELECT DATE(ts) AS DT,
       DATE_PART('hour', ts) AS hourcol,
       avg(val)
FROM t
GROUP BY hourcol, DT
ORDER BY DT, hourcol;  -- sort by day first so hours of different days do not interleave

Results:

DT                              HOURCOL     AVG
September, 29 2007 00:00:00+0000    0           -1.814666666667
September, 29 2007 00:00:00+0000    1           -1.638
September, 29 2007 00:00:00+0000    2           -1.879333333333
September, 29 2007 00:00:00+0000    3           -1.986363636364
bonCodigo
  • Please leave a comment when downvoting. I understand my query is in `SQL Server`, so the OP needs to convert it to `Postgres`. – bonCodigo Jan 06 '13 at 21:47
  • I didn't downvote, but some possible explanation: This doesn't address the special difficulty of the question to aggregate values between *12:41 to 13:40*. It also fails completely since you cannot include the un-aggregated column `datecol` in the `SELECT` list. Barring that, it would still fail with values spanning multiple days, since the result of `datepart()` is identical for different days. None of this is PostgreSQL specific. – Erwin Brandstetter Jan 06 '13 at 21:48
  • @ErwinBrandstetter I guess he just gave an example of how an hour should be calculated. It doesn't give the impression of a particular start time. So I treated an hour as `1 to 24` norm.. Here is a fairly similar [question by OP](http://stackoverflow.com/questions/13818524/moving-average-based-on-timestamps-in-postgresql)... – bonCodigo Jan 06 '13 at 21:52
  • @bonCodigo, The question you mentioned is moving average. – A.Amidi Jan 06 '13 at 21:58
  • @bonCodigo, the code returns something different from what you presented?! – A.Amidi Jan 06 '13 at 22:15
  • @Amidi Do you see my EDIT? It says "only doing the hourly average." So IT DOESN'T MEAN it gives you the `41:40` answer. It merely gives you an idea of how to do an `hourly average`. I said the same in my comment above as well. **In fact I had to ask you for a better clarification in my first comment because your question didn't say it all!** Getting a downvote when I have already mentioned that the syntax is from SQL Server is.....hmm. – bonCodigo Jan 06 '13 at 22:22
  • @Amidi in my answer you were really comparing Erwin's answer to mine? Goodness. You can clearly see the difference between the two query syntaxes no? – bonCodigo Jan 07 '13 at 20:03