7

I have an events table with two columns eventkey (unique, primary-key) and createtime, which stores the creation time of the event as the number of milliseconds since Jan 1 1970 in a NUMBER column.

I would like to create a "histogram" or frequency distribution that shows me how many events were created in each hour of the past week.

Is this the best way to write such a query in Oracle, using the width_bucket() function? Is it possible to derive the number of rows that fall into each bucket using one of the other Oracle analytic functions rather than using width_bucket to determine what bucket number each row belongs to and doing a count(*) over that?

-- 1305504000000 = 5/16/2011 12:00am GMT
-- 1306108800000 = 5/23/2011 12:00am GMT
select 
timestamp '1970-01-01 00:00:00' + numtodsinterval((1305504000000/1000 + (bucket * 60 * 60)), 'second') period_start,
numevents
from (
  select bucket, count(*) as events from (
    select eventkey, createtime, 
    width_bucket(createtime, 1305504000000, 1306108800000, 24 * 7) bucket
    from events 
    where createtime between 1305504000000 and 1306108800000
  ) group by bucket
) 
order by period_start
matt b
  • 138,234
  • 66
  • 282
  • 345

4 Answers4

11

If your createtime were a date column, this would be trivial:

SELECT TO_CHAR(CREATE_TIME, 'DAY:HH24'), COUNT(*) 
  FROM EVENTS
 GROUP BY TO_CHAR(CREATE_TIME, 'DAY:HH24');

As it is, casting the createtime column isn't too hard:

select TO_CHAR( 
         TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 
         'DAY:HH24') AS BUCKET, COUNT(*)
   FROM EVENTS
  WHERE createtime between 1305504000000 and 1306108800000
 group by TO_CHAR( 
         TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 
         'DAY:HH24') 
 order by 1

If, alternatively, you're looking for the fencepost values (for example, where do I go from the first decile (0-10%) to the next (11-20%), you'd do something like:

select min(createtime) over (partition by decile) as decile_start,
       max(createtime) over (partition by decile) as decile_end,
       decile
  from (select createtime, 
               ntile (10) over (order by createtime asc) as decile
          from events
         where createtime between 1305504000000 and 1306108800000
       )
Matthieu Cormier
  • 2,185
  • 1
  • 21
  • 32
Adam Musch
  • 13,286
  • 2
  • 28
  • 32
  • this works well, thanks. Not sure why I didn't think of simply truncating the dates in the first place, I think I got so caught up in figuring out how to parse and cast this odd "date" format – matt b Jun 01 '11 at 14:26
  • 1
    Is there anyway to maintain rows for create_times that have a zero count? – Brett VanderVeen Oct 29 '14 at 15:59
3

I'm unfamiliar with Oracle's date functions, but I'm pretty certain there's an equivalent way of writing this Postgres statement:

select date_trunc('hour', stamp), count(*)
from your_data
group by date_trunc('hour', stamp)
order by date_trunc('hour', stamp)
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
1

Pretty much the same response as Adam, but I would prefer to keep the period_start as a time field so it is easier to filter further if needed:

with
events as
(
    select rownum eventkey, round(dbms_random.value(1305504000000, 1306108800000)) createtime
    from dual
    connect by level <= 1000 
)
select
    trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH') period_start,
    count(*) numevents
from
    events
where
    createtime between 1305504000000 and 1306108800000
group by
    trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH')
order by
    period_start
Craig
  • 5,740
  • 21
  • 30
  • Can you explain the purpose of the `with events as ()` portion, and why you are selecting random values? I'm not very familiar with Oracle syntax – matt b Jun 01 '11 at 14:06
  • sorry... Since I don't have your table of data to run the query against, I am generating random data to emulate what might be in your table. The "with events" statement just allows me to alias that query as "events" so the rest of the query will match what you could use directly against your events table without making any changes. For your purposes, just delete everything above "select trunc(...." – Craig Jun 01 '11 at 14:23
  • ah thanks, I see how that would be useful in this type of answer :) – matt b Jun 01 '11 at 14:26
0

Using oracle provided function "WIDTH_BUCKET" to accumulate continuous or fine-discrete data. The following example shows a way to create a histogram with 5 buckets and gather "COLUMN_VALUE" from 510 to 520 (so each bucket gets values of range 2). WIDTH_BUCKET will create additional id=0 and num_buckets+1 buckets for values below min and above max.

SELECT "BUCKET_ID", count(*), 
CASE
    WHEN "BUCKET_ID"=0 THEN -1/0F
    ELSE 510+(520-510)/5*("BUCKET_ID"-1)
END "BUCKET_MIN",
CASE
    WHEN "BUCKET_ID"=5+1 THEN 1/0F
    ELSE 510+(520-510)/5*("BUCKET_ID")
END "BUCKET_MAX"
FROM 
(
    SELECT  "COLUMN_VALUE", 
            WIDTH_BUCKET("COLUMN_VALUE", 510, 520, 5) "BUCKET_ID"
    FROM "MY_TABLE"
)
group by "BUCKET_ID"
ORDER BY "BUCKET_ID";

Sample output

 BUCKET_ID   COUNT(*) BUCKET_MIN BUCKET_MAX
---------- ---------- ---------- ----------
         0         45       -Inf   5.1E+002
         1        220   5.1E+002  5.12E+002
         2        189  5.12E+002  5.14E+002
         3         43  5.14E+002  5.16E+002
         4          3  5.16E+002  5.18E+002

In my table, there's no 518-520, so bucket with id=5 is not shown. On the other hand, there's values below min (510), so there's a bucket with id=0, gathering -inf to 510 values.

hychou
  • 572
  • 5
  • 15