
I am creating a table which will have 2 columns:

  1. Day_time (timestamps from 1978-01-01 00:00:00 (Sunday) to 1978-01-07 23:59:00 (Saturday), at one-minute granularity)
  2. Time_id (a unique id for each minute), to be populated

I have column one populated. I want to populate column two.

How I am doing it right now:

    EXTRACT(dayofweek FROM day_time) * 10000 + DATEDIFF('minutes', TRUNC(day_time), day_time)

I basically want a function that, given any date, tells me where I am in the week. So I need a function just like the one above, only more optimized: I give it a date and get back a unique ID, and that ID should repeat weekly.

Example: ID for Jan 1, 2015 00:00:00 will be same as Jan 8, 2015 00:00:00.

Why 1978-01-01? Because it starts on a Sunday.
Why 10,000? Because the number of minutes in a day (1,440) is a four-digit number.
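The current expression can be sketched outside SQL to make the arithmetic concrete. Below is a minimal Python equivalent (the function name `weekly_id` is mine, not from the question), assuming Redshift's `EXTRACT(dayofweek ...)` numbers Sunday as 0:

```python
from datetime import datetime

def weekly_id(ts: datetime) -> int:
    # Redshift's EXTRACT(dayofweek ...) numbers Sunday as 0;
    # Python's weekday() numbers Monday as 0, so shift by one.
    day_of_week = (ts.weekday() + 1) % 7
    minutes_since_midnight = ts.hour * 60 + ts.minute
    return day_of_week * 10_000 + minutes_since_midnight

# Jan 1 and Jan 8, 2015 are both Thursdays, one week apart:
weekly_id(datetime(2015, 1, 1))  # -> 40000
weekly_id(datetime(2015, 1, 8))  # -> 40000
```

Note that the IDs are not contiguous: minutes within a day run 0–1439, so each day's block jumps from x1439 to (x+1)0000.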
Deep Kalra
  • @wingedpanther, thanks for the reply soldier. I am sorry, I was not clear. I want a function like the one I have written, as I want to calculate the same ID on the fly as well. I know recursive CTEs, bro, and I don't need a CTE to generate the unique key; I can just use an identity column. Am I right, or am I right? – Deep Kalra Jan 22 '16 at 11:16
  • Does Amazon Redshift support *any* of these PostgreSQL features: CTEs, recursive CTEs, sequences, generate_series()? – Mike Sherrill 'Cat Recall' Jan 22 '16 at 11:18
  • Try this `insert into tbl select row_number() over() id,dtm from( select generate_series('1978-01-01 00:00:00'::timestamp,'1978-01-02 23:59:00.0'::timestamp,'1min'::interval) dtm )t`... Initially I just misunderstood your req. – Vivek S. Jan 22 '16 at 11:20
  • Error: `function generate_series(timestamp without time zone, timestamp without time zone, interval) does not exist;` but `generate_series(1,30)` is working fine. I am new to this, can you see this: http://stackoverflow.com/questions/17282276/using-sql-function-generate-series-in-redshift – Deep Kalra Jan 22 '16 at 11:24
  • [This](http://sqlfiddle.com/#!15/f2674/4) is it.... – Vivek S. Jan 22 '16 at 11:25
  • Anyways, how will I get the ID from the date on the fly using this? I think I won't be able to. – Deep Kalra Jan 22 '16 at 11:27
  • I've used `row_number() over()`, have you checked my fiddle?? – Vivek S. Jan 22 '16 at 11:30
  • May be http://stackoverflow.com/questions/22554398/generate-series-function-in-amazon-redshift could be a reason on your side – Vivek S. Jan 22 '16 at 11:30
  • @wingedpanther I really appreciate your help. I have updated the question, please see the highlighted part. Sorry for the confusion. Obviously I have seen the fiddle. I guess I was not able to explain the question properly. There we are generating the IDs; I am saying I can do that using an identity column as well, right? I want a function. – Deep Kalra Jan 22 '16 at 11:33
  • @DeepanshuKalra I still cannot understand your requirement well. IMO sample DDL, data, and expected output would be much more helpful – Vivek S. Jan 22 '16 at 11:37
  • @wingedpanther need a function, just like the function I have above. just more optimized, where I give a date and get a unique ID. The unique ID should repeat weekly. Which means: ID for `Jan 1, 2015 00:00:00` will be same as `Jan 8, 2015 00:00:00`. – Deep Kalra Jan 22 '16 at 11:46
  • a) If it repeats, it's not unique. b) The trunc() function takes a number, not a timestamp. Does your expression *really* work on Redshift? – Mike Sherrill 'Cat Recall' Jan 22 '16 at 11:52
  • Are you trying to calculate the number of minutes since midnight the previous Sunday? – Mike Sherrill 'Cat Recall' Jan 22 '16 at 12:03
  • @MikeSherrill'CatRecall' Yes, my expression works, and I have been using it for more than a month now. And I want it to repeat, but only after a week; it should be unique within a week. – Deep Kalra Jan 22 '16 at 12:28
  • What is wrong with the current function ? As you say, it works. If performance is an issue you might consider pre-calculating the values and storing them in a separate table and then `JOIN`-ing against that table to find the required `ID`. If your source-data already is "rounded" to the minute, then this would work very well (the datetime field should be the PK), otherwise you'll need to do the "rounding" inside the `JOIN` predicate and that's going to kill performance even more than the function you're already using I fear. – deroby Jan 25 '16 at 08:17
  • PS: You could try adding a start/end column approach in the pre-calculated table and then `JOIN` with `BETWEEN` but that's probably not very fast either... – deroby Jan 25 '16 at 08:18
  • I was not able to think of any approach faster than the one I mentioned in the question. I have implemented the system using this, but would love to know how I can optimize it. – Deep Kalra Jan 26 '16 at 12:47

1 Answer


You can do it all in one fell swoop, without needing to extract the date separately:

    SELECT DATEDIFF('minutes', date_trunc('week', day_time), day_time)

which I'd expect to be marginally faster.
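The same "minutes into the week" calculation can be sketched in Python (the function name is mine). One caveat worth noting: `date_trunc('week', ...)` truncates to the Monday starting the ISO week, so this counts from Monday rather than the question's Sunday-based scheme, though the result still repeats weekly and is unique within a week:

```python
from datetime import datetime

def minutes_into_week(ts: datetime) -> int:
    # Mirrors DATEDIFF('minutes', date_trunc('week', day_time), day_time):
    # date_trunc('week', ...) truncates to the Monday starting the ISO week,
    # and Python's weekday() also numbers Monday as 0, so the two line up.
    return ts.weekday() * 1440 + ts.hour * 60 + ts.minute
```

Unlike the `dayofweek * 10000` scheme, this yields contiguous IDs from 0 through 10,079.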

Another approach that I'd expect to be significantly faster would be converting the timestamp to epoch seconds, dividing by 60 to get minutes since the epoch, and then taking that value modulo 10,080 (60 * 24 * 7, the number of minutes in a week).

    SELECT (extract(epoch from day_time) / 60) % 10080
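A minimal Python sketch of this epoch-modulus approach (function name mine). Since the Unix epoch, 1970-01-01, fell on a Thursday, the cycle's zero point is Thursday 00:00 UTC rather than Sunday, but the ID is still unique within any week and repeats weekly:

```python
from datetime import datetime, timezone

MINUTES_PER_WEEK = 10_080  # 60 * 24 * 7

def weekly_minute_id(ts: datetime) -> int:
    # Whole minutes since the Unix epoch, wrapped to one week.
    return int(ts.timestamp()) // 60 % MINUTES_PER_WEEK

# Timestamps exactly one week apart share an ID:
a = weekly_minute_id(datetime(2015, 1, 1, tzinfo=timezone.utc))
b = weekly_minute_id(datetime(2015, 1, 8, tzinfo=timezone.utc))
assert a == b
```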

If you don't care about the size of the weekly index, you could also do:

    SELECT extract(epoch from day_time) % 604800

and skip the division step altogether, which should make it faster still.
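The seconds-based variant in the same hypothetical Python form, taking epoch seconds modulo 604,800 (the seconds in a week) directly:

```python
from datetime import datetime, timezone

SECONDS_PER_WEEK = 604_800  # 60 * 60 * 24 * 7

def weekly_second_id(ts: datetime) -> int:
    # Seconds since the Unix epoch, wrapped to one week; no division needed.
    return int(ts.timestamp()) % SECONDS_PER_WEEK
```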

George S