2

I have a data source file that I am loading in Redshift with a COPY command.

The file has a bunch of date columns with a two digit year format (I know, I am dealing with dinosaurs here).

Redshift recognizes the date format, but the problem is the file has values like:

06/01/79 

which actually means:

2079-06-01

however Redshift interprets it as:

1979-06-01

Is there a way to tell Redshift what is my threshold for the two digit date formats. For example values lower than 90 should be interpreted as 20XX.

The DATEFORMAT parameter in the COPY command does not have such an option.

Martin Taleski
  • 6,033
  • 10
  • 40
  • 78
  • 1
    As an alternative if you don't want to/can't edit the files beforehand, you can probably use an `UPDATE` after your `COPY` where you fix the year with `DATEADD(year,100,timestamp)`. Leverage a staging table if necessary. – Nicolas Feb 27 '16 at 21:03
  • I think an update on a staging table is the best solution. The files are too big to process them in code. – Martin Taleski Feb 28 '16 at 20:57

1 Answers1

0
-- Begin transaction
BEGIN TRANS;
--  Create a temp table
CREATE TEMP TABLE my_temp (dtm_str CHAR(8));
-- Load your data into the temp table
COPY my_temp FROM s3://my_bucket … ;
-- Insert your data into the final table
INSERT INTO final_table
-- Grab the first 6 chars and concatenate to the following
SELECT CAST(LEFT(dtm_str,6)||
-- Convert the last 2 chars to and in and compare to your threshold
       CASE WHEN CAST(RIGHT(dtm_str,2) AS INT) < 85
-- Add either 1900 or 2000 to the INT, convert to CHAR
            THEN CAST(CAST(RIGHT(dtm_str,2) AS INT) + 2000 AS CHAR(4))
       ELSE CAST(CAST(RIGHT(dtm_str,2) AS INT) + 1900 AS CHAR(4)) END
-- Convert the final CHAR to a DATE
       AS DATE) new_dtm
FROM my_temp;
COMMIT;
Joe Harris
  • 13,671
  • 4
  • 47
  • 54