
I found 2 solutions:

  1. Using AWS Data Pipeline to schedule the query (UNLOAD) and using 's3://reporting-team-bucket/importfiles/test_123-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.csv' as the destination
  2. Writing an `mv` command to rename the file in the S3 bucket

Is there a way to give the file the current date in its name using only Redshift, with no other services?

Here is my code so far:

unload
(
'select * from table'
)
to 's3://bucket/unload_test/test_123_{CurrentDate}.gz'
ACCESS_KEY_ID '12345678910'
SECRET_ACCESS_KEY '10987654321'
GZIP
PARALLEL off; 

I just need {CurrentDate} to resolve to 202106, for example.

Thanks!

    Hey — you can try normal string concatenation probably: `'s3://bucket/unload_test/test_123_' || TRUNC(GETDATE()) || 'gz'` – Aviv Goldgeier Jun 16 '21 at 15:01
  • Unfortunately it doesn't work (I wish it did). You cannot exit the ' ' when you type the file name; if you leave the expression inside it, the name of the file will literally be "test_123_'|| trunc(GETDATE()) ||'gz". – Itay Jun 16 '21 at 15:08
  • What tool is issuing the SQL? The simplest method is often to have the issuer give SQL with the filename already containing the date. – Bill Weiner Jun 16 '21 at 15:38
  • @BillWeiner Not sure if this is what you asked, but I am using DBeaver to query the SQL. Can you elaborate on what you mean by "have the issuer give SQL with the filename already containing the date"? Thanks! – Itay Jun 17 '21 at 06:56
  • DBeaver (like most SQL workbenches) supports client-side scripting and variables (but I'm not a DBeaver user). You can set workbench variables and use them in queries, like: https://github.com/dbeaver/dbeaver/issues/2563 (see the sketch below these comments). – Bill Weiner Jun 17 '21 at 14:16
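
As a rough sketch of that last suggestion (hedged: the exact variable syntax depends on your DBeaver version, and test_date is a made-up variable name whose value still has to be supplied by whoever runs the script), the UNLOAD from the question could look like this:

@set test_date = 202106
-- test_date is a hypothetical client-side variable; DBeaver replaces ${test_date}
-- before the statement is sent, so Redshift only ever sees a literal S3 path.
unload ('select * from table')
to 's3://bucket/unload_test/test_123_${test_date}.gz'
ACCESS_KEY_ID '12345678910'
SECRET_ACCESS_KEY '10987654321'
GZIP
PARALLEL off;

That keeps the date out of Redshift entirely: whatever issues the SQL (a scheduler, a script, or you by hand) supplies the value, which is what the comment above describes.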

2 Answers


I've never tried using UNLOAD in a transaction, but if it works, you could use a procedure.
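
A minimal sketch of that idea (untested; it assumes UNLOAD can be run from dynamic SQL inside a stored procedure, and it reuses the placeholder bucket and keys from the question):

-- Hypothetical procedure: build the UNLOAD statement as a string with the
-- current month in the file prefix, then run it with EXECUTE.
CREATE OR REPLACE PROCEDURE unload_with_date()
AS $$
DECLARE
    unload_sql VARCHAR(MAX);
BEGIN
    unload_sql := 'unload (''select * from table'') '
               || 'to ''s3://bucket/unload_test/test_123_'
               || to_char(current_date, 'YYYYMM')   -- e.g. 202106
               || '.gz'' '
               || 'ACCESS_KEY_ID ''12345678910'' '
               || 'SECRET_ACCESS_KEY ''10987654321'' '
               || 'GZIP PARALLEL off';
    EXECUTE unload_sql;
END;
$$ LANGUAGE plpgsql;

-- CALL unload_with_date();

Because the file name is assembled before EXECUTE runs, Redshift receives a literal S3 path, which is what UNLOAD requires in the TO clause.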


Redshift UNLOAD gives an option to write the data out by partition. Use **PARTITION BY(COLUMN_NAME)**. Here is an example:

unload ('
    SELECT   col1
           , col2
           , col3
           , current_date AS partition_by_me
    FROM dummy
')
to 's3://mybucket/dummy/'
partition by (partition_by_me)
iam_role 'arn of IAM role'
kms_key_id 'arn of kms key'
encrypted
FORMAT AS PARQUET;

In the above example, a dummy column partition_by_me is added as current_date and used in the unload command's partition by (partition_by_me) clause. The data in S3 lands in that specific partition.

The S3 path would be: s3://mybucket/dummy/partition_by_me=2022-08-18/000.parquet. Timestamp with time zone does work with this.

Note: the dummy column does not get exported to the S3 file as an additional column unless you want to include it. The following clause needs to be used to include it in the unloaded data set:

partition by(partition_by_me) INCLUDE

The INCLUDE clause will include the partition column in the exported data set.
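
For illustration, the earlier example with the INCLUDE variant (same placeholder bucket, role, and key) would look roughly like this:

unload ('
    SELECT col1, col2, col3, current_date AS partition_by_me
    FROM dummy
')
to 's3://mybucket/dummy/'
partition by (partition_by_me) INCLUDE
iam_role 'arn of IAM role'
kms_key_id 'arn of kms key'
encrypted
FORMAT AS PARQUET;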

Sunil