
Problem

I have an Airflow pipeline that I'd like to run locally. It does the following (a rough sketch of the DAG follows the list):

  1. Downloads tables from Redshift to an S3 bucket (basically RedshiftToS3Operator)
  2. Copies the tables from the S3 bucket to another Redshift cluster (basically S3ToRedshiftOperator)
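
A minimal sketch of what the DAG looks like; the connection IDs, bucket, and table names below are placeholders, not the real ones:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.amazon.aws.transfers.redshift_to_s3 import RedshiftToS3Operator
    from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

    with DAG(
        dag_id="redshift_to_redshift_via_s3",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        # 1. UNLOAD the table from the source Redshift into the S3 bucket.
        unload = RedshiftToS3Operator(
            task_id="unload_to_s3",
            redshift_conn_id="redshift_source",
            aws_conn_id="aws_default",
            s3_bucket="my-staging-bucket",
            s3_key="exports/my_table",
            schema="public",
            table="my_table",
        )

        # 2. COPY the unloaded files from S3 into the target Redshift.
        load = S3ToRedshiftOperator(
            task_id="copy_to_redshift",
            redshift_conn_id="redshift_target",
            aws_conn_id="aws_default",
            s3_bucket="my-staging-bucket",
            s3_key="exports/my_table",
            schema="public",
            table="my_table",
            copy_options=["CSV"],
        )

        unload >> load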

Question

Would it be possible to use the QA Redshift cluster as the source, copy those files to a locally mocked S3 with LocalStack, and finally use a mocked PostgreSQL instance in place of the second Redshift? Would this approach have more pros than cons?

Note: I'm not thinking of mocking Redshift locally with LocalStack, since apparently

"the redshift service only mocks the redshift management endpoints (create cluster, etc.) and not the actual query engine" (SO, 1st comment)


1 Answer

Check out the redshift-fake-driver project, which lets you simulate Redshift on top of PostgreSQL by translating and implementing certain Redshift-specific commands on the fly inside the JDBC driver itself, mainly UNLOAD and COPY, the commands that move Redshift tables to and from S3 (I use LocalStack's S3).

You can interface with the JDBC driver from Python using the JayDeBeApi package; this is how I use it. It works quite nicely, simulating enough of Redshift's features locally, and combined with LocalStack's S3 it lets you build fully local Redshift-and-S3 pipelines.
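
For illustration, connecting through the fake driver with JayDeBeApi looks roughly like the snippet below. The driver class name, JDBC URL prefix, and jar path are written from memory of the project's README and should be treated as assumptions to verify against the release you download; the bucket and table names are placeholders:

    import jaydebeapi

    # Connect to a plain PostgreSQL instance through the redshift-fake-driver.
    # Driver class, URL prefix, and jar path below are assumptions; check the
    # redshift-fake-driver README for the exact values for your version.
    conn = jaydebeapi.connect(
        "jp.ne.opt.redshiftfake.postgres.FakePostgresqlDriver",
        "jdbc:postgresqlredshift://localhost:5432/testdb",
        {"user": "postgres", "password": "postgres"},
        jars="/path/to/redshift-fake-driver-assembly.jar",
    )

    cursor = conn.cursor()

    # The fake driver intercepts Redshift-specific statements such as COPY and
    # UNLOAD, rewrites them for PostgreSQL, and reads/writes the files through
    # the configured S3 endpoint (LocalStack S3 in my setup).
    cursor.execute("""
        COPY public.my_table
        FROM 's3://my-staging-bucket/exports/my_table'
        CREDENTIALS 'aws_access_key_id=dummy;aws_secret_access_key=dummy'
        CSV
    """)

    conn.commit()
    cursor.close()
    conn.close()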
