4

I have a Python script that connects to Redshift, executes a series of SQL commands, and generates a new derived table.

But for the life of me, I can't figure out a way to have it automatically run every day.

I've tried AWS Data Pipeline, but my shell script won't run the first COPY statement. I can't get Lambda or Glue to work because my company's IAM policies are restrictive. Airflow seems like overkill just to run a single Python script daily.

Any suggestions for services to look into?

ScottieB
  • 5
    Other than a cron job? – Ignacio Vazquez-Abrams Nov 16 '17 at 06:12
  • I have a batch job to trigger the script which I have scheduled to run automatically daily. – Abhijeetk431 Nov 16 '17 at 06:18
  • 1
    Apply for an IAM policy change ;-) – jarnohenneman Nov 16 '17 at 07:00
  • 2
    Lambda is made for this kind of stuff. Talk to your bosses about changing the IAM Policy. It seems silly to use the wrong tool for the job and waste AWS resources and cash on an EC2 instance for this. – myron-semack Nov 16 '17 at 11:46
  • I like Data Pipeline b/c we have other SQL-based derived tables built there, so it's a central spot to monitor. Also I was following: http://themrmax.github.io/2015/08/24/A-Python-Script-on-AWS-Data-Pipeline.html But I'll try to make progress on Lambda, it seems more broadly useful anyway. – ScottieB Nov 16 '17 at 14:50

4 Answers

7

Cron job?

00 12 * * * /home/scottie/bin/my_python_script.py

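This runs my_python_script.py at minute 0 of hour 12 (noon) every day. Because the crontab entry invokes the file directly, the script must be executable and start with a shebang line.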
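A minimal sketch of how a script run from cron might be structured so that failures show up in the log and in the exit status. The psycopg2 driver, the REDSHIFT_* environment variable names, and the two SQL statements are assumptions for illustration; the question does not show the real ones.

```python
#!/usr/bin/env python3
"""Sketch of a cron-friendly job: run SQL statements in order and report an exit code."""
import logging
import os
from contextlib import closing

# Hypothetical statements; the real script's SQL is not shown in the question.
STATEMENTS = (
    "DROP TABLE IF EXISTS derived_table",
    "CREATE TABLE derived_table AS SELECT 1 AS id",
)


def connect_to_redshift():
    """Open a connection with psycopg2 (assumed driver) from assumed env var names."""
    import psycopg2  # imported lazily so the rest of the module loads without it

    return psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=os.environ.get("REDSHIFT_PORT", "5439"),
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )


def run_job(connect, statements=STATEMENTS):
    """Run every statement, then commit; return 0 on success and 1 on any failure."""
    try:
        with closing(connect()) as conn:
            try:
                with closing(conn.cursor()) as cur:
                    for sql in statements:
                        logging.info("running: %s", sql)
                        cur.execute(sql)
                conn.commit()
            except Exception:
                conn.rollback()
                raise
    except Exception:
        # Log the traceback so it ends up wherever cron's output is sent.
        logging.exception("job failed")
        return 1
    return 0
```

A real script would end by calling `sys.exit(run_job(connect_to_redshift))` under an `if __name__ == "__main__":` guard. Cron starts jobs with a minimal environment, so the variables would have to be set in the crontab or in a wrapper script. Passing the connection factory as a parameter lets `run_job` be exercised against any DB-API connection.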

Jack Ryan
1

I use a scheduled task on Windows. You can create it either in the Task Scheduler GUI or with the at command.

Mike Robins
  • 2
    On a Mac. Plus what happens if your machine is off? – ScottieB Nov 16 '17 at 14:51
  • I gather that the task scheduler is able to [wake from sleep](https://www.howtogeek.com/119028/how-to-make-your-pc-wake-from-sleep-automatically/) mode. I know very little about Mac. – Mike Robins Nov 17 '17 at 00:21
1

If you are using AWS Glue or have some other reason to install a development endpoint, you can use Apache Zeppelin to run any code from any language (if you have the jar files) on a schedule based on a cron command. Here's the notebook I use to run Redshift nightly maintenance:

Redshift Maintenance in a Zeppelin notebook

1

Use a cron job on an EC2 instance, or set up a scheduled event to invoke a Python AWS Lambda function: http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html
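For the Lambda route, a rough sketch of what the handler could look like. The `my_job` module and the names imported from it are hypothetical stand-ins for the existing script's connection code and SQL; the database driver would also have to be packaged with the function.

```python
def handle(event, connect, statements):
    """Run the statements in order, commit, and return a small summary dict."""
    conn = connect()
    try:
        cur = conn.cursor()
        for sql in statements:
            cur.execute(sql)
        conn.commit()
    finally:
        conn.close()
    return {"source": event.get("source"), "statements_run": len(statements)}


def lambda_handler(event, context):
    """Entry point that Lambda calls; scheduled events carry source 'aws.events'."""
    # Hypothetical module wrapping the existing script's connection and SQL.
    from my_job import connect_to_redshift, STATEMENTS

    return handle(event, connect_to_redshift, STATEMENTS)
```

An exception raised inside `handle` propagates out of `lambda_handler`, so the invocation is reported as failed instead of silently succeeding.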

Moe