3

My Airflow application runs on an AWS EC2 instance that also has an IAM role attached. Currently I create the Airflow S3 connection with a hardcoded access key and secret key, but I want the application to pick up the AWS credentials from the instance itself.

How can I achieve this?

Achaius
  • While it isn't directly related to your question, also see [Create Connections in Airflow operator at runtime](https://stackoverflow.com/questions/53740885/create-and-use-connections-in-airflow-operator-at-runtime) – y2k-shubham Jan 22 '19 at 20:02

2 Answers

9

We have a similar setup: our Airflow instance runs inside containers deployed on an EC2 machine, and we set up the policies for S3 access on the EC2 machine's instance profile. You don't need to pick up the credentials on the EC2 machine, because the machine has an instance profile that should carry all the permissions you need. On the Airflow side we only use the aws_default connection; in its extra parameter we set the default region, but no credentials at all. Here is a detailed article about instance profiles: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
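For illustration, a minimal sketch of what this looks like from a task or hook, assuming the instance profile already grants the needed S3 permissions; the import path is the Airflow 1.10-era one and the bucket name is a placeholder:

```python
from airflow.hooks.S3_hook import S3Hook

# aws_default stores no access key or secret, so boto3 walks its normal
# credential chain and ends up using the EC2 instance profile via the
# instance metadata service.
hook = S3Hook(aws_conn_id="aws_default")

# Placeholder bucket/key; the instance profile must allow s3:PutObject here.
hook.load_string("hello", key="test/hello.txt", bucket_name="my-example-bucket")
```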

nicor88
  • Can you please show the airflow.cfg settings for the above setup? I have a similar setup but for some reason it's not working – Pankaj_Pandav Jun 30 '20 at 08:17
  • The airflow.cfg for this setup didn't change. What is important to highlight is that when we use the AwsHook we pass `aws_default`, but the connections inside Airflow have default values for that. Each boto3 call made by Airflow to AWS services will use the EC2 instance profile policies. Can you tell me the error that you are getting? Maybe it is only a missing permission in the policy. – nicor88 Jun 30 '20 at 08:28
  • Hi, currently a few jobs are able to put logs on the S3 bucket after setting the connection id to aws_default, but now the Celery executor is not able to put logs on S3 even though airflow.cfg has all the above-mentioned variables defined. To give a bit of background: the DAG's first steps raise an AWS EMR cluster and a worker (Celery executor), and these two steps put their logs onto the S3 bucket, but when jobs run on EMR, those logs are not put onto S3. Another issue is that I cannot see the startEMR and startWorker logs from S3; it shows me logs from local. What could be the solution for this? – Pankaj_Pandav Jul 01 '20 at 01:17
  • I was able to resolve this with your suggestions. Thanks. – Pankaj_Pandav Jul 12 '20 at 23:24
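For reference, the airflow.cfg keys discussed in the comments above look roughly like this on the Airflow 1.10 line; the bucket path is a placeholder, and newer releases move these settings to the `[logging]` section:

```ini
[core]
# Ship task logs to S3; the connection holds no credentials, so the
# EC2 instance profile is used for the underlying boto3 calls.
remote_logging = True
remote_base_log_folder = s3://my-example-bucket/airflow/logs
remote_log_conn_id = aws_default
encrypt_s3_logs = False
```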
5

The question is answered, but for future reference it is possible to do this without relying on aws_default, just via environment variables. Here is an example that writes logs to S3 through an AWS connection so it benefits from the IAM role:

```
AIRFLOW_CONN_AWS_LOG="aws://"
AIRFLOW__CORE__REMOTE_LOG_CONN_ID=aws_log
AIRFLOW__CORE__REMOTE_LOGGING=true
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER="s3://path to bucket"
```
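The empty `aws://` URI defines a connection with no access key or secret stored, so boto3 falls back to its default credential chain and, on EC2, picks up the instance profile, which is exactly what the question asks for. Note that on Airflow 2.x these remote-logging options moved to the `[logging]` section, so the variables become `AIRFLOW__LOGGING__REMOTE_LOGGING` and so on.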
Amin