
How can I get the region in which the current Glue job is executing?


When the Glue job starts executing, I see the following output:

Detected region eu-central-1.

In AWS Lambda, I can use the following lines to fetch the current region:

import os
region = os.environ['AWS_REGION']

However, it seems like the AWS_REGION environment variable is not present in Glue and therefore a KeyError is raised:

KeyError: 'AWS_REGION'


I need the region because I am trying to fetch all databases and tables as described in this question, and I do not want to hard-code the region when creating the boto3 client.

matsev

3 Answers


One option is to pass the AWS_REGION as a job parameter. For example, if you trigger the job from Lambda:

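import os

import boto3

# Glue client used to start the job run (called from inside the Lambda function)
client = boto3.client('glue')

response = client.start_job_run(
    JobName='a_job_name',
    # Forward Lambda's own region to the Glue job as the --AWS_REGION job argument
    Arguments={'--AWS_REGION': os.environ['AWS_REGION']},
)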

Alternatively, if you define your jobs using the AWS::Glue::Job CloudFormation resource:

GlueJob:
  Type: AWS::Glue::Job
  Properties:
    Role: !Ref GlueRole
    DefaultArguments:
      "--AWS_REGION": !Sub "${AWS::Region}"
    Command:
      ScriptLocation: !Sub s3://${GlueScriptBucket}/glue-job.py
      Name: glueetl

Then you can extract the AWS_REGION parameter in your job code using getResolvedOptions:

import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['AWS_REGION'])
print('region', args['AWS_REGION'])
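If the same script must also work when the job is started without `--AWS_REGION` (for example from the console), one option is a small fallback. The sketch below uses a hypothetical helper, `region_from_args`, that scans `sys.argv` directly instead of calling `getResolvedOptions`, which as far as I know fails when a requested option is missing. It assumes Glue passes job arguments either as separate `--KEY value` tokens or as a single `--KEY=value` token, and it falls back to the `AWS_DEFAULT_REGION` environment variable mentioned in the other answers.

```python
import os
import sys
from typing import Mapping, Optional, Sequence


def region_from_args(argv: Sequence[str], environ: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: return the --AWS_REGION job argument if present,
    otherwise fall back to the AWS_DEFAULT_REGION environment variable."""
    for i, token in enumerate(argv):
        # Assumed form 1: "--AWS_REGION eu-central-1" as two separate tokens
        if token == "--AWS_REGION" and i + 1 < len(argv):
            return argv[i + 1]
        # Assumed form 2: "--AWS_REGION=eu-central-1" as a single token
        if token.startswith("--AWS_REGION="):
            return token.split("=", 1)[1]
    # No job argument found: use the environment variable, or None if it is unset
    return environ.get("AWS_DEFAULT_REGION")


if __name__ == "__main__":
    print("region", region_from_args(sys.argv, os.environ))
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing region is fatal.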
matsev

Use os.environ['AWS_DEFAULT_REGION'] instead.

Leaving this here for new visitors.
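Because this variable is not set in every Glue job type or version, a slightly safer variant is to read it with `os.environ.get` and fail with a clearer message than a bare `KeyError`. This is a minimal sketch: the function name `current_region` is hypothetical, and the fallback to `AWS_REGION` (the variable Lambda sets) is my own addition, not something Glue is known to provide.

```python
import os


def current_region() -> str:
    """Return the AWS region from the environment (hypothetical helper)."""
    # Check AWS_DEFAULT_REGION first, then AWS_REGION (set in Lambda)
    for name in ("AWS_DEFAULT_REGION", "AWS_REGION"):
        value = os.environ.get(name)
        if value:
            return value
    # Neither variable is set, which can happen in some Spark-type Glue jobs
    raise RuntimeError(
        "No region found in AWS_DEFAULT_REGION or AWS_REGION; "
        "pass it as a job argument instead"
    )
```

Usage: `region = current_region()` near the top of the job script, before creating any boto3 clients.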

Prashanth kumar
  • Nope. I'm getting a `KeyError: 'AWS_DEFAULT_REGION'`. Any other suggestions? – Jari Turkia May 13 '20 at 06:16
  • when I tried printing `os.environ` I got this response from the glue job, `environ({ "PATH": "", "HOSTNAME": "", "USE_PROXY": "", "AWS_DEFAULT_REGION": "us-east-1", "GLUE_PYTHON_VERSION": "3", "ERROR_FILE_NAME_LOCATION": "", "LANG": "", "GPG_KEY": "", "PYTHON_VERSION": "", "PYTHON_PIP_VERSION": "", "PYTHON_GET_PIP_URL": "", "PYTHON_GET_PIP_SHA256": "", "PYTHONPATH": ":", "GLUE_INSTALLATION": "", "HOME": "/" })` – Prashanth kumar May 13 '20 at 06:40
  • I am using **Python shell** as the Glue job type. If you are using "Spark" or "Spark Streaming", I suggest running a small script as simple as `import os; print(os.environ)`, which should give you a list of all environment variables. – Prashanth kumar May 13 '20 at 06:42
  • Ok. Being new to Glue/Spark, I didn't realize the difference. Indeed, I'm running a Spark job, not a Python shell job. – Jari Turkia May 13 '20 at 06:47
  • This solution seems to work for me: https://stackoverflow.com/a/56719318/1548275 – Jari Turkia May 13 '20 at 07:37
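Along the lines of the diagnostic suggested in the comments, here is a small script you could run as the job itself to see which region-related variables your Glue job type and version actually expose. Filtering on the substring `REGION` is only a convenience to keep the output short; the helper name `region_vars` is hypothetical, and you can print `os.environ` unfiltered to see everything.

```python
import os
from typing import Dict, Mapping


def region_vars(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return only the environment variables whose name contains 'REGION'."""
    return {name: value for name, value in environ.items() if "REGION" in name}


if __name__ == "__main__":
    found = region_vars(os.environ)
    # An empty dict means this job type/version sets no region variable
    print(found if found else "no region-related environment variables set")
```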

os.environ['AWS_DEFAULT_REGION'] works in Glue versions 2.0 and 3.0, but the variable does not exist in earlier versions. It contains the region code, for example us-east-1.

I confirmed this by running a small PySpark script that prints the environment variables on every Glue version, as suggested in the other answer.

David Liao