
After the ETL job is done, what is the best way to call a stored procedure in an AWS Glue script?

I am using PySpark to fetch data from S3 and store it in a staging table. After this process, I need to call a stored procedure that loads the data from the staging table into the appropriate MDS tables.

If I have to call a stored procedure after the ETL job is done, what is the best way to do it? If I use AWS Lambda, is there a way the Lambda can be notified when the ETL finishes?

Santhosh Nagulanchi

3 Answers


You can do it using py4j, and all your code stays in one place; IMHO a far better solution than the others:

How to run arbitrary / DDL SQL statements or stored procedures using AWS Glue
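A minimal sketch of what that py4j approach looks like, assuming a JDBC driver for your target database is already on the Glue job's classpath and the connection details are passed in as job parameters. The parameter names and the procedure name are illustrative, and the CALL/EXEC syntax depends on your database engine:

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Connection details passed in as Glue job parameters (names are illustrative).
args = getResolvedOptions(sys.argv, ["jdbc_url", "db_user", "db_password"])

sc = SparkContext.getOrCreate()

# py4j exposes the JVM behind the Spark session, so plain JDBC classes
# can be driven directly from the PySpark script.
jvm = sc._gateway.jvm
conn = jvm.java.sql.DriverManager.getConnection(
    args["jdbc_url"], args["db_user"], args["db_password"])
try:
    stmt = conn.createStatement()
    # Hypothetical procedure that loads the staging table into the MDS tables;
    # use EXEC instead of CALL if the target is SQL Server.
    stmt.execute("CALL load_mds_from_staging()")
finally:
    conn.close()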

mishkin

You can use the boto SDK to fire off your Lambda at the end of your Glue ETL job. Since you are writing to a staging table, you will also need a NAT gateway on the connection's subnet for the boto call to work. The example at this link uses boto to fire an ECS task, which is a similar process: https://github.com/full360/glue-sneaql-demo/blob/master/Glue-Job-Script.
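A rough sketch of that call at the end of the Glue script, assuming a Lambda that does the actual database work; the function name and payload fields here are made up for illustration:

import json
import boto3

lambda_client = boto3.client("lambda")

# Fire-and-forget invocation once the staging load has finished.
lambda_client.invoke(
    FunctionName="call-stored-procedure",             # hypothetical function name
    InvocationType="Event",                           # asynchronous invocation
    Payload=json.dumps({"staging_table": "stg_mds"})  # illustrative payload
)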

Dandalf

Using the SDK directly avoids having to set up and coordinate additional infrastructure, but sometimes that is not possible (network, security, architecture, etc.).

In that case, you can set up a CloudWatch Events rule to trigger a Lambda when the job's state changes to SUCCEEDED.

Event pattern:

{
  "detail-type": [
    "Glue Job State Change"
  ],
  "source": [
    "aws.glue"
  ],
  "detail": {
    "state": [
      "SUCCEEDED"
    ]
  }
}

and trigger a Lambda on this event.
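A rough sketch of the Lambda handler behind that rule, assuming a Postgres-compatible target reachable from the Lambda (for example with psycopg2 packaged as a layer); the environment variable names, the procedure name, and the CALL syntax are all illustrative and depend on your database engine:

import os
import psycopg2

def handler(event, context):
    detail = event["detail"]

    # The rule already filters on SUCCEEDED, but guard anyway and
    # optionally restrict to the one Glue job we care about.
    if detail.get("state") != "SUCCEEDED":
        return
    if detail.get("jobName") != os.environ.get("GLUE_JOB_NAME"):
        return

    # Connection string supplied through an environment variable (assumed).
    conn = psycopg2.connect(os.environ["DB_DSN"])
    try:
        with conn.cursor() as cur:
            # Hypothetical procedure that loads staging data into the MDS tables.
            cur.execute("CALL load_mds_from_staging()")
        conn.commit()
    finally:
        conn.close()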

RojoSam