We are evaluating AWS Glue for ETL processing. As part of that, an ETL job script needs to consume an external REST API. When the job runs, Glue logs a connection timeout error, and we could not find any documentation on consuming external services from within Glue. Since Glue is serverless, we have no control over the environment from which the external call is made. Is anybody aware of this issue, or has anyone tried this?
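For reference, here is a minimal sketch of the kind of call involved, written with only the Python standard library so it does not depend on what is installed in the job environment. The function name `fetch_json`, the 10-second default timeout, and the injectable `opener` parameter are assumptions made for illustration, not part of our actual job script. The aim is only to report a connection timeout separately from an HTTP error returned by the service.

```python
import json
import socket
import urllib.error
import urllib.request


def fetch_json(url, timeout=10, opener=urllib.request.urlopen):
    """Fetch JSON from url and return (ok, payload_or_error_message).

    `opener` is injectable (hypothetical parameter) so the function can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return True, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # The service was reached but answered with an error status.
        return False, f"http error {exc.code}"
    except urllib.error.URLError as exc:
        # Connect-phase failures are wrapped in URLError; inspect the reason.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return False, "connection timed out"
        return False, f"connection failed: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # Read-phase timeouts are raised directly, not wrapped.
        return False, "connection timed out"
```

If a call like this returns "connection timed out" rather than an HTTP error, the request most likely never reached the service, which would point at the network path out of the job rather than at the API itself.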
-
I'm in the same situation. Did you ever solve this? – Michael Black Feb 28 '19 at 17:36
-
Did either of you ever figure this out? I'm just starting to investigate this as well. I'm guessing this might have to do with VPC configuration, but I still haven't been able to find a definitive answer on the topic. – aiguofer Mar 20 '20 at 17:39
-
Did any of you three figure this out? I saw this example: https://stackoverflow.com/questions/68551295/aws-glue-convert-the-json-response-from-getrest-api-request-to-dataframe-dya – Nirmal Mar 10 '23 at 10:31
-
Alternatively, an easier way might be to create a Databricks job that calls the endpoints exposed by the external party's REST API and sinks the results as-is into RDS/relational DB tables (a staging area, if you will) or S3. A Glue job would then run the transformation and store the output in the production tables in the RDS/relational DB. – Nirmal Mar 10 '23 at 10:37
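The staging step of that approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `fetch_raw` and `stage_endpoints` are hypothetical, the endpoint names and URLs are placeholders, and a local directory stands in for the S3 or relational staging area so the sketch stays self-contained.

```python
import pathlib
import urllib.request


def fetch_raw(url, timeout=10):
    """Return the raw response body of a GET request as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def stage_endpoints(endpoints, staging_dir, fetch=fetch_raw):
    """Fetch each endpoint and write its body unchanged into staging_dir.

    endpoints: dict mapping a short name to a URL (placeholder values).
    staging_dir: local directory standing in for S3 or a staging table.
    Returns the list of files written, for a later transformation job.
    """
    staging = pathlib.Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    written = []
    for name, url in endpoints.items():
        target = staging / f"{name}.json"
        # Store the response as-is; transformation happens downstream.
        target.write_bytes(fetch(url))
        written.append(target)
    return written
```

Keeping the fetch and the transformation in separate jobs means the part that needs outbound internet access runs outside Glue, and the Glue job only reads from the staging area.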