
I have a Spark job that runs on EMR and reads a dataset from S3 (a nested JSON file), joins it with another dataset, and explicitly overwrites a few S3 files.

This is not a standard ETL use case, but can AWS Glue provide the same functionality? If so, is Glue cheaper than EMR?

Abhay Dubey

1 Answer


Yes, the above use case should be possible with Glue as well. I think you can flatten the nested JSON file, join it with the other datasets, and write the results back to S3.
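To make "flatten, then join" concrete: in Glue or EMR this would be done with Spark (for example by selecting nested fields, or with Glue's `Relationalize` transform), but the semantics can be sketched in plain Python. This is a minimal illustration, not a Glue job: the sample records, the `flatten` and `inner_join` helpers, and the `_` key separator are all hypothetical, and lists inside the JSON are left untouched (Spark would typically explode them separately).

```python
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Hypothetical helper: collapse nested dicts into one level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the accumulated key prefix.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def inner_join(left: List[Dict[str, Any]], right: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: hash join keeping only rows whose `key` appears on both sides."""
    index: Dict[Any, List[Dict[str, Any]]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical sample data standing in for the nested JSON on S3 and the second dataset.
raw = (
    '[{"id": 1, "user": {"name": "a", "address": {"city": "x"}}},'
    ' {"id": 2, "user": {"name": "b", "address": {"city": "y"}}}]'
)
orders = [{"id": 1, "total": 10}, {"id": 3, "total": 5}]

flat_rows = [flatten(r) for r in json.loads(raw)]
result = inner_join(flat_rows, orders, "id")
print(result)
```

Only `id` 1 appears in both inputs, so the joined output has a single row with the flattened columns `user_name` and `user_address_city` plus `total`. In a real Glue or EMR job, the final step would write this result back to S3 instead of printing it.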

As for the cost comparison, please note that AWS Glue works out to be a little costlier than regular EMR. This is because Glue is serverless and managed by AWS, and it also provides features such as the Data Catalog, development endpoints, and ETL code generation. Please refer here for a cost comparison of Glue and EMR.
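One way to run the comparison for a specific job is a back-of-the-envelope model: Glue bills by DPU-hour, while EMR bills a per-instance EMR fee on top of the underlying EC2 instance price. The sketch below is deliberately simplified (it ignores storage, data transfer, minimum billing durations and cluster idle time), and every rate in it is a hypothetical placeholder, not an actual AWS price. Substitute current prices for your region and your measured job duration.

```python
def glue_job_cost(dpus: float, hours: float, price_per_dpu_hour: float) -> float:
    """Simplified Glue model: DPUs used x job duration x DPU-hour rate."""
    return dpus * hours * price_per_dpu_hour


def emr_cluster_cost(nodes: int, hours: float, ec2_price_per_hour: float, emr_fee_per_hour: float) -> float:
    """Simplified EMR model: each node pays its EC2 price plus the EMR surcharge."""
    return nodes * hours * (ec2_price_per_hour + emr_fee_per_hour)


# Hypothetical placeholder rates and sizes, NOT real AWS prices.
glue = glue_job_cost(dpus=10, hours=2, price_per_dpu_hour=0.5)
emr = emr_cluster_cost(nodes=5, hours=2, ec2_price_per_hour=0.2, emr_fee_per_hour=0.05)
print(f"Glue: ${glue:.2f}  EMR: ${emr:.2f}")
```

Whichever service comes out cheaper depends entirely on the rates and sizing you plug in. The model only shows which quantities to compare.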

Yuva
  • Please note I have not tried overwriting existing S3 files. By overwrite, do you mean updating an existing S3 file or replacing the existing file with a new one? I think updating existing files in place may not be possible. – Yuva Apr 30 '18 at 03:41