
I am trying to have a setup similar to that of this article: https://aws.amazon.com/blogs/big-data/simplify-and-optimize-python-package-management-for-aws-glue-pyspark-jobs-with-aws-codeartifact/

I would like to install some packages from a custom index using --index-url <my-index-url>. To do this, I am following the Glue job parameters documentation: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html. According to the guide, I should add a parameter to the job like this:

--python-modules-installer-option with value --index-url <my-index-url>.

However, this argument does not get picked up at all. The logs do not show any sign that this argument is used.

When I try to install something from my custom index, it fails, as the parameter is not picked up.

Even trying with a simple value like --upgrade does not work.

Other options, such as --additional-python-modules, do get picked up. However, the modules are then installed from the default pip3 index configured in the Python environment rather than from the index I set, so the job fails whenever a package I specify is available only in my index.

To reproduce this issue:

  • go to AWS Glue Jobs
  • create a new Python job with the "Python Shell script editor" and select the boilerplate code option (the code itself doesn't matter for reproducing this issue)
  • create and select a proper AWS Glue IAM role to run the job with
  • add any valid pip3 option as a job parameter, e.g. Key: "--python-modules-installer-option", Value: "<valid-pip3-option>".
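
For anyone who prefers to reproduce this from code rather than the console, the steps above can be sketched roughly as follows. This is only a sketch: the helper names, the job name, the role ARN, the script location, the package name and the index URL are all placeholders I made up, and `glue_client` is assumed to be a `boto3.client("glue")` instance. Only the two argument keys come from the Glue documentation.

```python
# Hypothetical helpers: the names and placeholder values are illustrative only.

def build_default_arguments(index_url, modules):
    """Build the job parameters described in the question."""
    return {
        # Comma-separated list of packages to install.
        "--additional-python-modules": ",".join(modules),
        # The option that does not seem to be picked up by Python Shell jobs.
        "--python-modules-installer-option": f"--index-url {index_url}",
    }


def build_job_definition(name, role_arn, script_location, default_arguments):
    """Assemble keyword arguments for the Glue CreateJob call."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",  # Python Shell job, not a Spark ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3.9",
        },
        "DefaultArguments": default_arguments,
    }


def create_job(glue_client, job_definition):
    """Create the job; glue_client is assumed to be boto3.client('glue')."""
    return glue_client.create_job(**job_definition)
```

Calling `create_job(boto3.client("glue"), build_job_definition(...))` and then running the job should show the same behaviour as the console steps, assuming the role and script location are valid.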

Thanks!

Adrian Castro

1 Answer


That flag and the blog post apply to Glue ETL jobs.
For Python Shell jobs, the value of --additional-python-modules is passed directly to pip, so you can put your options inside that value (as if you were passing arguments to pip).
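
As a rough illustration of what that could look like, here is a small sketch that builds such a value. It assumes the value really is handed to pip as-is and that pip options and package specifiers can be separated by spaces; I have not verified either assumption. The helper name, the index URL and the package name are placeholders.

```python
# Hypothetical helper: assumes Glue forwards the whole value to pip unchanged.

def inline_pip_arguments(pip_options, modules):
    """Join pip options and package specifiers into a single parameter value."""
    # Tokens are separated by spaces, as they would be on a pip command line.
    return " ".join([*pip_options, *modules])


def build_shell_job_arguments(index_url, modules):
    """Put the index option inside --additional-python-modules itself."""
    value = inline_pip_arguments(["--index-url", index_url], modules)
    return {"--additional-python-modules": value}
```

The resulting dictionary would go into the job parameters in place of a separate --python-modules-installer-option entry.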

  • I reached out to AWS support, and they said that this is a bug. The `--additional-python-modules` flag does not work as expected for `pythonshell` jobs either. I haven't tested this on Glue 4.0 yet, but I doubt they have fixed this issue. – Adrian Castro Jan 25 '23 at 08:24