I am trying to have a setup similar to the one in this article: https://aws.amazon.com/blogs/big-data/simplify-and-optimize-python-package-management-for-aws-glue-pyspark-jobs-with-aws-codeartifact/
I would like to install some packages using a custom --index-url <my-index-url>.
To do this, I am following the Glue Job documentation here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
According to the guide, I should add a job parameter --python-modules-installer-option with the value --index-url <my-index-url>.
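If I read the docs correctly, that option should be forwarded to the pip3 install that Glue runs for --additional-python-modules. This is a minimal sketch of how I am passing both parameters, using a boto3 start_job_run call with placeholder job and package names (setting them as job parameters in the console, as I actually do, should be equivalent, since those become the job's default arguments):

```python
import boto3

glue = boto3.client("glue")

# Placeholder job and package names; the package only exists in my custom index.
glue.start_job_run(
    JobName="my-glue-job",
    Arguments={
        "--additional-python-modules": "my-package",
        # Per the docs, this should be appended to the pip3 install options:
        "--python-modules-installer-option": "--index-url <my-index-url>",
    },
)
```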
However, this argument does not get picked up at all: the logs show no sign that it is used, and installing a package from my custom index fails because the option is never applied. Even a simple value like --upgrade has no effect.
Other parameters such as --additional-python-modules do get picked up, but the module installation goes through the default pip3 index configured in the Python environment rather than the one I set, so the job fails whenever the package I specify is only available in my custom index.
To reproduce this issue:
- go to AWS Glue Jobs
- create a new Python shell job using the "Python Shell script editor" option with the boilerplate code (the actual code inside does not matter for reproducing this issue)
- create and select a proper AWS Glue IAM role to run the job with
- add any valid pip3 option as a job parameter, for example Key: "--python-modules-installer-option", Value: "<valid-pip3-option>" (a boto3 equivalent of these steps is sketched below)
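For completeness, here is a rough boto3 equivalent of those console steps; the job name, role, script location, and package name are placeholders rather than my real values:

```python
import boto3

glue = boto3.client("glue")

# Rough equivalent of the console steps above; all names and paths are placeholders.
glue.create_job(
    Name="repro-python-modules-installer-option",
    Role="MyGlueJobRole",  # an IAM role suitable for running Glue jobs
    Command={
        "Name": "pythonshell",  # Python shell job, as in the repro steps
        "PythonVersion": "3.9",
        "ScriptLocation": "s3://my-bucket/scripts/boilerplate.py",
    },
    DefaultArguments={
        "--additional-python-modules": "my-package",
        "--python-modules-installer-option": "<valid-pip3-option>",
    },
)
```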
Thanks!