I am trying to pip install the psycopg2 package on a Dataproc cluster. I tried the following, but it isn't working because my work computer has firewall restrictions:
REGION=<region>
gcloud dataproc clusters create my-cluster \
--image-version 1.4 \
--metadata 'CONDA_PACKAGES=psycopg2' \
--metadata 'PIP_PACKAGES=psycopg2' \
--initialization-actions \
gs://goog-dataproc-initialization-actions-${REGION}/python/conda-install.sh,gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh
So I have now placed the psycopg2.whl and psycopg2.tar.gz files in GCS. I need to install them during Dataproc cluster creation, which seems to be possible judging by this answer: https://stackoverflow.com/a/50280108/13433956
Can anyone provide more details on how to get pip to install the whl or tar.gz file from GCS through Dataproc initialization actions? Thanks!
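
For concreteness, here is a minimal sketch of the kind of initialization action being asked about, assuming the package file sits in a bucket the cluster nodes can read. The function name `install_from_gcs` and the bucket path are hypothetical placeholders, and this has not been verified against image version 1.4:

```shell
#!/bin/bash
# Hypothetical initialization action: install a package file staged in GCS.
set -euo pipefail

# install_from_gcs GCS_URI
# Copies one .whl or .tar.gz file from GCS into a temporary directory
# and installs it with pip.
install_from_gcs() {
  local uri="$1"
  local workdir
  workdir="$(mktemp -d)"

  # Copying into a directory keeps the original file name. pip parses the
  # name, version and platform tags from a wheel's file name, so the wheel
  # must keep the full name it was built with.
  gsutil cp "${uri}" "${workdir}/"

  # Install from the local copy. Note: a .tar.gz source distribution of
  # psycopg2 is compiled on the node, which needs a C compiler and the
  # PostgreSQL client headers to be present.
  pip install "${workdir}/$(basename "${uri}")"

  rm -rf "${workdir}"
}

# Example call (placeholder path, left commented out):
# install_from_gcs "gs://<your-bucket>/<package-file>"
```

With the example call uncommented and pointed at the real file, the script would be uploaded to GCS and passed to `gcloud dataproc clusters create` via `--initialization-actions gs://<your-bucket>/<script>.sh` in place of the two public scripts above.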