I have an initialization script that runs when I create a Dataproc cluster. The script copies a Python wheel from GCS onto the cluster and then installs it. This was working fine just a few weeks ago, but today, when I created a cluster with a new version of the wheel, it failed with the error below in the Dataproc logs. The only changes in the new wheel are to the Python code, and it is a pure-Python wheel. I am using Dataproc image version 1.5.53-debian10.
pip is already installed.
Traceback (most recent call last):
sys.exit(main())
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/main.py", line 45, in main
command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/commands/__init__.py", line 96, in create_command
module = importlib.import_module(module_path)
File "/opt/conda/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 23, in <module>
from pip._internal.cli.req_command import RequirementCommand
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 20, in <module>
from pip._internal.network.session import PipSession
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_internal/network/session.py", line 17, in <module>
from pip._vendor import requests, six, urllib3
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_vendor/requests/__init__.py", line 97, in <module>
from pip._vendor.urllib3.contrib import pyopenssl
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pip/_vendor/urllib3/contrib/pyopenssl.py", line 46, in <module>
import OpenSSL.SSL
File "/opt/conda/miniconda3/lib/python3.7/site-packages/OpenSSL/__init__.py", line 8, in <module>
from OpenSSL import crypto, SSL
File "/opt/conda/miniconda3/lib/python3.7/site-packages/OpenSSL/crypto.py", line 1553, in <module>
class X509StoreFlags(object):
File "/opt/conda/miniconda3/lib/python3.7/site-packages/OpenSSL/crypto.py", line 1573, in X509StoreFlags
CB_ISSUER_CHECK = _lib.X509_V_FLAG_CB_ISSUER_CHECK
AttributeError: module 'lib' has no attribute 'X509_V_FLAG_CB_ISSUER_CHECK'
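The last frames of the traceback show the failure happening at the OpenSSL import inside pip's own startup, before pip ever looks at the wheel. A minimal sketch of how that import could be checked directly on a cluster node, without going through pip, is below. The helper names `can_import` and `dist_version` are hypothetical, the interpreter path in the example is an assumption taken from the paths in the traceback, and pyOpenSSL and cryptography are only my guess at the distributions worth inspecting.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if <module> imports under <python>,
# non-zero otherwise. Output is discarded; only the exit status matters.
# Usage: can_import <python-executable> <module>
can_import() {
  "$1" -c "import $2" >/dev/null 2>&1
}

# Hypothetical helper: print the installed version of a distribution using
# pkg_resources (part of setuptools), so pip itself is never invoked.
# Usage: dist_version <python-executable> <distribution-name>
dist_version() {
  "$1" -c "import pkg_resources; print(pkg_resources.get_distribution('$2').version)"
}

# Example (interpreter path is an assumption based on the traceback):
#   PY=/opt/conda/miniconda3/bin/python
#   can_import "$PY" OpenSSL || echo "OpenSSL import is broken"
#   dist_version "$PY" pyOpenSSL
#   dist_version "$PY" cryptography
```

If `can_import` fails for OpenSSL on a fresh node, the problem is reproducible independently of the wheel being installed.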
Since the traceback fails inside pip's own imports, this seems to be an issue with the Python and pip packages on the Dataproc image rather than with my wheel. Can anyone suggest how to resolve it? The initialization script I am using is as follows.
#!/bin/bash
function install_pip() {
  if command -v pip >/dev/null; then
    echo "pip is already installed."
    return 0
  fi
  if command -v easy_install >/dev/null; then
    echo "Installing pip with easy_install..."
    easy_install pip
    return 0
  fi
  echo "Installing python-pip..."
  apt update
  apt install python-pip -y
}
# install pip in the cluster
install_pip
# Get the GCS location for SDK from cluster metadata attributes
SDK_GCS_LOCATION="$(/usr/share/google/get_metadata_value attributes/sdk-gcs-location)"
SDK_FILE_NAME="$(/usr/share/google/get_metadata_value attributes/sdk-file-name)"
readonly SDK_GCS_LOCATION
readonly SDK_FILE_NAME
SDK_WHEEL="${SDK_GCS_LOCATION}/${SDK_FILE_NAME}"
# Copy wheel file from GCS to cluster
gsutil cp "${SDK_WHEEL}" .
# Install wheel in cluster
pip install "${SDK_FILE_NAME}"
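The step that joins the GCS location and the file name could also be made a little more defensive. The sketch below is not part of my script; `build_wheel_uri` is a hypothetical helper that tolerates a trailing slash on the location and rejects anything that is not a `.whl` file, and the bucket and file names in the example are made up.

```shell
#!/bin/bash
# Hypothetical helper: join a GCS location and a wheel file name.
# Usage: build_wheel_uri <gcs-location> <file-name>
# Prints the joined URI on stdout; returns non-zero on invalid input.
build_wheel_uri() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "location and file name must both be set" >&2
    return 1
  fi
  case "$2" in
    *.whl) ;;
    *)
      echo "expected a .whl file, got: $2" >&2
      return 1
      ;;
  esac
  # Strip one trailing slash so the join never produces '//'.
  printf '%s/%s\n' "${1%/}" "$2"
}

# Example (made-up names):
#   SDK_WHEEL="$(build_wheel_uri "gs://my-bucket/sdk/" "my_sdk-1.0-py3-none-any.whl")" || exit 1
#   gsutil cp "${SDK_WHEEL}" .
```

This only validates the inputs read from the cluster metadata; it does not address the pip failure shown in the traceback.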