I am using an Azure Databricks notebook with the Azure library to list the files in Blob Storage. The task is scheduled: the cluster is terminated when the job finishes and started again for the next run.
I am using version 4.0.0 of the Azure library (https://pypi.org/project/azure/).
Sometimes I get this error message:
- AttributeError: module 'lib' has no attribute 'SSL_ST_INIT'
and very rarely also:
- AttributeError: cffi library '_openssl' has no function, constant or global variable named 'CRYPTOGRAPHY_PACKAGE_VERSION'
The workaround I have found is to uninstall the openssl or azure library, restart the cluster, and install the library again. However, restarting the cluster may not be possible, because it may need to handle longer-running tasks.
I also tried to install/upgrade pyOpenSSL 16.2.0 in an initialization script, but that does not help and it starts conflicting with another OpenSSL library that is present on the Databricks cluster by default.
Is there anything I can do about this without restarting the cluster?
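In case it helps with diagnosing the conflict, here is a minimal sketch of how the package versions visible to the notebook could be printed. It assumes Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_versions` is hypothetical, and the list of package names is only my guess at which packages are involved.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version,
    or to None when the package is not installed in this environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of packages that may be involved in the conflict.
for name, version in installed_versions(
    ["pyOpenSSL", "cryptography", "cffi", "azure"]
).items():
    print(name, version)
```

Running this at the start of the job, before the Azure imports, would at least show whether the versions differ between runs that fail and runs that succeed.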
Here is the code that gets the list of files from Blob Storage:
import pandas as pd
import re
import os
from pyspark.sql.types import *
import azure
from azure.storage.blob import BlockBlobService
import datetime
import time
r = []
marker = None
blobService = BlockBlobService(accountName, accountKey)
# list_blobs returns one page of results per call; keep requesting pages
# until the service no longer returns a next_marker
while True:
    result = blobService.list_blobs(sourceStorageContainer, prefix=inputFolder, marker=marker)
    for b in result.items:
        r.append(b.name)
    if result.next_marker:
        marker = result.next_marker
    else:
        break
print(r)
Thank you