I am using an Azure Databricks notebook with the Azure library to get a list of files in Blob Storage. This task is scheduled; the cluster is terminated after the job finishes and started again for each new run.

I am using the Azure 4.0.0 library (https://pypi.org/project/azure/).

Sometimes I get the error message:

  • AttributeError: module 'lib' has no attribute 'SSL_ST_INIT'

and, very rarely, also:

  • AttributeError: cffi library '_openssl' has no function, constant or global variable named 'CRYPTOGRAPHY_PACKAGE_VERSION'

I have found a workaround: uninstall the OpenSSL or azure library, restart the cluster, and install it again. But restarting the cluster may not always be possible, because it may need to handle longer tasks, etc.
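Spelled out as shell commands, the workaround looks roughly like this (a sketch only; which package needs to be reinstalled depends on which one is in the broken state):

```shell
# Sketch of the manual workaround, run against the cluster's Python environment.
# Assumes the `azure` meta-package is the culprit; pyOpenSSL is the other
# likely candidate.
pip uninstall azure -y
# ...restart the cluster at this point...
pip install azure==4.0.0
```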

I also tried to install/upgrade pyOpenSSL 16.2.0 in an initialization script, but it does not help and starts conflicting with another OpenSSL library that is on the Databricks cluster by default.

Is there anything I can do about it?

Here is the code for getting the list of files from Blob Storage:

import pandas as pd
import re
import os
from pyspark.sql.types import *
import azure
from azure.storage.blob import BlockBlobService
import datetime
import time

# accountName, accountKey, sourceStorageContainer and inputFolder
# are defined elsewhere in the notebook.
r = []
marker = None
blobService = BlockBlobService(account_name=accountName, account_key=accountKey)
# list_blobs returns results in pages; follow next_marker until it is empty.
while True:
  result = blobService.list_blobs(sourceStorageContainer, prefix=inputFolder, marker=marker)
  for b in result.items:
    r.append(b.name)
  if result.next_marker:
    marker = result.next_marker
  else:
    break
print(r)
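Since the failure is intermittent, it can help to log which package versions the cluster actually has at the start of each run. A minimal, stdlib-only sketch (the package names are the ones implicated in the tracebacks above):

```python
import subprocess
import sys

def installed_versions(packages):
    """Map each package name to its installed version (or None) via `pip show`."""
    versions = {}
    for pkg in packages:
        result = subprocess.run(
            [sys.executable, "-m", "pip", "show", pkg],
            capture_output=True, text=True,
        )
        version = None
        for line in result.stdout.splitlines():
            # `pip show` prints a "Version: x.y.z" line for installed packages.
            if line.startswith("Version:"):
                version = line.split(":", 1)[1].strip()
        versions[pkg] = version
    return versions

# Packages implicated in the two AttributeError tracebacks above.
print(installed_versions(["azure", "pyOpenSSL", "cryptography", "cffi"]))
```

Logging this at job start makes it possible to compare a failing run against a healthy one and see which package version changed.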

Thank you

smeidak
    Please post an answer about your solution or the steps for fixing your issue; it will help other people who hit a similar issue. Thanks. – Peter Pan Mar 07 '19 at 10:07

3 Answers

The solution for this issue is to downgrade the Azure library to 3.0.0.

It looks like Azure v4 conflicts with some of the libraries preinstalled in Databricks.
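A cluster init script in the same spirit as the pyOpenSSL one in another answer could pin the package, so the downgrade survives cluster restarts (the DBFS path and pip location are assumptions based on a standard Databricks runtime):

```python
# Write an init script to DBFS; attach it to the cluster in the cluster
# configuration so it runs on every start. `dbutils` is only available
# inside a Databricks notebook.
dbutils.fs.put("/databricks/script/azure-downgrade.sh", """
#!/bin/bash
/databricks/python/bin/pip uninstall azure -y
/databricks/python/bin/pip install azure==3.0.0
""", True)
```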

smeidak
This issue is also linked to the pyOpenSSL package. Downgrading to version 18.0.0 did the trick for me. I used the script below as an init script at cluster initialization:

dbutils.fs.put("/databricks/script/pyOpenSSL-install.sh",""" 
#!/bin/bash 
/databricks/python/bin/pip uninstall pyOpenSSL -y 
/databricks/python/bin/pip install pyOpenSSL==18.0.0 
""", True)
Pawan
  • This solved my issue. However, instead of downgrading, I upgraded it to version 19.0.0. And the package that had the problem (`azure-storage-blob`) was at 12.2.0. Here are the details of how I did it: https://kb.databricks.com/python/python-exec-display-cancelled.html#problem-module-lib-has-no-attribute-ssl_st_init – stack247 Feb 22 '20 at 17:38
Running Databricks Runtime 6 or higher should now solve this.

simon_dmorias