I'm trying to write my own log files to Azure Data Lake Gen 2 from a Python notebook in Databricks, using the Python logging module.

Unfortunately I can't get it working. No errors are raised and the folders are created, but no file with logging content appears. Even if the file exists, nothing is written to it.

A local Python script works just fine, but I can't get it working in Databricks.

Here is my code:

# mount
if not any(mount.mountPoint == '/mnt/log' for mount in dbutils.fs.mounts()):
  dbutils.fs.mount(
    source = "abfss://log@datalake.dfs.core.windows.net/",
    mount_point = "/mnt/log",
    extra_configs = configs)

# vars
folder_log = '/mnt/log/test/2019'
file_log = '201904.log'

# add folder if not existent
dbutils.fs.mkdirs(folder_log)

# setup logging
import logging
logging.basicConfig(
  filename=folder_log+'/'+file_log,
  format='%(asctime)s | %(name)s | %(levelname)s | %(message)s',
  datefmt='%Y-%m-%d %H:%M:%S UTC (%z)',
  level=logging.NOTSET
)

# test
logging.info('Hello World.')

Mounting seems to be ok.

Adding and writing files with dbutils works fine:

dbutils.fs.put(folder_log+'/'+file_log, 'Hello World.')

Writing to a file like this works fine too:

f = open('/dbfs/mnt/log/test/2019/201904.log', 'w+')
f.write("This is line %d\r\n")
f.close()

I also tried prepending "/dbfs" to the path:

filename='/dbfs'+folder_log+'/'+file_log,

Any ideas?

Dominik Braun
  • Any updates to this problem? – benjamin May 07 '20 at 07:25
  • Databricks appears to do something strange to the builtin Python logging module and getLogger. I've had to hook into log4j via py4j, based on the solution in https://stackoverflow.com/a/34683626/1789708 (see the sketch below this comment list). – Scott H Jul 08 '20 at 19:36
  • I would recommend the link posted by @ScottH for anyone not on Azure. – k88 Feb 09 '22 at 13:07
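
For reference, here is a minimal sketch of the log4j-via-py4j approach mentioned in the comments, assuming a Databricks notebook where the spark session (and therefore the JVM gateway) is available; the logger name is just an example:

# Route notebook log messages through the cluster's log4j logger via py4j.
# Assumes `spark` is the SparkSession that Databricks provides in the notebook.
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("my_notebook_logger")  # example logger name

logger.info("Hello World.")                  # shows up in the driver's log4j output
logger.warn("Something worth looking into.")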

2 Answers

You can use the azure_storage_logging package's BlobStorageRotatingFileHandler:

import logging
from azure_storage_logging.handlers import BlobStorageRotatingFileHandler

log = logging.getLogger('service_logger')

# filename, account_name, account_key, maxBytes and container are assumed to be
# defined earlier; passing them as keyword arguments avoids relying on the
# handler's positional parameter order.
azure_blob_handler = BlobStorageRotatingFileHandler(filename=filename,
                                                    account_name=account_name,
                                                    account_key=account_key,
                                                    maxBytes=maxBytes,
                                                    container=container)
log.addHandler(azure_blob_handler)
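
A short usage sketch on top of the handler above (the format string mirrors the one from the question; the level and message are examples):

# attach a formatter and emit a test record
formatter = logging.Formatter('%(asctime)s | %(name)s | %(levelname)s | %(message)s')
azure_blob_handler.setFormatter(formatter)
log.setLevel(logging.INFO)
log.info('Hello World.')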
gilgorio
  • It seems like this azure_storage_logging module is no longer being maintained and has some outstanding issues. – brokkoo Dec 20 '22 at 10:21
  • This gives me an "Operation not supported" error when running it a second time and onwards. – codebot May 16 '23 at 11:20

Let me explain the steps for accessing or performing write operations on Azure Data Lake Storage using Python:

1) Register an application in Azure AD

2) Grant permission in the Data Lake for the application you have registered

3) Get the client secret from Azure AD for the application you have registered.

4) Write code to mount the directory in Azure Data Lake, like below:

dbutils.fs.mkdirs("/mnt/mountdatalake")

configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
           "dfs.adls.oauth2.client.id": "Registered_Client_Id_From_Azure_Portal",
           "dfs.adls.oauth2.credential": "Client_Secret_Obtained_From_Azure_Portal",
           "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/Your_Directory_ID/oauth2/token"}

dbutils.fs.mount(
    source="adl://mydata.azuredatalakestore.net/mountdatabricks",
    mount_point="/mnt/mountdatalake",
    extra_configs=configs)

Once the configuration/mounting is done using the application's client credentials, you are good to access the directory and log to it.
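
To verify the mount before writing, a quick check could look like this (same mount point as above):

# List the mount point to confirm the Data Lake directory is reachable
display(dbutils.fs.ls("/mnt/mountdatalake"))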

For example, below I have extracted a couple of records from SQL Server and stored them in Azure Data Lake.
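
The original screenshot is not reproduced here; a minimal sketch of what storing a few records to the mounted path could look like, assuming the mount above succeeded (the sample data and output path are just examples, not the actual SQL Server extract):

# Hypothetical example: build a tiny DataFrame and write it to the mounted
# Data Lake path as CSV.
sample = spark.createDataFrame(
    [(1, "alpha"), (2, "beta")],   # example records
    ["id", "name"])

sample.write.mode("overwrite").csv("/mnt/mountdatalake/sample_records")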

Hope this helps.

Mohit Verma
  • Thanks for your reply. I am using Data Lake Storage Gen2 (which was merged with the storage account), so the connection and mounting look a little different from your suggestion for Data Lake Storage Gen1. An app is registered and access is given by App ID / Object ID and Service Principal ID. The connection generally doesn't seem to be the problem, because reading and writing with ```dbutils.fs.put``` or ```f.write()``` works fine. But it doesn't work with the Python logging module, and I don't know why or how this is different from other ways of writing files. – Dominik Braun Apr 16 '19 at 10:59