
I have mounted an S3 bucket in my Databricks workspace. I can list the files and read them using Python.

ACCESS_KEY = "XXXXXXXXXX"
SECRET_KEY = "XXXXXXXXXXXXXX"
ENCODED_SECRET_KEY = SECRET_KEY.replace("/", "%2F")
AWS_BUCKET_NAME = "testbucket"
MOUNT_NAME = "awsmount1"

dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s/data" % MOUNT_NAME))

I want to find the last modified date of the file I am reading. I couldn't find much apart from the Java option in "Databricks read Azure blob last modified date", which is for Azure Blob. Is there a Python-native option in Databricks to read the file metadata?

Brij Raj Singh - MSFT
    I think the AWS SDK may be the best bet with the last modified date. I couldn't find anything on the CLI or REST API that can help, either. – Jon Jul 01 '19 at 09:37

1 Answer


If I understand correctly, you need the last modified date for a mounted file in Azure Databricks using a native Python SDK.

Here is the sample code to get the metadata information from Azure blob:

from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob SDK (v2.x)

block_blob_service = BlockBlobService(account_name='accountName', account_key='accountKey')
container_name = 'containerName'
# List the blobs in the existing container and print each blob's last-modified time.
generator = block_blob_service.list_blobs(container_name)
for blob in generator:
    last_modified = block_blob_service.get_blob_properties(container_name, blob.name).properties.last_modified
    print("\t Blob name: " + blob.name)
    print(last_modified)

You can get more details on this here.

If you are looking for S3, I would suggest using Boto3. Boto3 returns a datetime object for LastModified when you use the (S3) Object Python object:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.last_modified
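As a minimal sketch of that attribute in use, assuming credentials are already available to boto3 on the cluster: the helper name `s3_last_modified` and the optional `s3_resource` argument are made up for this illustration, and the bucket and key in the usage comment are placeholders.

```python
from datetime import datetime


def s3_last_modified(bucket: str, key: str, s3_resource=None) -> datetime:
    """Return the LastModified timestamp of one S3 object.

    `s3_resource` is a hypothetical hook for passing in an existing
    boto3 S3 resource; when omitted, a default one is created.
    """
    if s3_resource is None:
        import boto3  # imported lazily so the helper can be exercised with a stand-in
        s3_resource = boto3.resource("s3")
    # S3.Object.last_modified is a timezone-aware datetime.
    return s3_resource.Object(bucket, key).last_modified


# Example call (placeholder bucket and key):
# print(s3_last_modified("testbucket", "data/myfile.csv"))
```

The key is the object's path inside the bucket, so for a file read from /mnt/awsmount1/data/... it would be the part after the mount name.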

To compare LastModified to today's date (Python3):

import boto3
from datetime import datetime, timezone

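# LastModified is timezone-aware (UTC), so take today's date in UTC as well.
today = datetime.now(timezone.utc).date()

s3 = boto3.client('s3', region_name='eu-west-1')

objects = s3.list_objects(Bucket='my_bucket')

# "Contents" is absent for an empty bucket; compare dates, not exact timestamps.
for o in objects.get("Contents", []):
    if o["LastModified"].date() == today: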
        print(o["Key"])
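A single list_objects call returns at most 1,000 keys, so for a larger bucket the loop above only sees the first page. Here is a hedged sketch of the same date filter using a paginator; the function name `keys_modified_on` is invented for this example, and it assumes you pass in a boto3 S3 client.

```python
from datetime import date, timezone


def keys_modified_on(s3_client, bucket: str, day: date):
    """Yield keys in `bucket` whose LastModified falls on `day` (in UTC)."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is missing from a page when there are no objects.
        for obj in page.get("Contents", []):
            if obj["LastModified"].astimezone(timezone.utc).date() == day:
                yield obj["Key"]


# Example call (placeholder bucket name):
# import boto3
# from datetime import datetime
# s3 = boto3.client("s3", region_name="eu-west-1")
# for key in keys_modified_on(s3, "my_bucket", datetime.now(timezone.utc).date()):
#     print(key)
```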

Reference

Hope it helps.

Mohit Verma