
I'm using Python 3.8 and Azure Data Lake Storage Gen2. I want to set an expiration time for a file I save on the data lake. Following the documentation for the azure.datalake.store.core.AzureDLFileSystem class (Microsoft Docs), I tried the following:

            file_client = directory_client.create_file(filename)
            file_client.upload_data(
                data,
                overwrite=True
            )
            ts = time.time() + 100
            file_client.set_expiry(path=path, expire_time=ts)

but am getting the error

AttributeError: 'DataLakeFileClient' object has no attribute 'set_expiry'

What's the proper way to set an expiration time when creating a file on the data lake?

Dave
  • Could you please tell me what you mean by `expiration time`? Do you want the file to be deleted after some time? – Jim Xu Aug 28 '20 at 01:10
  • I'm referring to the sentence, "Set or remove the expiration time on the specified file. This operation can only be executed against files." on this page -- https://github.com/Azure/azure-data-lake-store-python – Dave Aug 28 '20 at 18:32
  • Azure Data Lake Storage Gen1 and Azure Data Lake Storage Gen2 are different services, and they do not have the same features. Gen2 is built on Azure Blob Storage and has similar features to it. If you want to manage a file's lifetime, please refer to https://learn.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal – Jim Xu Aug 29 '20 at 13:39
  • I read the link you sent but I'm still not clear on whether or how to set an expiration on my file (blob?). – Dave Aug 30 '20 at 01:02
  • There is no way to do that directly. You can only use a lifecycle management policy to manage the file's lifetime. – Jim Xu Aug 30 '20 at 02:13

1 Answer


The error occurs because you are calling a method that belongs to azure.datalake.store.core.AzureDLFileSystem (the Gen1 SDK) on an object of type DataLakeFileClient (the Gen2 SDK). DataLakeFileClient has no set_expiry method, hence the AttributeError.

If you want to call set_expiry, you must first create the correct kind of object.

For example, in Gen1, create the object as described here:

https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-data-operations-python

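## Import the Gen1 modules (azure-datalake-store package)
from azure.datalake.store import core, lib

## Declare variables
subscriptionId = 'FILL-IN-HERE'
adlsAccountName = 'FILL-IN-HERE'

## Authenticate (service principal shown; the linked page lists other options)
adlCreds = lib.auth(tenant_id='FILL-IN-HERE', client_id='FILL-IN-HERE', client_secret='FILL-IN-HERE')

## Create a filesystem client object
adlsFileSystemClient = core.AzureDLFileSystem(adlCreds, store_name=adlsAccountName)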

Using this object, you can call set_expiry on adlsFileSystemClient much as you do in your code example. Note that the method also requires an expiry_option argument, which your call omits:

set_expiry(path, expiry_option, expire_time=None)

Just make sure you're trying to call methods on the correct type of object.
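As a rough illustration of the Gen1 call, here is a minimal sketch. The helper name `set_relative_expiry` is hypothetical (it is not part of the SDK), and it assumes `fs_client` is an AzureDLFileSystem created as above. As far as I know, the Gen1 docstring lists 'NeverExpire', 'RelativeToNow', 'RelativeToCreationDate' and 'Absolute' as the expiry options, with expire_time given as an integer number of milliseconds; please check that against the version you have installed.

```python
def set_relative_expiry(fs_client, path, seconds_from_now):
    """Ask a Gen1 filesystem client to expire `path` after a delay.

    Hypothetical helper: `fs_client` is assumed to be an
    azure.datalake.store.core.AzureDLFileSystem, or any object that
    exposes the same set_expiry(path, expiry_option, expire_time) method.
    """
    # set_expiry expects expire_time as an integer in milliseconds.
    expire_ms = int(seconds_from_now * 1000)
    # 'RelativeToNow' means the delay is counted from the moment of this call.
    fs_client.set_expiry(path, "RelativeToNow", expire_time=expire_ms)
    return expire_ms
```

With the client from the snippet above, the call would look like `set_relative_expiry(adlsFileSystemClient, '/mydir/myfile.txt', 100)`, where the path is only a placeholder. This works only against a Gen1 account.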

For Gen 2:

from azure.storage.filedatalake import DataLakeServiceClient
connection_string = 'FILL-IN-HERE'
datalake_service_client = DataLakeServiceClient.from_connection_string(connection_string)

# Instantiate a FileSystemClient
file_system_client = datalake_service_client.get_file_system_client("mynewfilesystem")

For Gen2, the client has no set_expiry method. Instead, you configure a lifecycle management policy that expires blobs, as described here: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal#expire-data-based-on-age

Expire data based on age

Some data is expected to expire days or months after creation. You can configure a lifecycle management policy to expire data by deletion based on data age. The following example shows a policy that deletes all block blobs older than 365 days.

{
  "rules": [
    {
      "name": "expirationRule",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
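
If you would rather generate such a policy from Python than write the JSON by hand, something like the following sketch could work. The function name `build_expiration_policy` and the prefix `mynewfilesystem/tmp/` are made up for illustration. The `prefixMatch` filter, which limits the rule to blobs whose path starts with the container (file system) name plus a prefix, is an addition not shown in the quoted example. Note that lifecycle rules work in whole days, so this cannot reproduce an expiry of 100 seconds.

```python
import json


def build_expiration_policy(prefix, days, rule_name="expirationRule"):
    """Build a lifecycle policy dict that deletes blobs under `prefix`
    once they are more than `days` days past their last modification.

    Hypothetical helper; the structure mirrors the JSON example above.
    """
    return {
        "rules": [
            {
                "name": rule_name,
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # Prefix starts with the container / file system name.
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "delete": {"daysAfterModificationGreaterThan": days}
                        }
                    },
                },
            }
        ]
    }


policy = build_expiration_policy("mynewfilesystem/tmp/", 1)
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied to the storage account, for example through the portal or with the Azure CLI's `az storage account management-policy create` command. The policy applies to the whole account, not to a single file.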
Rahul Iyer