
I have a list of file names that I need to search for in Azure Blob storage. Right now, as a noob, I am looping over each blob name and comparing strings, but I think there has to be an easier and faster way to get this done. The current solution is making my HTTP response very slow.

from azure.storage.blob import BlockBlobService  # legacy SDK (azure-storage-blob < 12)

def ifblob_exists(self, filename):
    try:
        container_name = 'xxx'
        AZURE_KEY = 'xxx'
        SAS_KEY = 'xxx'
        ACCOUNT_NAME = 'xxx'
        block_blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=None, sas_token=SAS_KEY, socket_timeout=10000)

        # list_blobs() pages through every blob in the container
        generator = block_blob_service.list_blobs(container_name)
        for blob in generator:
            if filename == blob.name:
                print("\t Blob exists: " + blob.name)
                return True
        # Only report "not found" after every blob has been compared
        print("Blob does not exist: " + filename)
        return False
    except Exception as e:
        print(e)
        return False
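If a listing-based approach is kept, the function above still lists the whole container once per file name. A minimal sketch that lists the container only once per batch of names and then checks each name against a Python set is shown below. The helper name `blobs_exist_single_listing` is hypothetical, and it assumes `block_blob_service` is a legacy `BlockBlobService` (or any object whose `list_blobs(container_name)` yields items with a `.name`). It is only sensible when the container is small enough to enumerate; for large containers a per-blob existence check scales better.

```python
from typing import Dict, Iterable


def blobs_exist_single_listing(block_blob_service, container_name: str,
                               filenames: Iterable[str]) -> Dict[str, bool]:
    """Check many file names against ONE listing of the container.

    Assumption: block_blob_service.list_blobs(container_name) yields
    objects with a .name attribute (as the legacy BlockBlobService does).
    """
    # Page through the container once and keep every blob name in a set.
    existing = {blob.name for blob in block_blob_service.list_blobs(container_name)}
    # Set membership is a local O(1) lookup, so no extra HTTP request per name.
    return {name: name in existing for name in filenames}
```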
Edited by Evandro de Paula; asked by Pawankumar Dubey

2 Answers


Please use the exists method in the Azure Storage Python SDK.

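from azure.storage.blob import BlockBlobService  # legacy SDK (azure-storage-blob < 12)

def ifblob_exists(filename):
    try:
        container_name = '***'
        account_name = '***'
        account_key = '***'

        block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key,
                                              socket_timeout=10000)

        # exists() checks this one blob directly instead of listing the container
        isExist = block_blob_service.exists(container_name, filename)
        if isExist:
            print("\t Blob exists: " + filename)
        else:
            print("\t Blob does not exist: " + filename)
        return isExist
    except Exception as e:
        print(e)
        return False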

If you have a list of file names, you still need to call the function above once per name in a loop.
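A minimal sketch of that loop, returning one result per file name, follows. The helper name `blobs_exist` is hypothetical; it assumes `block_blob_service` is a legacy `BlockBlobService` that has already been created, and uses only its `exists(container_name, blob_name)` method, so the client is built once rather than once per name.

```python
from typing import Dict, Iterable


def blobs_exist(block_blob_service, container_name: str,
                filenames: Iterable[str]) -> Dict[str, bool]:
    """Call exists() once per file name and collect the results.

    Assumption: block_blob_service is an existing legacy BlockBlobService
    (only its exists(container_name, blob_name) method is used).
    """
    results = {}
    for filename in filenames:
        # One small request per name; the container itself is never listed.
        results[filename] = block_blob_service.exists(container_name, filename)
    return results
```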

Hope it helps you.

Answered by Jay Gong
  • I have 19 files on Azure. The solution I have takes 24 seconds to loop over every file; the solution you proposed takes 19 seconds. The solution proposed by @Evandro Paula takes the same amount of time. – Pawankumar Dubey Jul 04 '18 at 06:48
  • As far as I know, there is no direct method to check the existence of a list of file names in Blob storage. – Jay Gong Jul 04 '18 at 06:53
  • Pawan, think about the scenario where you have to search for these 19 blobs among 100,000 blobs or more. The solutions proposed by Jay and Evandro will be far more effective than your current solution then. – Gaurav Mantri Jul 04 '18 at 07:01
  • I have the same worry even with Jay's solution. For thousands of entries, searching is a huge challenge. – Pawankumar Dubey Jul 04 '18 at 07:21

Listing all blobs is a very costly operation inside the Azure Storage infrastructure because it translates into a full scan of the container.
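To make the cost concrete, here is a rough request-count sketch. It relies on one fact about the List Blobs REST operation (a single response returns at most 5,000 blob entries) and assumes the worst case for the question's original function, where every name triggers its own full listing because it is not found. The helper names are hypothetical and the model ignores payload size and server-side work, so treat it as an order-of-magnitude illustration only.

```python
import math

# List Blobs returns at most 5,000 blob entries per response page.
LIST_PAGE_SIZE = 5000


def worst_case_listing_requests(names_to_check: int, blobs_in_container: int) -> int:
    """Requests if every name triggers its own full listing (name not found)."""
    pages_per_listing = max(1, math.ceil(blobs_in_container / LIST_PAGE_SIZE))
    return names_to_check * pages_per_listing


def worst_case_entries_transferred(names_to_check: int, blobs_in_container: int) -> int:
    """Blob entries sent back over the network in that same worst case."""
    return names_to_check * blobs_in_container


def exists_requests(names_to_check: int) -> int:
    """One small existence check per name, independent of container size."""
    return names_to_check


# 19 names checked against a container holding 100,000 blobs:
print(worst_case_listing_requests(19, 100_000))     # 380
print(worst_case_entries_transferred(19, 100_000))  # 1900000
print(exists_requests(19))                          # 19
```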

Below is an example that efficiently checks whether each blob (i.e., each file name in your case) exists in a given container:

from azure.storage.blob import BlockBlobService  # legacy SDK (azure-storage-blob < 12)
from datetime import datetime
from typing import List

def check_if_blob_exists(container_name: str, blob_names: List[str]):
    start_time = datetime.now()

    if not container_name or container_name.isspace():
        raise ValueError("Container name cannot be none, empty or whitespace.")

    if not blob_names:
        raise ValueError("Block blob names cannot be none or empty.")

    # Create the client once and reuse it for every exists() call
    block_blob_service = BlockBlobService(account_name="{Storage Account Name}", account_key="{Storage Account Key}")

    for blob_name in blob_names:
        # exists() checks a single blob without listing the container
        if block_blob_service.exists(container_name, blob_name):
            print("\nBlob '{0}' found!".format(blob_name))
        else:
            print("\nBlob '{0}' NOT found!".format(blob_name))

    end_time = datetime.now()

    print("\n***** Elapsed Time => {0} *****".format(end_time - start_time))

if __name__ == "__main__":
    blob_names = []

    # Exists
    blob_names.append("eula.1028.txt")
    blob_names.append("eula.1031.txt")
    blob_names.append("eula.1033.txt")
    blob_names.append("eula.1036.txt")
    blob_names.append("eula.1040.txt")

    # Don't exist
    blob_names.append("blob1")
    blob_names.append("blob2")
    blob_names.append("blob3")
    blob_names.append("blob4")

    check_if_blob_exists("containername", blob_names)
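Each exists call in the loop above waits for the previous one to finish, so the total time grows with the number of names. A minimal sketch that overlaps those network round-trips with a thread pool is below; this swaps in a technique that is not part of the original answer. The helper name `check_blobs_concurrently` and the default of 8 workers are hypothetical, and it assumes the client exposes `exists(container_name, blob_name)` and is safe to share between threads (if unsure, create one client per worker).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def check_blobs_concurrently(block_blob_service, container_name: str,
                             blob_names: List[str], max_workers: int = 8) -> Dict[str, bool]:
    """Run one exists() call per blob name on a small thread pool.

    Assumption: block_blob_service exposes exists(container_name, blob_name)
    and can be shared between threads.
    """
    if not blob_names:
        raise ValueError("Block blob names cannot be none or empty.")

    def check(name: str) -> bool:
        return block_blob_service.exists(container_name, name)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so they line up with blob_names.
        found = list(pool.map(check, blob_names))
    return dict(zip(blob_names, found))
```

The requests still happen one per name; the pool only lets several of them wait on the network at the same time, which is where most of the elapsed time goes.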

Below is a screenshot of a quick test run from my laptop in West US (~150 Mbps download, ~3.22 Mbps upload, per Google Speed Test), checking whether 9 blobs exist in an LRS storage account that is also in West US.

[Screenshot: test run output]

Answered by Evandro de Paula
  • I have 19 files on Azure. The solution I have takes 24 seconds to loop over every file; the solution you proposed takes 19 seconds. The solution proposed by @Jay Gong takes the same amount of time. – Pawankumar Dubey Jul 04 '18 at 06:48
  • The small difference in execution time is likely due to the small number of files in Azure Storage (i.e. 19 files) compared with the number of files being checked (e.g. perhaps 15). What if you had 15 files to check against 10K files in the container? It won't scale (paging, continuation tokens, multiple HTTP requests, etc.). Moreover, it may lead to timeouts, throttling, and other undesired side effects depending on the amount of IOPS consumed. – Evandro de Paula Jul 04 '18 at 07:02
  • I am worried about the same scenario. I will accept your solution, but the time taken to search is still a lot. – Pawankumar Dubey Jul 04 '18 at 07:19
  • I just added a function closer to your scenario, which receives a list of files to verify instead of a single one, and ran a quick test to check whether 9 blobs exist (< 1 second execution time). – Evandro de Paula Jul 04 '18 at 07:29
  • BTW, if you **really** need **search capabilities**, you should probably check this out https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage. – Evandro de Paula Jul 04 '18 at 07:35
  • Awesome: str(datetime.now()) start --> 2018-07-04 13:10:11.039050, str(datetime.now()) end --> 2018-07-04 13:10:17.125398. It just took 6 seconds. That's much better. :) – Pawankumar Dubey Jul 04 '18 at 07:41
  • This answer is deprecated because it uses BlockBlobService. – RaphWork Sep 14 '20 at 16:09
  • This is a better method https://stackoverflow.com/questions/63888136/checking-if-a-blob-exist-in-python-azure – RaphWork Sep 15 '20 at 07:12
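Since BlockBlobService is gone from azure-storage-blob v12, a minimal sketch of the same per-name check with the newer client classes follows. It assumes `container_client` is an `azure.storage.blob.ContainerClient`; `get_blob_client()` only builds a client object locally, and `BlobClient.exists()` (available in recent 12.x releases) sends one request per blob. The helper name `blobs_exist_v12` is hypothetical.

```python
from typing import Dict, Iterable


def blobs_exist_v12(container_client, blob_names: Iterable[str]) -> Dict[str, bool]:
    """Check each name with BlobClient.exists() from azure-storage-blob v12.

    Assumption: container_client is an azure.storage.blob.ContainerClient.
    get_blob_client() makes no network call; exists() makes one per blob.
    """
    return {name: container_client.get_blob_client(name).exists()
            for name in blob_names}
```

With the real SDK, the client could be created with something like `ContainerClient.from_connection_string(conn_str, container_name="containername")`, where `conn_str` is your storage account connection string, and then passed to the helper together with the list of names.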