
I have a set of files (which are not locally saved) that need to be uploaded to Azure Blob Storage and updated every day.

(1) There are a certain number of files with the same name (but different contents), which should be saved as individual blobs.
(2) The updated set of files should overwrite the respective blobs from the previous day.

Is there a way to check whether a blob already exists and dynamically rename it by appending a number? (I can't append a timestamp because of (2).)

I am using the function below to upload all my files:

import logging
import os


def azure_upload_file(block_blob_service, container, local_file_path, local_file_name):
    logger = logging.getLogger('data')

    isExist = block_blob_service.exists(container, local_file_name)

    blobname, blobext = os.path.splitext(local_file_name)

    if isExist:
        # '#' is the placeholder for the number I want to fill in
        blob_file_name = '{}_{}{}'.format(blobname, '#', blobext)
    else:
        blob_file_name = local_file_name
    full_path_to_file = os.path.join(local_file_path, local_file_name)

    block_blob_service.create_blob_from_path(container, blob_file_name, full_path_to_file)
    blob_url = block_blob_service.make_blob_url(container, blob_file_name)

    logger.info('Uploaded file {} to azure blob storage'.format(blob_file_name))
    # Delete the local file once it has been uploaded
    os.unlink(full_path_to_file)

    return blob_url

Example:

Date: 19-11-2019 - Initial Upload

filename.ext -> blob
1. abcd.zip -> abcd.zip
2. abcd.zip -> abcd(1).zip
3. abcd.zip -> abcd(2).zip
4. defg.csv -> defg.csv

and so on.

All I want is to fill the '#' in the code intelligently, so that whenever I receive the updated set of files, I already know which blob each file should overwrite.

That is, if I get a new set of files on 20-11-2019, the mapping should stay the same:

Example:

Date: 20-11-2019 - Second Upload

new filename.ext -> blob
1. abcd.zip -> abcd.zip
2. abcd.zip -> abcd(1).zip
3. abcd.zip -> abcd(2).zip
4. defg.csv -> defg.csv

and so on.
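The mapping in these examples depends only on the order in which files arrive within a batch, not on what is already in the container. A minimal sketch of that rule, assuming the files arrive in the same order every day: number each repeated name by counting its earlier occurrences in the same batch. The function name assign_blob_names is hypothetical.

```python
import os
from collections import Counter
from typing import Iterable, List


def assign_blob_names(file_names: Iterable[str]) -> List[str]:
    """Map each file name to a blob name: the first occurrence keeps its
    name, later duplicates get (1), (2), ... inserted before the extension.
    Assumes the batch order is stable from one day to the next."""
    seen = Counter()  # how many times each file name has appeared so far
    blob_names = []
    for file_name in file_names:
        count = seen[file_name]
        if count == 0:
            blob_names.append(file_name)
        else:
            base, ext = os.path.splitext(file_name)
            blob_names.append("{}({}){}".format(base, count, ext))
        seen[file_name] += 1
    return blob_names


batch = ["abcd.zip", "abcd.zip", "abcd.zip", "defg.csv"]
print(assign_blob_names(batch))
# ['abcd.zip', 'abcd(1).zip', 'abcd(2).zip', 'defg.csv']
```

Because the result is the same on every run and create_blob_from_path overwrites an existing blob by default, each day's upload would replace the previous day's blobs one-for-one without any exists() calls. If a later batch contains fewer copies of a name, the surplus numbered blobs from earlier days are left in place.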

I have already gone through these similar questions:
1. Azure blob upload rename if blob name exist
2. Faster Azure blob name search with python?

Neither of them solves my problem. Is there an efficient and easy way to achieve this?

supersaiyan
  • Here's what I understand: 1. First, you try to upload a blob; if it already exists, you then decide whether to overwrite it or duplicate it based on when the blob was uploaded? 2. If the blob was uploaded before today, you overwrite it; otherwise you add (1) to the name and upload a new blob. Let me know if this is not what you are trying to do. – rakshith91 Nov 19 '19 at 03:45
  • @rakshith1124 Not quite. (1) I want to upload a set of files (some of which have the exact same name) to Azure Blob Storage. (2) This set of files gets new versions every day, so once I get the new versions, I should overwrite the already existing blob for each of them. – supersaiyan Nov 19 '19 at 05:31
  • Do you have any code you have already tried? – rakshith91 Nov 19 '19 at 08:50

3 Answers


If you want to keep multiple versions of the same file, just append a timestamp to the blob's name:

  • abcd20191118131800.zip
  • abcd20191118131900.zip

Sorting by blob name in ascending order then lists the oldest file first; descending order lists the latest file first.
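A small sketch of this naming scheme, assuming all names being compared share the same base name (e.g. all start with "abcd"): a fixed-width YYYYMMDDHHMMSS stamp sorts lexicographically in chronological order, so plain string sorting is enough. The helper names timestamped_blob_name and latest_first are hypothetical.

```python
import os
from datetime import datetime
from typing import List


def timestamped_blob_name(file_name: str, when: datetime) -> str:
    """Insert a fixed-width YYYYMMDDHHMMSS timestamp before the extension."""
    base, ext = os.path.splitext(file_name)
    return "{}{}{}".format(base, when.strftime("%Y%m%d%H%M%S"), ext)


def latest_first(blob_names: List[str]) -> List[str]:
    """Fixed-width timestamps compare correctly as strings, so a
    descending sort puts the newest version first (same base name assumed)."""
    return sorted(blob_names, reverse=True)


names = [
    timestamped_blob_name("abcd.zip", datetime(2019, 11, 18, 13, 18, 0)),
    timestamped_blob_name("abcd.zip", datetime(2019, 11, 18, 13, 19, 0)),
]
print(names)                   # ['abcd20191118131800.zip', 'abcd20191118131900.zip']
print(latest_first(names)[0])  # abcd20191118131900.zip
```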

Thiago Custodio

You could use the exists method to check whether the blob already exists, and then decide whether the file name needs to be changed.

Below is my test code; it works for me.

    import os
    from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob 2.x SDK

    # accountName and accountKey are your storage account credentials
    block_blob_service = BlockBlobService(account_name=accountName, account_key=accountKey,
                                          socket_timeout=10000)

    container_name = "test"
    local_path = "./data"
    local_file_name = "quickstart.txt"
    upload_file_path = os.path.join(local_path, local_file_name)

    blob_name = local_file_name
    if block_blob_service.exists(container_name, blob_name):
        # Rename only the blob; the local file keeps its original name and path.
        blob_name = blob_name.replace('.txt', '1.txt')

    print("\nUploading to Azure Storage as blob:\n\t" + blob_name)
    block_blob_service.create_blob_from_path(container_name, blob_name, upload_file_path)

Update:

    container_name = "test"
    local_path = "./data"
    local_file_name = "quickstart.txt"
    upload_file_path = os.path.join(local_path, local_file_name)

    base, ext = os.path.splitext(local_file_name)
    blob_name = local_file_name
    i = 1
    # Try quickstart(1).txt, quickstart(2).txt, ... until a free blob name is found.
    while block_blob_service.exists(container_name, blob_name):
        blob_name = '{}({}){}'.format(base, i, ext)
        i += 1

    print("\nUploading to Azure Storage as blob:\n\t" + blob_name)
    block_blob_service.create_blob_from_path(container_name, blob_name, upload_file_path)
George Chen
  • I have more than two files with the same name. I can't always just append (1). – supersaiyan
  • @Saketh Gangam, I have updated my code: while isExist is true, it keeps trying the next numbered name until it finds one that is free. – George Chen Nov 20 '19 at 06:40
  • @Saketh Gangam, any update on this issue? Could you implement it with my code? – George Chen Nov 22 '19 at 01:44
  • I have more than 7000 files, and it is computationally expensive to check for the existence of blobs with the same name and then rename them. – supersaiyan Nov 22 '19 at 03:54
  • @Saketh Gangam, if you don't want to check whether the blob exists, which approach would you prefer? Please describe it in more detail, because Azure Storage doesn't have a method that does this automatically. Alternatively, you could add a timestamp to your file name; that would be very easy to implement and efficient. – George Chen Nov 22 '19 at 06:08
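Regarding the cost concern in the comments: the while loop in the update makes one exists() network call per candidate name, per file. A sketch of one way to cut that down, assuming the existing blob names can be fetched once up front (for example {b.name for b in container_client.list_blobs()} with azure-storage-blob v12) and then probed in memory. The helper next_free_name is hypothetical and contains no SDK calls, so it can be tested on its own.

```python
import os
from typing import Set


def next_free_name(file_name: str, existing: Set[str]) -> str:
    """Return file_name, or the first name(1), name(2), ... variant that is
    not in `existing`, and record the chosen name in `existing`."""
    base, ext = os.path.splitext(file_name)
    candidate = file_name
    i = 1
    while candidate in existing:  # set lookup instead of a network round trip
        candidate = "{}({}){}".format(base, i, ext)
        i += 1
    existing.add(candidate)  # so later files in the same run see this name as taken
    return candidate


# `existing` would normally be filled once from a single container listing.
existing = {"quickstart.txt", "quickstart(1).txt"}
print(next_free_name("quickstart.txt", existing))  # quickstart(2).txt
print(next_free_name("report.csv", existing))      # report.csv
```

This keeps the same naming behaviour as the loop above; it only replaces thousands of exists() calls with one paged listing.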

I'm not entirely sure I understand your question, but from what I understand, here's how you could do it with the latest version of azure-storage-blob (v12).

If you want to rename the blob when it already exists:

import os
from azure.storage.blob import ContainerClient
from azure.core.exceptions import ResourceExistsError

# conn_str (connection string) and data (file contents) are assumed to be defined
blob_name = "abcd.zip"
base, ext = os.path.splitext(blob_name)
container_client = ContainerClient.from_connection_string(conn_str, "container_name")
blob_client = container_client.get_blob_client(blob_name)
try:
    # upload the blob; raises ResourceExistsError if it already exists
    blob_client.upload_blob(data)
except ResourceExistsError:
    # check the number of blobs with the same prefix,
    # e.g. [abcd.zip, abcd(1).zip, abcd(2).zip] (unrelated names like abcdef.zip also match)
    blobs = list(container_client.list_blobs(name_starts_with=base))
    length = len(blobs)
    if length == 1:
        # it means there is only one blob - which is from the previous version
        blob_client.upload_blob(data, overwrite=True)
    else:
        # if there are 10 blobs whose names start with abcd, the 11th file becomes abcd(10).zip
        name = '{}({}){}'.format(base, length, ext)
        blob_client = container_client.get_blob_client(name)
        blob_client.upload_blob(data)

Is this what you are trying to do? Let me know if this doesn't solve the problem.

rakshith91