I have a single container with around 200k images in my blob storage. I want to write a Python script that copies these images in batches of 20k to new containers named something like imageset1, imageset2, ..., imageset20 (the last container will have fewer than 20k images in it, which is fine).

I have the following so far:

from azure.storage.blob import BlockBlobService
from io import BytesIO
from shutil import copyfileobj

with BytesIO() as input_blob:
    with BytesIO() as output_blob:
        block_blob_service = BlockBlobService(account_name='my_account_name', account_key='my_account_key')

        # Download as a stream
        block_blob_service.get_blob_to_stream('mycontainer', 'myinputfilename', input_blob)

        # Here is where I want to chunk up the container contents into batches of 20k

        # Then I want to write the above to a set of new containers using, I think, something like this...
        block_blob_service.create_blob_from_stream('mycontainer', 'myoutputfilename', output_blob)

The part I don't know how to do is chunking up the contents of a container and writing the results out to new containers. Can anyone help?

JassiL
  • [how-do-you-split-a-list-into-evenly-sized-chunks](https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks) – Patrick Artner Mar 17 '19 at 19:34
  • All that has been posted is a program description. Please see the [ask] help page and [The perfect question](http://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/) blog post by Jon Skeet. We can't be sure what you want from us. Please [edit] your post to include a valid question that we can answer. Reminder: make sure you know what is on-topic here by visiting the [help/on-topic]; asking us to write the program for you, suggestions, and external links are off-topic. – Patrick Artner Mar 17 '19 at 19:35
  • Is there any pattern to classify these images, for example by name or by timestamp? – Peter Pan Mar 18 '19 at 06:18
  • Peter, no there isn't. The images are in the following format: RBG4906_1.jpg, RBG4906_2.jpg (so there are two slightly different images of the same thing with the suffix 1 or 2). The numbers in the image names aren't consecutive, so there's no pattern as far as I can tell. – JassiL Mar 18 '19 at 08:16
  • @JassiL So you just want to move them into different containers, with the same number of images in each, right? – Peter Pan Mar 18 '19 at 08:59
  • Well, copy them into containers - with each container having 20k images in it. – JassiL Mar 18 '19 at 10:29
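
The batching step discussed in these comments does not depend on Azure at all. A minimal sketch, assuming the blob names are available as any iterable (for example, from a container listing); the helper name `chunked` and the sample file names are hypothetical:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical sample names; a real run would use the names from the container listing.
names = ["img_%d.jpg" % i for i in range(7)]

# Number each batch from 1 so it lines up with imageset1, imageset2, ...
for index, batch in enumerate(chunked(names, 3), start=1):
    print("imageset%d" % index, len(batch))
```

With 7 names and a batch size of 3, this gives two full batches and a final batch of one, which matches the "last container has fewer images" case. Because the helper consumes an iterator lazily, it should not need the whole listing in memory at once.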

1 Answer

Here is my sample code for what you need. It works on my container.

from azure.storage.blob.baseblobservice import BaseBlobService

account_name = '<your account name>'
account_key = '<your account key>'
container_name = '<the source container name>'

blob_service = BaseBlobService(
    account_name=account_name,
    account_key=account_key
)

blobs = blob_service.list_blobs(container_name)

# Index of the target container, starting at 1
container_index = 1
# Number of blobs per new container (3 in my testing; use 20000 for your case)
num_per_container = 3
count = 0
# Prefix of the new container names
prefix_of_new_container = 'imageset'
flag_of_new_container = False

for blob in blobs:
    target_container = "%s%d" % (prefix_of_new_container, container_index)
    # Create the target container before the first copy into it
    if not flag_of_new_container:
        flag_of_new_container = blob_service.create_container(target_container)
    print(blob.name, target_container)
    # Copy from the source blob URL into the target container
    source_url = "https://%s.blob.core.windows.net/%s/%s" % (account_name, container_name, blob.name)
    blob_service.copy_blob(target_container, blob.name, source_url)
    count += 1
    # Once the current container is full, move on to the next one
    if count == num_per_container:
        container_index += 1
        count = 0
        flag_of_new_container = False

Note: I use BaseBlobService because it is enough for this task, and it also works for append blobs and page blobs. You can use BlockBlobService instead if you prefer.
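
The counter and flag bookkeeping in the loop can also be replaced by computing the target container directly from each blob's position in the listing. A rough sketch of that idea: `plan_copies` is a hypothetical helper, not part of the Azure SDK, and it only decides where each blob goes, leaving the actual `create_container` and `copy_blob` calls to the caller. The sample names are made up for illustration.

```python
def plan_copies(blob_names, num_per_container, prefix="imageset"):
    """Yield (target_container, blob_name, is_first) for each blob name.

    is_first is True for the first blob assigned to a container, which is
    the point at which that container needs to be created.
    """
    if num_per_container < 1:
        raise ValueError("num_per_container must be at least 1")
    for position, name in enumerate(blob_names):
        # Integer division maps positions 0..n-1 to container 1, n..2n-1 to 2, ...
        container_index = position // num_per_container + 1
        is_first = position % num_per_container == 0
        yield "%s%d" % (prefix, container_index), name, is_first


# Hypothetical names standing in for `blob.name` values from the listing.
sample = ["RBG4906_1.jpg", "RBG4906_2.jpg", "RBG5120_1.jpg", "RBG5120_2.jpg"]
for target, name, is_first in plan_copies(sample, 3):
    # In a real run: create the container when is_first is True, then copy the blob.
    print(target, name, is_first)
```

With the listing from the code above, the input would be something like `(blob.name for blob in blobs)`. Keeping the planning separate from the storage calls makes it possible to check the container assignment on a small sample before copying anything.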

Peter Pan