
I'm trying to get a list of blob names in Azure and I'm looking for ways to make this operation significantly faster. Within a given sub-folder, the number of blobs can exceed 150,000 elements. The filenames of the blobs are an encoded ID which is what I really need to get at, but I could store that as some sort of metadata if there was a way to query just the metadata or a single field of the metadata.

I'm finding that something as simple as the following:

var blobList = container.ListBlobs(null, false);

can take upwards of 60 seconds to run from my desktop and typically around 15 seconds when running on a VM hosted in Azure. These timings come from a test of 125k blobs in an otherwise empty container, run several hours after the blobs were uploaded, so they've definitely had time to "settle", so to speak.

I've attempted multiple variations and tried using ListBlobsSegmented but it doesn't really help because the function is returning a lot of extra information that I simply don't need. I just need the blob names so I can get at the encoded ID to see what's currently stored and what isn't.

The query for the blob names and the extraction of the encoded ID is somewhat time sensitive, so if I could get it to under 1 second, I'd be happy with it. If I stored the files locally, I could get the entire list of files in a few ms, but I have to use Azure storage for this, so that's not an option.

The only way I can think of to reduce the time it takes to identify the available blobs is to track the names of the blobs being added to or removed from a given folder and store that list in a separate blob. Then, when I need to know the blob names in that folder, I would read that index blob rather than using ListBlobs. I suppose another option would be to use Azure Table storage in a similar way, but either way it seems like I'm being forced into caching information about a given folder in the container.
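To make the index-blob idea concrete, here is a minimal sketch of the bookkeeping side only. The class name `BlobNameIndex` and its methods are hypothetical, not part of any Azure SDK. It assumes blob names contain no newline characters, and it leaves out the actual upload and download of the index blob as well as any coordination between concurrent writers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Azure SDK type): tracks the blob names in one
// folder and round-trips them through a newline-delimited string, which could
// be stored as the content of a single "index" blob.
public sealed class BlobNameIndex
{
    // Ordinal ordering keeps the serialized form stable across machines.
    private readonly SortedSet<string> _names =
        new SortedSet<string>(StringComparer.Ordinal);

    public int Count => _names.Count;

    // Call after a blob is uploaded; returns false if the name was already tracked.
    public bool Add(string blobName) => _names.Add(blobName);

    // Call after a blob is deleted; returns false if the name was not tracked.
    public bool Remove(string blobName) => _names.Remove(blobName);

    public bool Contains(string blobName) => _names.Contains(blobName);

    // One name per line. Assumes names never contain '\n'.
    public string Serialize() => string.Join("\n", _names);

    // Rebuilds the index from the text previously produced by Serialize().
    public static BlobNameIndex Deserialize(string content)
    {
        var index = new BlobNameIndex();
        var lines = content.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            index._names.Add(line);
        }
        return index;
    }
}
```

Reading the folder's contents then becomes one download of the index blob followed by `BlobNameIndex.Deserialize(text)`, instead of a full listing. The trade-off is the one I'm worried about: every upload and delete has to update the index as well, and the index can drift from the real folder contents if one of those updates fails.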

Is there a better way of doing this or is this generally what people end up doing when you have hundreds of thousands of blobs in a single folder?

Mike Taber
  • This type of question has been asked before, with similar variants (such as [this one](http://stackoverflow.com/questions/8158452/is-it-better-to-have-many-small-azure-storage-blob-containers-each-with-some-bl/8160317#8160317)). Blob storage isn't a database system, so there are no facilities for searching, other than looking at blob metadata or bulk listings. You'd need to use some type of database for storing blob metadata, to have searchable content, where you can then extract specific blob URIs and access blobs directly. – David Makogon Jun 29 '16 at 15:18
  • I looked at that question along with about a dozen other variants of it, which is why I didn't bother listing them individually. The issue I'm trying to resolve is getting the names of the blobs, which one could argue is a form of search, but it's really not. I just want a list of the names. However, with 125k items in the same folder, I have a mere 2.6MB of data stored but the function returns tens of MB of data. All I need is the names, so the question is: is it possible to get just the names without all the other properties, or not? – Mike Taber Jun 29 '16 at 15:33

1 Answer


As mentioned, Azure Blob storage is a storage system and doesn't index its content for you. There is now an Azure Search indexer that indexes content uploaded to Azure Blob storage; see https://azure.microsoft.com/en-us/documentation/articles/search-howto-indexing-azure-blob-storage/ for details. With it you can use the features Azure Search supports, e.g. listing, searching, paging and sorting. Hope this helps.

mannu2050