We have an application in which many (~1000) consumers fetch files from blob storage. There is no concurrent access to individual blob files, but they all share a single storage account. I can see the files in blob storage, yet we constantly get the exception below:

Caused by: com.microsoft.azure.storage.StorageException: The specified blob does not exist.
at com.microsoft.azure.storage.StorageException.translateFromHttpStatus(StorageException.java:207)[3:org.ops4j.pax.logging.pax-logging-service:1.6.9]
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:172)[3:org.ops4j.pax.logging.pax-logging-service:1.6.9]
at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:306)[3:org.ops4j.pax.logging.pax-logging-service:1.6.9]
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:177)[3:org.ops4j.pax.logging.pax-logging-service:1.6.9]
at com.microsoft.azure.storage.blob.CloudBlob.downloadAttributes(CloudBlob.java:1268)[3:org.ops4j.pax.logging.pax-logging-service:1.6.9]
at com.microsoft.azure.storage.blob.CloudBlob.downloadAttributes(CloudBlob.java:1235)[3:org.ops4j.pax.logging.pax-logging-service:1.6.9]

We are using

Azure-storage-api 1.1.0

Is this a known bug or limitation? In which scenarios would we get this exception?

We download blobs using the following code:

String storageConnectionString = "DefaultEndpointsProtocol=http;AccountName=" + storageAccount + ";AccountKey=" + primaryAccessKey;
CloudStorageAccount account = CloudStorageAccount.parse(storageConnectionString);
CloudBlobClient blobClient = account.createCloudBlobClient();
CloudBlobContainer container = blobClient.getContainerReference(containerName.toLowerCase());
CloudBlockBlob blockBlob = container.getBlockBlobReference(fileName);
// Fetch the blob's properties so its length is known before downloading.
blockBlob.downloadAttributes();
// http://stackoverflow.com/questions/1071858/java-creating-byte-array-whose-size-is-represented-by-a-long
int size = (int) blockBlob.getProperties().getLength();
byte[] out = new byte[size];
blockBlob.downloadToByteArray(out, 0);
Dhananjay
  • Can you share the code? How are your users accessing these blobs? Are they directly accessing the blob by its URL (`https://account.blob.core.windows.net/container/blob.txt`)? If that's the case, can you please check the container's ACL? It should not be Private. – Gaurav Mantri Sep 12 '16 at 12:38
  • @GauravMantri added code – Dhananjay Sep 12 '16 at 12:59
  • Your code looks OK to me. Does this happen all the time or only sometimes? – Gaurav Mantri Sep 12 '16 at 13:12
  • According to the exception information, compared with the list of [Blob Service Error Codes](https://msdn.microsoft.com/en-us/library/azure/dd179439.aspx), it seems to be caused by trying to access a nonexistent blob; that's a 404 Not Found error. Please check whether this is the case. – Peter Pan Sep 13 '16 at 02:33
  • Yes @PeterPan-MSFT, after digging through the URLs I received, there were requests for non-existent blobs causing this issue – Dhananjay Sep 20 '16 at 07:21

2 Answers

What does "constantly" mean here? Is it always, or only when more than X consumers are trying to fetch the blob?

On the Scalability Targets for Azure Storage page you can learn more about the targeted scalability parameters. One of them is the target throughput for a single blob:

Target throughput for a single blob: up to 60 MB per second, or up to 500 requests per second

With your 1000 consumers, there is no doubt you hit that limit when they query the same blob. The question is: do you really need to hit the blob that intensively? Can you cache it somewhere (an intermediate facade), or can you use a CDN (it also works with SAS)? A sketch of the SAS approach follows.
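
If you do go the SAS route, here is a minimal sketch using the same legacy azure-storage SDK as in the question. The read-only permission, the one-hour expiry, and the helper name `readOnlySasUrl` are illustrative assumptions, not a prescribed setup:

import java.util.Date;
import java.util.EnumSet;
import com.microsoft.azure.storage.blob.CloudBlockBlob;
import com.microsoft.azure.storage.blob.SharedAccessBlobPermissions;
import com.microsoft.azure.storage.blob.SharedAccessBlobPolicy;

// Hypothetical helper: hand consumers a short-lived, read-only SAS URL instead
// of the account key, so reads can be served through a CDN endpoint.
static String readOnlySasUrl(CloudBlockBlob blob) throws Exception {
    SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy();
    policy.setPermissions(EnumSet.of(SharedAccessBlobPermissions.READ));
    // Assumed expiry of one hour; tune this to your caching strategy.
    policy.setSharedAccessExpiryTime(new Date(System.currentTimeMillis() + 3600 * 1000));
    // generateSharedAccessSignature returns only the query-string portion of the SAS.
    String sasToken = blob.generateSharedAccessSignature(policy, null);
    return blob.getUri() + "?" + sasToken;
}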

If the 1000 consumers are hitting 1000 different blobs, there are other limitations, like:

Total request rate (assuming 1 KB object size) per storage account: up to 20,000 IOPS, entities per second, or messages per second

Which, for 1000 consumers, works out to 20 requests per second each - depending on the number of blocks in your files, it may well be that limit you are hitting.

Either way, you should review your application and discover which limit you are hitting. As a first mitigation while you investigate, you can widen the SDK's retry policy so that transient throttling ("server busy" responses) is absorbed instead of surfacing to every consumer; a sketch follows.
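
A minimal sketch against the same SDK, reusing `blockBlob` from the question's code. The 4-second delta back-off and 5 attempts are assumed illustrative values, not a recommendation:

import com.microsoft.azure.storage.RetryExponential;
import com.microsoft.azure.storage.blob.BlobRequestOptions;

// Exponential back-off absorbs transient throttling responses instead of
// failing immediately when many consumers hit the account at once.
BlobRequestOptions options = new BlobRequestOptions();
options.setRetryPolicyFactory(new RetryExponential(4000 /* delta back-off, ms */, 5 /* max attempts */));
blockBlob.downloadAttributes(null /* access condition */, options, null /* operation context */);
byte[] buffer = new byte[(int) blockBlob.getProperties().getLength()];
blockBlob.downloadToByteArray(buffer, 0, null, options, null);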

astaykov
  • It turned out to be bad URLs causing this issue and not the concurrency. I agree with you about the concurrency limits of storage access and will keep a check on those – Dhananjay Sep 20 '16 at 07:25
This is just to make things clear for anyone who reads this question in the future. After scanning through all the request URLs for downloads, it turned out there were a bunch of URLs for non-existent blobs, and those were causing this exception.
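
For anyone hitting the same thing, here is a minimal defensive sketch with the same SDK as in the question. Returning null for a missing blob is an assumption about what the caller wants; adapt as needed:

import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlockBlob;

// Hypothetical guard: check existence first, and still treat a 404 during the
// download as "missing", since the blob can vanish between the two calls.
static byte[] downloadIfPresent(CloudBlockBlob blockBlob) throws Exception {
    if (!blockBlob.exists()) {
        return null; // blob was never uploaded (the cause in this question)
    }
    try {
        blockBlob.downloadAttributes();
        byte[] buffer = new byte[(int) blockBlob.getProperties().getLength()];
        blockBlob.downloadToByteArray(buffer, 0);
        return buffer;
    } catch (StorageException e) {
        if (e.getHttpStatusCode() == 404) {
            return null;
        }
        throw e;
    }
}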

Dhananjay