40

I'm using Azure Storage to serve static file blobs, but I'd like to add Cache-Control and Expires headers to the files/blobs when they are served, to reduce bandwidth costs.

Applications like CloudXplorer and Cerebrata's Cloud Storage Studio give options to set metadata properties on containers and blobs, but get upset when you try to add Cache-Control.

Anyone know if it's possible to set these headers for files?

abatishchev
Gavin
  • I've since discovered that Cache-Control can be set on individual blobs, but I have over 500,000 files/blobs spread over thousands of containers that I'd like to set caching headers for. Anyone know of an efficient method to set this header on all blobs? – Gavin Dec 22 '10 at 09:24
  • I thought I might have found a solution with CloudBerry Explorer for Windows Azure: it looks like it can bulk-update headers, but it doesn't actually work. It seems to be a known bug that has existed since March 2009, so I won't hold my breath waiting for a fix! http://cloudberrylab.com/forum/default.aspx?g=posts&t=3047 – Gavin Dec 23 '10 at 08:39
  • I tried it with CloudBerry, too. I could set the cache-control header, but after saving, it drops the setting. Maybe it's because the header is of type "user defined" and not "system"? – ownking Mar 11 '11 at 13:29
  • Gavin, how did you use Cloudberry Explorer to set cache-control headers for individual files? I've tried it and it seems to fail to work. – TMC May 10 '11 at 03:41

9 Answers

26

I had to run a batch job on about 600k blobs and found two things that really helped:

  1. Running the operation from a worker role in the same data center. The speed between Azure services is great as long as they are in the same affinity group. Plus there are no data transfer costs.
  2. Running the operation in parallel. The Task Parallel Library (TPL) in .NET 4 makes this really easy. Here is the code to set the cache-control header for every blob in a container in parallel:

    // get the info for every blob in the container
    var blobInfos = container.ListBlobs(
        new BlobRequestOptions() { UseFlatBlobListing = true });
    Parallel.ForEach(blobInfos, (blobInfo) =>
    {
        // fetch the blob's current properties
        CloudBlob blob = container.GetBlobReference(blobInfo.Uri.ToString());
        blob.FetchAttributes();

        // set the cache-control header only if it needs to change
        if (blob.Properties.CacheControl != YOUR_CACHE_CONTROL_HEADER)
        {
            blob.Properties.CacheControl = YOUR_CACHE_CONTROL_HEADER;
            blob.SetProperties();
        }
    });
    
Joel Fillmore
  • Apparently you can't pass the complete URI into GetBlobReference anymore. I've edited the answer with some of the latest code I wrote to do this for one of my containers. – Shiroy Jul 26 '16 at 22:18
  • It would also be cool if someone wrote a utility for this, considering how common it is that people want to do this! – Shiroy Jul 26 '16 at 22:19
  • I ran this script against some 200,000 blobs in the same container and got a 409 server response. Does anyone know what it means? – Guy Assaf Feb 12 '17 at 12:07
  • yay it's very similar for Azure SDK for Java :smile: – Ruslan López Jun 30 '21 at 21:57
12

Here's an updated version of Joel Fillmore's answer using .NET 5 and v12 of Azure.Storage.Blobs. (Aside: wouldn't it be nice if default header properties could be set on the parent container?)

Instead of creating a website and using a WorkerRole, you can use Azure "WebJobs": run any executable on demand on a website in the same datacenter as your storage account, and set cache headers or any other header field from there.

  1. Create a throw-away, temporary website in the same datacenter as your storage account. Don't worry about affinity groups; create an empty ASP.NET site or any other simple site. The content is unimportant. I needed to use at least a B1 service plan, otherwise the WebJob aborted after 5 minutes.
  2. Create a console program using the code below, which works with the updated Azure Storage APIs. Compile it for release, then zip the executable and all required DLLs into a .zip file, or just publish it from Visual Studio and skip #3 below.
  3. Create a WebJob and upload the .zip file from step #2.
  4. Run the WebJob. Everything written to the console is available to view in the log file created and accessible from the WebJob control page.
  5. Delete the temporary website, or change it to a Free tier (under "Scale Up").

The code below runs a separate task for each container, and I'm getting up to 100K headers updated per minute (depending on time of day?). No egress charges.

using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace AzureHeaders
{
    class Program
    {
        private static string connectionString = "DefaultEndpointsProtocol=https;AccountName=REPLACE_WITH_YOUR_CONNECTION_STRING";
        private static string newCacheControl = "public, max-age=7776001"; // 3 months
        private static string[] containersToProcess = { "container1", "container2" };

        static async Task Main(string[] args)
        {
            BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);

            var tasks = new List<Task>();
            foreach (var container in containersToProcess)
            {
                BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(container);
                tasks.Add(Task.Run(() => UpdateHeaders(containerClient, 1000)));  // I have no idea what segmentSize should be!
            }
            await Task.WhenAll(tasks);
        }

        private static async Task UpdateHeaders(BlobContainerClient blobContainerClient, int? segmentSize)
        {
            int processed = 0;
            int failed = 0;
            try
            {
                // Call the listing operation and return pages of the specified size.
                var resultSegment = blobContainerClient.GetBlobsAsync()
                    .AsPages(default, segmentSize);

                // Enumerate the blobs returned for each page.
                await foreach (Azure.Page<BlobItem> blobPage in resultSegment)
                {
                    var tasks = new List<Task>();

                    foreach (BlobItem blobItem in blobPage.Values)
                    {
                        BlobClient blobClient = blobContainerClient.GetBlobClient(blobItem.Name);
                        tasks.Add(UpdateOneBlob(blobClient));
                        processed++;
                    }
                    await Task.WhenAll(tasks);
                    Console.WriteLine($"Container {blobContainerClient.Name} processed: {processed}");
                }
            }
            catch (RequestFailedException e)
            {
                Console.WriteLine(e.Message);
                failed++;
            }
            Console.WriteLine($"Container {blobContainerClient.Name} processed: {processed}, failed: {failed}");
        }

        private static async Task UpdateOneBlob(BlobClient blobClient) {
            Response<BlobProperties> propertiesResponse = await blobClient.GetPropertiesAsync();
            BlobHttpHeaders httpHeaders = new BlobHttpHeaders
            {
                // copy any existing headers you wish to preserve
                ContentType = propertiesResponse.Value.ContentType,
                ContentHash = propertiesResponse.Value.ContentHash,
                ContentEncoding = propertiesResponse.Value.ContentEncoding,
                ContentDisposition = propertiesResponse.Value.ContentDisposition,
                // update CacheControl
                CacheControl = newCacheControl  
            };
            await blobClient.SetHttpHeadersAsync(httpHeaders);
        }
    }
}
Jay Borseth
  • Thanks, this code saved me some time. They have changed WebJobs: you can no longer specify Run on Demand, apparently. I just created it as a continuous job, then watched the log to make sure it completed and stopped the job manually. – jrichview Apr 26 '17 at 14:50
5

The latest version of Cerebrata Cloud Storage Studio, v2011.04.23.00, supports setting cache-control on individual blob objects. Right-click the blob object, choose "View/Edit Blob Properties", then set the value for the Cache-Control attribute (e.g. public, max-age=2592000).

If you check the HTTP headers of the blob object using curl, you'll see the cache-control header returned with the value you set.
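A quick way to do that check from the command line (the blob URL below is a hypothetical example; substitute your own account, container, and blob names):

```shell
# Send a HEAD request (-I) for the blob, quietly (-s), and pick out
# the Cache-Control response header. The URL here is a placeholder.
curl -sI "https://myaccount.blob.core.windows.net/images/logo.png" \
  | grep -i '^cache-control'
# Expect something like: Cache-Control: public, max-age=2592000
```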

TMC
  • 1
    Azure explorer also has this functionality and is free without a trial. http://www.cerebrata.com/products/azure-explorer/introduction – TWilly Oct 02 '14 at 19:11
3

Sometimes, the simplest answer is the best one. If you just want to manage a small number of blobs, you can use Azure Management to change the headers/metadata for your blobs.

  1. Click on Storage, then click on the storage account name.
  2. Click the Containers tab, then click on a container.
  3. Click on a blob, then click on Edit at the bottom of the screen.

In that edit window, you can customize the Cache Control, Content Encoding, Content Language, and more.

Note: you cannot currently edit this data from the Azure Portal
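If you'd rather script it than click through the UI, the same per-blob property can be set with the Azure CLI's `az storage blob update` and its `--content-cache` parameter (the account, container, and blob names below are placeholders):

```shell
# Set Cache-Control on a single blob via the Azure CLI.
# Requires you to be logged in / have an account key configured;
# all names here are placeholders.
az storage blob update \
  --account-name mystorageaccount \
  --container-name images \
  --name logo.png \
  --content-cache "public, max-age=2592000"
```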

John Washam
  • 1
    Unfortunately, they have taken that feature away from us. When using the old Azure Management, you will now be greeted with "A new home for storage. Visit our new portal". But the functionality still is not in the portal. – Malyngo Aug 21 '17 at 15:54
  • The ability is now back in the portal and can be done with "Storage Explorer", which is currently in preview. – Stephen McDowell Sep 10 '18 at 20:26
  • @StephenMcDowell, I just checked a blob in our account, but I don't see anywhere to edit the Cache Control property or those other similar properties. Maybe I'm looking in the wrong spot. Where did you find this in the [Portal](https://portal.azure.com)? – John Washam Sep 11 '18 at 02:10
  • 2
    @JohnWasham Via the "Storage Explorer (preview)" you have access to the blob containers. Within a blob container, if you select an item and right-click, I see "Properties..." as the last menu item. That opens another blade; if you scroll through the top section, you can see the text box where you can enter the cache control. – Stephen McDowell Sep 11 '18 at 17:39
  • here is official docs for this https://learn.microsoft.com/en-us/azure/cdn/cdn-manage-expiration-of-blob-content#azure-storage-explorer – equivalent8 May 14 '20 at 09:54
2

Here's an updated version of Joel Fillmore's answer consuming WindowsAzure.Storage v9.3.3. Note that ListBlobsSegmentedAsync returns pages of up to 5,000 blobs, which is why the BlobContinuationToken is used.

    public async Task BackfillCacheControlAsync()
    {
        var container = await GetCloudBlobContainerAsync();
        BlobContinuationToken continuationToken = null;

        do
        {
            var blobInfos = await container.ListBlobsSegmentedAsync(string.Empty, true, BlobListingDetails.None, null, continuationToken, null, null);
            continuationToken = blobInfos.ContinuationToken;
            foreach (var blobInfo in blobInfos.Results)
            {
                var blockBlob = (CloudBlockBlob)blobInfo;
                var blob = await container.GetBlobReferenceFromServerAsync(blockBlob.Name);
                if (blob.Properties.CacheControl != "public, max-age=31536000")
                {
                    blob.Properties.CacheControl = "public, max-age=31536000";
                    await blob.SetPropertiesAsync();
                }
            }               
        }
        while (continuationToken != null);
    }

    private async Task<CloudBlobContainer> GetCloudBlobContainerAsync()
    {
        var storageAccount = CloudStorageAccount.Parse(_appSettings.AzureStorageConnectionString);
        var blobClient = storageAccount.CreateCloudBlobClient();
        var container = blobClient.GetContainerReference("uploads");
        return container;
    }
Stephen McDowell
2

The latest CloudBerry Explorer now supports Cache-Control: http://www.cloudberrylab.com/forum/default.aspx?g=posts&t=3047

mistika
1

This might be too late to answer, but recently I wanted to do the same thing in a different manner: I had a list of images and needed to apply the header using a PowerShell script (with the help of the Azure Storage assembly, of course). I hope someone finds this useful in the future.

A complete explanation is given in "Set Azure blob cache-control using PowerShell script".

Add-Type -Path "C:\Program Files\Microsoft SDKs\Windows Azure\.NET SDK\v2.3\ref\Microsoft.WindowsAzure.StorageClient.dll"

$accountName = "[azureaccountname]"
$accountKey = "[azureaccountkey]"
$blobContainerName = "images"

$storageCredentials = New-Object Microsoft.WindowsAzure.StorageCredentialsAccountAndKey -ArgumentList $accountName,$accountKey
$storageAccount = New-Object Microsoft.WindowsAzure.CloudStorageAccount -ArgumentList $storageCredentials,$true
#$blobClient = $storageAccount.CreateCloudBlobClient()
$blobClient =  [Microsoft.WindowsAzure.StorageClient.CloudStorageAccountStorageClientExtensions]::CreateCloudBlobClient($storageAccount)

$cacheControlValue = "public, max-age=604800"

echo "Setting cache control: $cacheControlValue"

Get-Content "imagelist.txt" | foreach {     
    $blobName = "$blobContainerName/$_".Trim()
    echo $blobName
    $blob = $blobClient.GetBlobReference($blobName)
    $blob.Properties.CacheControl = $cacheControlValue
    $blob.SetProperties()
}
Tekz
  • 1,279
  • 14
  • 20
1

Here is a bash script for everyone who does not sit on a Windows machine with PowerShell. The script loops through all blobs and sets the Content-Cache property (the Cache-Control HTTP header) on each blob individually.

Unfortunately, there is no way to set properties on several blobs simultaneously, so this is a time-consuming task; it usually takes around 1–2 seconds per blob. However, as Jay Borseth points out, the process is significantly accelerated if you run it from a server in the same data center as your storage account.

#!/bin/bash
#
# Update Azure Blob Storage blobs' cache-control headers
# (content-cache properties)
# 
# Quite slow, since there is no `az storage blob update-batch`
#
# Created by Jon Tingvold, March 2021
#
#
# If you want progress, you need to install pv:
# >>> brew install pv  # Mac
# >>> sudo apt install pv  # Ubuntu
#

set -e  # exit when any command fails

AZURE_BLOB_CONNECTION_STRING='DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=XXXXXXXXXXXX;AccountKey=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=='
CONTAINER_NAME=main

BLOB_PREFIX='admin/'
CONTENT_CACHE='max-age=3600'
NUM_RESULTS=10000000  # Defaults to 5000

echo "Ask Azure for files in Blob Storage ..."
BLOB_NAMES=$(az storage blob list --connection-string "$AZURE_BLOB_CONNECTION_STRING" --container-name "$CONTAINER_NAME" --query '[].name' --output tsv --num-results "$NUM_RESULTS" --prefix "$BLOB_PREFIX")
NUMBER_OF_BLOBS=$(echo $BLOB_NAMES | wc -w)

echo "Set content-cache on $NUMBER_OF_BLOBS blobs ..."

for BLOB_NAME in $BLOB_NAMES
do
  az storage blob update --connection-string "$AZURE_BLOB_CONNECTION_STRING" --container-name "$CONTAINER_NAME" --name "$BLOB_NAME" --content-cache "$CONTENT_CACHE" > /dev/null;
  echo "$BLOB_NAME"

# If you don't have pv installed, remove everything after `done`
done | pv -pte --line-mode --size "$NUMBER_OF_BLOBS" > /dev/null
JonT
0

Set storage blob Cache-Control properties with a PowerShell script:

https://gallery.technet.microsoft.com/How-to-set-storage-blob-4774aca5

# create CloudBlobClient
Add-Type -Path "C:\Program Files\Microsoft SDKs\Windows Azure\.NET SDK\v2.3\ref\Microsoft.WindowsAzure.StorageClient.dll"
$StorageName = "[azureaccountname]"
$StorageKey = "[azureaccountkey]"
$BlobUri = "https://$StorageName.blob.core.windows.net/"
$storageCredentials = New-Object Microsoft.WindowsAzure.StorageCredentialsAccountAndKey -ArgumentList $StorageName,$StorageKey
$blobClient = New-Object Microsoft.WindowsAzure.StorageClient.CloudBlobClient($BlobUri,$storageCredentials)

# list the blobs to update (container name is an example)
$container = $blobClient.GetContainerReference("images")
$blobs = $container.ListBlobs()

# set properties and metadata
$cacheControlValue = "public, max-age=60480"
foreach ($blob in $blobs)
{
  # set metadata
  $blobRef = $blobClient.GetBlobReference($blob.Name)
  $blobRef.Metadata.Add("abcd","abcd")
  $blobRef.SetMetadata()

  # set properties
  $blobRef.Properties.CacheControl = $cacheControlValue
  $blobRef.SetProperties()
}
frank tan