
I am storing JSON data in Azure blobs, capturing it hourly as relatively small JSON documents. Between backups, I may produce tens or hundreds (unlikely thousands, but possible) of these documents, which I then want to back up into blobs organised by year, month, day, and hour.

Two approaches I came up with are:

  1. Making the hour a folder and storing a separate blob for every backup within it
  2. Making each hour its own blob under the day's folder and appending all new documents to that blob so they are stored together
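For concreteness, here is a minimal sketch of the two layouts. It assumes `container_client` is an `azure.storage.blob.ContainerClient` from the `azure-storage-blob` package; the helper names, the `doc_id` parameter, and the `.jsonl` extension are hypothetical and only there for illustration.

```python
import json
from datetime import datetime, timezone


def blob_name_per_document(ts: datetime, doc_id: str) -> str:
    """Approach 1: the hour is a virtual folder, one blob per document."""
    return f"{ts.strftime('%Y/%m/%d/%H')}/{doc_id}.json"


def blob_name_per_hour(ts: datetime) -> str:
    """Approach 2: the hour is a single blob under the day's folder."""
    return f"{ts.strftime('%Y/%m/%d')}/{ts.strftime('%H')}.jsonl"


def store_separately(container_client, ts: datetime, doc_id: str, document: dict) -> None:
    # Assumption: container_client is an azure.storage.blob.ContainerClient.
    # Each document becomes its own block blob.
    container_client.upload_blob(
        name=blob_name_per_document(ts, doc_id),
        data=json.dumps(document),
        overwrite=True,
    )


def store_appended(container_client, ts: datetime, document: dict) -> None:
    # Assumption: same client type as above. One append blob per hour,
    # one JSON document per line.
    blob = container_client.get_blob_client(blob_name_per_hour(ts))
    # Note: check-then-create is not atomic; concurrent writers would
    # need extra care here.
    if not blob.exists():
        blob.create_append_blob()
    blob.append_block(json.dumps(document) + "\n")


if __name__ == "__main__":
    ts = datetime(2023, 2, 3, 16, 40, tzinfo=timezone.utc)
    print(blob_name_per_document(ts, "doc-001"))
    print(blob_name_per_hour(ts))
```

In this sketch, approach 1 reads back with a prefix listing over the hour's folder, while approach 2 reads back as one download that is then split by line.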

The typical access pattern is somewhat frequent reads for a while, after which the blobs are moved off to cold/archive storage once they get old.

My question is: should I be favouring one method over the other for best practice, resource, or logical reasons, or is it basically personal preference with negligible performance hits? I'm especially interested in any resource differences in terms of reads and writes as I couldn't find or work out any useful information about that.

I'm also curious whether there are any access benefits to the append method in particular, since the per-hour data would always be stored together in the same file (although the trade-off might be having to make sure you don't corrupt the blob as you append to it). I'd also like to know how well each method fits with the way the Python SDK is architected.

For this scenario I am using Python and making use of the Azure Python SDK packages.

Any other suggestions or methods are also very welcome. Thanks.

Joe Moon
  • The answer to this question is: if you need to restore information, do you want to query a single blob, or multiple small ones? How fast can you recover in each scenario? – Thiago Custodio Feb 03 '23 at 16:40

1 Answer

If your read/write requirements are low, it won't matter. If you need high throughput, you might opt not to name your blobs this way, because sequential date-based names can concentrate traffic on a single partition.

Take a look at "Performance and scalability checklist for Blob storage" (Azure Storage | Microsoft Learn), specifically the partitioning section.
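As a rough illustration of the kind of naming change that applies in the high-throughput case, one option is to put a short hash in front of the date-based name so that consecutive names no longer share a prefix. This is only a sketch under assumptions: the `hashed_blob_name` helper, the choice of MD5, and the three-character prefix length are all hypothetical and not something the question requires.

```python
import hashlib


def hashed_blob_name(name: str, prefix_len: int = 3) -> str:
    """Prepend a short, deterministic hash so sequential names spread out.

    Hypothetical helper: the prefix is derived from the name itself, so the
    same input always maps to the same blob name.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


if __name__ == "__main__":
    for hour in ("14", "15", "16"):
        print(hashed_blob_name(f"2023/02/03/{hour}.jsonl"))
```

The trade-off is that the blobs can no longer be listed by date prefix alone, so this only seems worth doing if throughput is actually a problem.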

Additional information: Note that "relatively small" and "somewhat frequent" mean different things to different people. Some users might interpret that to mean < 1 KB and several times an hour, while others might interpret it to mean < 1 MB and several times a second (or even several times a millisecond). If it's the former, there is nothing to worry about.

If you still have any questions about performance, I would recommend contacting support.