How do I compare the output of Get-FileHash directly with the output of Properties.ContentMD5?


I'm putting together a PowerShell script that takes some local files from my system and copies them to an Azure Blob Storage Container.

The files change daily so I have added in a check to see if the file already exists in the container before uploading it.

I use Get-FileHash to read the local file:

$LocalFileHash = (Get-FileHash "D:\file.zip" -Algorithm MD5).Hash

Which results in $LocalFileHash holding this: 67BF2B6A3E6657054B4B86E137A12382

I use this code to get the checksum of the blob file already transferred to the container:

$BlobFile = "Path\To\file.zip"
$AZContext = New-AZStorageContext -StorageAccountName $StorageAccountName -SASToken "<token here>"

$RemoteBlobFile = Get-AzStorageBlob -Container $ContainerName -Context $AZContext -Blob $BlobFile -ErrorAction Ignore
if ($RemoteBlobFile) {
    $cloudblob = [Microsoft.Azure.Storage.Blob.CloudBlockBlob]$RemoteBlobFile.ICloudBlob
    $RemoteBlobHash = $cloudblob.Properties.ContentMD5
}

This value of $RemoteBlobHash is set to Z78raj5mVwVLS4bhN6Ejgg==

No problem, I thought, I'll just decode the Base64 string and compare:

$output = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($RemoteBlobHash))

Which gives me g�+j>fWKK��7�#� so not directly comparable ☹


This question shows someone in a similar pickle but I don't think they were using Get-FileHash given the format of their local MD5 result.

Other things I've tried:

  • changing the encoding in the System.Text.Encoding line above from UTF8 to UTF16 and ASCII, which changes the output but not to anything recognisable.
  • dabbling with GetBytes to see if that helped:
$output = [System.Text.Encoding]::UTF8.GetBytes([System.Text.Encoding]::UTF16.GetString([System.Convert]::FromBase64String($RemoteBlobHash)))

Note: Using md5sum to compare the local file and a downloaded copy of file.zip results in the same MD5 string as Get-FileHash: 67BF2B6A3E6657054B4B86E137A12382

Thank you in advance!

KevH

1 Answer

ContentMD5 is a base64 representation of the binary hash value, not the resulting hex string :)

$md5sum = [convert]::FromBase64String('Z78raj5mVwVLS4bhN6Ejgg==')
$hdhash = [BitConverter]::ToString($md5sum).Replace('-','')

Here we convert base64 -> binary -> hexadecimal
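
As a minimal sketch of how that conversion could be used for the comparison in the question: the helper below decodes the base64 value and compares the resulting hex string with the Get-FileHash output. The function name Test-Md5Match and its parameter names are hypothetical and only for illustration; the two sample values are the ones from the question.

```powershell
# Hypothetical helper: compares a hex MD5 string (as returned by Get-FileHash)
# with a base64 MD5 string (as returned by Properties.ContentMD5).
function Test-Md5Match {
    param(
        [string]$HexHash,
        [string]$Base64Hash
    )

    # base64 -> raw hash bytes
    $bytes = [Convert]::FromBase64String($Base64Hash)

    # raw bytes -> hex string without the '-' separators BitConverter adds
    $hex = [BitConverter]::ToString($bytes).Replace('-', '')

    # -eq on strings is case-insensitive, so upper/lower-case hex both match
    return $hex -eq $HexHash
}

# Values taken from the question
Test-Md5Match -HexHash '67BF2B6A3E6657054B4B86E137A12382' -Base64Hash 'Z78raj5mVwVLS4bhN6Ejgg=='
```

With the values from the question this should return True, so in the upload script the call could take $LocalFileHash and $RemoteBlobHash as its two arguments.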


If you need to do it the other way around (i.e. obtaining a local file hash, then using that to search for blobs in Azure), you'll first need to split the hexadecimal string into two-character chunks, one per byte, then convert the resulting byte array to base64:

$hdhash = '67BF2B6A3E6657054B4B86E137A12382'
$bytes  = [byte[]]::new($hdhash.Length / 2)
for($i = 0; $i -lt $bytes.Length; $i++){
  $offset = $i * 2
  $bytes[$i] = [convert]::ToByte($hdhash.Substring($offset,2), 16)
}
$md5sum = [convert]::ToBase64String($bytes)
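
If this direction is needed in more than one place, the same loop could be wrapped in a small function. This is only a sketch: the name ConvertTo-Base64Hash is hypothetical, and the even-length check is an added assumption about what input should be rejected.

```powershell
# Hypothetical helper: converts a hex hash string (e.g. from Get-FileHash)
# into the base64 form used by ContentMD5.
function ConvertTo-Base64Hash {
    param(
        [Parameter(Mandatory)]
        [string]$HexHash
    )

    # Each byte is written as two hex characters, so an odd length is invalid
    if ($HexHash.Length % 2 -ne 0) {
        throw 'Hex string must contain an even number of characters.'
    }

    $bytes = [byte[]]::new($HexHash.Length / 2)
    for ($i = 0; $i -lt $bytes.Length; $i++) {
        # Parse each two-character chunk as a base-16 byte
        $bytes[$i] = [Convert]::ToByte($HexHash.Substring($i * 2, 2), 16)
    }

    # raw bytes -> base64
    [Convert]::ToBase64String($bytes)
}

# Value taken from the question; should print Z78raj5mVwVLS4bhN6Ejgg==
ConvertTo-Base64Hash -HexHash '67BF2B6A3E6657054B4B86E137A12382'
```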

 

Koen
Mathias R. Jessen
  • Thank you! This works perfectly! The binary aspect was definitely something that was tripping me up. The Microsoft page for the [ContentMD5 property](https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.storage.blob.blobproperties.contentmd5?view=azure-dotnet-legacy) is rather sparse, which didn't help – KevH May 20 '20 at 13:18
  • The length of the base64 string gave it away - not nearly long enough to encode the full hex string representation (32 chars), but long enough to encode the hash itself (16 bytes) – Mathias R. Jessen May 20 '20 at 13:36