I am running into a strange issue when decoding Base64 strings in PowerShell.

$url = "https://*******.search.windows.net/indexes/azureblob-index/docs?api-version=2019-05-06&search=*"

$headers = @{
    "api-version" = "2019-05-06"
    "Content-Type" = "application/json"
    "api-key" = "**********"
}

$result = Invoke-WebRequest -Uri $url -Headers $headers -Method Get | ConvertFrom-Json

$values = $result.value

foreach ($value in $values) {
    $path = $value.metadata_storage_path
    $bloburl = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($path))
    $bloburl
}

The **** are hiding sensitive information, obviously.

I am trying to recover an Azure blob URL that is returned in encoded form. The string does decode, but the call returns an error and a character seems to be added to the end of the URL, which puts the input out of sync with what the Base64 decoder expects.

The result looks like this:

https://*******.blob.core.windows.net/files/REPORTS/*****/SEISMIC_ACQUISITION/ACQUISITION_REPORT_APPENDIX4_DAY_LOGS_JD_201.pdf5 

It is always a number 5 that is added to the end of the string.

Any ideas as to what is going on here?

This is the full code - there is nothing else going on.

All the script does is send a search query to Azure Search and return the URLs of the blobs/documents that match the query. Azure returns a Base64 string, and I want to decode it to plain, readable text.

The error is:

Exception calling "FromBase64String" with "1" argument(s): "Invalid length for a Base-64 char array or string."
At line:25 char:9
+         $bloburl = [System.Text.Encoding]::UTF8.GetString([System.Con ...
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : FormatException

It occurs at the line where it converts from base64.

mklement0
Mike
  • So it sounds like the web-request response is returning a string that isn't a valid Base64-encoded string, or, as the result suggests, perhaps it contains more than just such a string. What does the documentation say about the format of this data? Is the returned URL all that you expect, except for the extraneous `5`? – mklement0 Nov 22 '19 at 12:21
  • When I use an online base64 decoder, I get the expected URL so it seems to be malforming it somewhere when it is being converted. Interestingly enough, if I add a "=" to the end of the string prior to being converted, it corrects the output URL but then throws the same "invalid length" error for other strings I am converting. It is so random. A bug? – Mike Nov 22 '19 at 13:08
  • Does the result contain non-ASCII characters? Could there be a character-encoding problem (Invoke-WebRequest misinterpreting the source encoding)? – mklement0 Nov 22 '19 at 13:11
  • Can you create a non-sensitive instance that exhibits the same behaviour and post the raw result of ```Invoke-webrequest -Uri $url -Headers $headers -Method Get``` against that? It'd be easier to diagnose the problem with an example response... – mclayton Nov 22 '19 at 14:06
  • Also, you could try converting your *expected* value back into base64 and see if you get the same base64 string as returned by the api - that might give a clue as to the differences... – mclayton Nov 22 '19 at 14:11

1 Answer

The metadata_storage_path value is apparently not standard Base64 but a modified variant: any trailing "=" padding characters are removed, and a single digit is appended in their place to indicate how many "=" were removed. This makes the encoded string easier to use in URLs.

See this question for more details:

How to decode metadata_storage_path produced by Azure Search indexer in .NET Core

You'll need to compensate for this modification to get back to a valid base64 encoded string before you can decode it. The linked answer gives some options for how to do this.
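
As a rough illustration of that compensation, here is a minimal PowerShell sketch. The function name `ConvertFrom-UrlTokenBase64` is hypothetical (it is not part of any Azure or PowerShell module). The sketch assumes the last character of the value is always the padding-count digit, and it additionally assumes the encoder may have swapped `+` and `/` for the URL-safe `-` and `_`, which the description above does not state explicitly.

```powershell
# Hypothetical helper: turns the modified Base64 value back into plain text.
function ConvertFrom-UrlTokenBase64 {
    param(
        [Parameter(Mandatory)]
        [string] $Token
    )

    # Assumption: the final character is a digit giving the number of '=' removed.
    $padCount = [int]::Parse($Token.Substring($Token.Length - 1))
    $body     = $Token.Substring(0, $Token.Length - 1)

    # Assumption: '-' and '_' stand in for '+' and '/' (URL-safe alphabet).
    # Then restore the '=' padding that was stripped.
    $base64 = $body.Replace('-', '+').Replace('_', '/') + ('=' * $padCount)

    # Decode the now-valid Base64 string as UTF-8 text.
    [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($base64))
}
```

In the loop from the question, the decoding line would then become something like `$bloburl = ConvertFrom-UrlTokenBase64 -Token $value.metadata_storage_path`. If the values in your index turn out not to end in a padding digit, this sketch will not apply as written.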

mclayton