I'm currently working on something that requires me to pass a Base64 string to a PowerShell script. But while decoding the string back to the original I'm getting some unexpected results as I need to use UTF-7 during decoding and I don't understand why. Would someone know why?
The Mozilla documentation would suggest that it's insufficient to use Base64 if you have Unicode characters in your string. Thus you need to use a workaround that consists of using encodeURIComponent
and a replace. I don't really get why the replace is needed and shortened it to btoa(escape('✓ à la mode'))
to encode the string. The result of that operation would be JXUyNzEzJTIwJUUwJTIwbGElMjBtb2Rl
.
Using PowerShell to decode the string back to the original, I need to first undo the Base64 encoding. In order to do System.Convert
can be used (which results in a byte array) and its output can be converted to a UTF-8 string using System.Text.Encoding
. Together this would look like the following:
$bytes = [System.Convert]::FromBase64String($inputstring)
$utf8string = [System.Text.Encoding]::UTF8.GetString($bytes)
What's left to do is URL decode the whole thing. As it is a UTF-8 string I'd expect only to need to run the URL decode without any further parameters. But if you do that you end up with a accented a that looks like �
in a file or ?
on the console. To get the actual original string it's necessary to tell the URL decode to use UTF-7 as the character set. It's nice that this works but I don't really get why it's necessary since the string should be UTF-8 and UTF-8 certainly supports an accented a. See the last two lines of the entire script for what I mean. With those two lines you will end up with one line that has the garbled text and one which has the original text in the same file encoded as UTF-8
Entire PowerShell script:
Add-Type -AssemblyName System.Web
$inputstring = "JXUyNzEzJTIwJUUwJTIwbGElMjBtb2Rl"
$bytes = [System.Convert]::FromBase64String($inputstring)
$utf8string = [System.Text.Encoding]::UTF8.GetString($bytes)
[System.Web.HttpUtility]::UrlDecode($utf8string) | Out-File -Encoding utf8 C:\temp\output.txt
[System.Web.HttpUtility]::UrlDecode($utf8string, [System.Text.UnicodeEncoding]::UTF7) | Out-File -Append -Encoding utf8 C:\temp\output.txt
Clarification:
The problem isn't the conversion of the Base64 to UTF-8. The problem is some inconsistent behavior of the UrlDecode
of C#. If you run escape('✓ à la mode')
in your browser you will end up with the following string %u2713%20%E0%20la%20mode
. So we have a Unicode representation of the check mark and a HTML entity for the á
. If we use this directly in UrlDecode
we end up with the same error. My current assumption would be that it's an issue with the encoding of the PowerShell window and pasting characters into it.