
I have code like this:

$x = "This text needs to be encoded"
$z = [System.Text.Encoding]::Unicode.GetBytes($x)
$y = [System.Convert]::ToBase64String($z)
Write-Host("$y")

And the following gets printed to the console:

VABoAGkAcwAgAHQAZQB4AHQAIABuAGUAZQBkAHMAIAB0AG8AIABiAGUAIABlAG4AYwBvAGQAZQBkAA==

Now if I decode this Base64 string with PowerShell like this:

$v = [System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String($y))
Write-Host("$v")

It would get decoded properly like:

This text needs to be encoded

However, if I paste the Base64-encoded string above into, say, CyberChef and decode it with the "From Base64" recipe, the decoded string is filled with dots, like this:

T.h.i.s. .t.e.x.t. .n.e.e.d.s. .t.o. .b.e. .e.n.c.o.d.e.d.

My question is, why does this happen?

Lauri
  • So, you would need to understand what encoding CyberChef is using... It's probably `[System.Text.Encoding]::UTF8`, so you could give that a try instead of using `Unicode`. – Santiago Squarzon Oct 19 '22 at 19:07
  • As an aside, don't use brackets here: `Write-Host("$y")`. Instead, use a space between the cmdlet and whatever you want written. `Write-Host` is a cmdlet, not an object method. – Theo Oct 19 '22 at 19:45
  • Another aside: [`Write-Host` is typically the wrong tool to use](http://www.jsnover.com/blog/2013/12/07/write-host-considered-harmful/), unless the intent is to write _to the display only_, bypassing the success output stream and with it the ability to send output to other commands, capture it in a variable, or redirect it to a file. To output a value, use it _by itself_; e.g., `$value`, instead of `Write-Host $value` (or use `Write-Output $value`); see [this answer](https://stackoverflow.com/a/60534138/45375). To explicitly print only to the display _but with rich formatting_, use `Out-Host`. – mklement0 Oct 19 '22 at 22:07

1 Answer


Santiago Squarzon has provided the crucial pointer:

  • What CyberChef's recipe most likely expects is for the bytes that the Base64 string encodes to be based on the UTF-8 encoding of the original string.

  • By contrast, the - poorly named - `[System.Text.Encoding]::Unicode` encoding is the UTF-16LE encoding, where each character is represented by (at least) two bytes, with the least significant byte coming first.

    • Characters whose Unicode code point is less than or equal to 0xFF (255), which includes the entire ASCII range that all characters in your input string fall into, therefore have a NUL byte (value `0x0`) as the second byte of their two-byte representation. For example, the letter `T` encoded as UTF-16LE is the two-byte sequence `0x54 0x0`. On its own, `0x54` represents the letter `T` in ASCII encoding, and therefore also in UTF-8, which is a superset of ASCII that uses multi-byte sequences only for non-ASCII characters.
    • Therefore, the two-byte sequence `0x54 0x0` is interpreted as two characters in the context of UTF-8: the letter `T` (`0x54`) and NUL (`0x0`). NUL has no visual representation per se (it is a non-printable character), but a common convention is to visualize it as `.`, which is what you saw.
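
The byte-level difference described above can be checked directly in PowerShell. The following is only a minimal sketch: the variable names are made up for illustration, and replacing NUL with `.` merely imitates how a tool might choose to display it, not necessarily what CyberChef does internally.

```powershell
$text = 'T'

# UTF-16LE ("Unicode" in .NET): two bytes per ASCII-range character, low byte first.
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($text)
# UTF-8: a single byte per ASCII-range character.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($text)

# Format each byte as two hex digits, separated by spaces.
$utf16Hex = ($utf16Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex = ($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '

$utf16Hex   # 54 00
$utf8Hex    # 54

# Misreading the UTF-16LE bytes as UTF-8 yields 'T' followed by a NUL character.
$misread = [System.Text.Encoding]::UTF8.GetString($utf16Bytes)
$misread.Length   # 2

# Replace NUL with '.' to imitate the display convention (an assumption for illustration).
$shown = $misread.Replace([string][char]0, '.')
$shown   # T.
```

Applied to the whole input string, the same misreading produces the alternating letter-and-dot pattern from the question.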

Therefore, create your Base64-encoded string as follows:

$orig = "This text needs to be encoded"
$base64 = 
  [System.Convert]::ToBase64String(
    [System.Text.Encoding]::UTF8.GetBytes($orig)
  )

Note: Even though `[System.Text.Encoding]::UTF8` is - up to at least .NET 6 - a UTF-8 encoding with BOM, the `.GetBytes()` method (fortunately) does not prepend a BOM to the encoded bytes. As an aside: Changing this encoding to be BOM-less altogether is being considered for a future .NET version.
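
To see the BOM behavior for yourself, you can compare the encoding's preamble with what `.GetBytes()` returns. This is a small sketch with illustrative variable names; the preamble shown in the comment is what the .NET versions discussed here report and could differ in a later version.

```powershell
$utf8 = [System.Text.Encoding]::UTF8

# The preamble is the BOM that this encoding object advertises.
$preambleHex = ($utf8.GetPreamble() | ForEach-Object { $_.ToString('X2') }) -join ' '

# GetBytes() encodes only the characters themselves; no BOM is prepended.
$encodedHex = ($utf8.GetBytes('T') | ForEach-Object { $_.ToString('X2') }) -join ' '

$preambleHex   # EF BB BF (on the .NET versions discussed here)
$encodedHex    # 54
```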

$base64 then contains: VGhpcyB0ZXh0IG5lZWRzIHRvIGJlIGVuY29kZWQ=
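
As a quick sanity check, the UTF-8-based string can be decoded again in PowerShell. This sketch uses a separate, illustrative variable name (`$roundTrip`) and assumes the Base64 value shown above.

```powershell
$base64 = 'VGhpcyB0ZXh0IG5lZWRzIHRvIGJlIGVuY29kZWQ='

# Decode the Base64 string to bytes, then interpret those bytes as UTF-8.
$roundTrip = [System.Text.Encoding]::UTF8.GetString(
  [System.Convert]::FromBase64String($base64)
)

$roundTrip   # This text needs to be encoded
```

A decoder that interprets the bytes as UTF-8 (or ASCII) should now show the text without the interleaved dots.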

mklement0