I am using PowerShell to gather some data from the Uninstall key of the registry and write it to XML, and everything works right up until the data to be written includes some Simplified Chinese characters. When I look at the registry itself, the value of DisplayName is

Object Enabler for AutoCAD Plant 3D 2023 - 简体中文 (Simplified Chinese)

But when I use

Write-Host "$($uninstallKey.GetValue('DisplayName'))"

I get

Object Enabler for AutoCAD Plant 3D 2023 - 简体中文 (Simplified Chinese) EE- 88}

Not sure where that EE- 88} is coming from, and what else might be hiding there. At first, I thought my issue was with the encoding of the XML file at write. I had been using [System.Text.UTF8Encoding] which throws an error

Exception calling "Save" with "1" argument(s): "'.', hexadecimal value 0x00, is an invalid character."

But now I think the problem is elsewhere, since a Write-Host shows something different from what I see in the registry itself.

I am using

$localMachineHive = [Microsoft.Win32.RegistryKey]::OpenBaseKey([Microsoft.Win32.RegistryHive]::LocalMachine, 0)
$uninstallKey = $localMachineHive.OpenSubKey("$uninstallKeyPath\$uninstallKeyName")

to access the registry, where "$uninstallKeyPath\$uninstallKeyName" defines the key path (64-bit or 32-bit) to the individual key. I recently moved to this approach because it is much faster than PowerShell's native registry access. But perhaps there is an encoding nuance there that I am missing? Or is this a place where Write-Host is the problem?

EDIT: Verified that the mechanism for accessing the registry isn't the issue. Both of these produce the same output, which doesn't match what I see in RegEdit.

$localMachineHive = [Microsoft.Win32.RegistryKey]::OpenBaseKey([Microsoft.Win32.RegistryHive]::LocalMachine, 0)
$uninstallKey = $localMachineHive.OpenSubKey("SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{BF3F377C-AF47-33EE-979F-67D4EFA9FAB0}")
Write-Host "$($uninstallKey.GetValue('DisplayName'))"

$displayName = Get-ItemPropertyValue -Path 'Registry::HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{BF3F377C-AF47-33EE-979F-67D4EFA9FAB0}' -Name DisplayName
Write-Host "$displayName"
Gordon
  • @mklement0 Yes, REG_SZ. And interestingly, I found another Autodesk key, with simplified Chinese, but no issues. Starting to think it's just weird Autodesk incompetence, which wouldn't surprise me AT ALL. But how I get around it will be interesting. – Gordon Oct 02 '22 at 20:03
  • There shouldn't be any encoding issues, but Registry Editor doesn't always show the string as-is, such as quietly stripping newlines. However, this doesn't explain your case. You can examine the programmatically returned value for hidden control characters - see [this answer](https://stackoverflow.com/a/45356836/45375) for a helper function that does that (I recommend getting the code from the Gist). – mklement0 Oct 02 '22 at 20:16
  • That is pointing out some issues. The string shows as ``Object Enabler for AutoCAD Plant 3D 2023 - 简体中文 (Simplified Chinese)`0EE-`088}`` via DebugString. So now to see what `` `0 `` is – Gordon Oct 02 '22 at 20:20
  • `\`0` is a NUL character (code point `0x0`). You can create it in PowerShell with ``"`0"`` or `[char] 0`. As an aside: to use a single `\`` verbatim in a comment (inside `\`...\``), ``\``-escape it. – mklement0 Oct 02 '22 at 20:28
  • Yeah, it looks like what is there is actually NULEE-NUL88. Does that make any sense? I am tempted to just trim everything after the first NUL and hope that actually catches any other Autodesk screwups. Looks like `$split = $displayName -split ([char]0, 2)` may just work. – Gordon Oct 02 '22 at 20:33
  • That's a more elegant approach, to be sure. Will make a little function tomorrow, since I feel like I need to check every #%@$ value Autodesk writes to the registry now, in both my data gathering code and my "Get me the GUID based on DisplayName and other criteria" code, because the idiots change GUIDs with updates AND have multiple GUIDs in the registry where only one actually works. They really are horrible. – Gordon Oct 02 '22 at 20:41
  • I've linked a script in my answer that lists all registry values containing an embedded _null_ character. Judging from the many occurrences on my machine, it will be wise to proactively trim any `REG_SZ` and `REG_EXPAND_SZ` values at the first _null_ character. /cc @mklement0 – zett42 Oct 03 '22 at 13:03
  • I suspect that code may find its way into a little utility I will run after installing any Autodesk stuff to find any registry stuff they have messed up. I am hoping this was a one time shagging of the poodle, but it's Autodesk, and they suck. :( – Gordon Oct 03 '22 at 13:08

1 Answer

This looks like an error in the data stored in the registry, probably a mismatch between the actual string length and the number of bytes passed via the cbData parameter of RegSetValueEx() (the native API for writing registry values).

If the program that wrote the registry value passed an argument for cbData that is too large, it could store data beyond the actual string in the registry: whatever happens to be in memory after the intended data, which could be just random "garbage" or, in the worst case, confidential data such as passwords.

When PowerShell reads the registry value, it gets the null terminator character and any additional characters, which might appear as random characters. Note that RegEdit doesn't show these characters.
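
To see what is actually in such a string, it can help to make control characters visible. The following is a minimal sketch that needs no registry access: `Show-HiddenChars` is a hypothetical helper name, and `$sample` is a made-up string shaped like the value in the question.

```powershell
# Hypothetical helper: render control characters (code points below 0x20)
# as <U+XXXX> so they become visible in console output.
function Show-HiddenChars {
    param([string] $Text)
    $sb = [System.Text.StringBuilder]::new()
    foreach ($ch in $Text.ToCharArray()) {
        if ([int] $ch -lt 0x20) {
            [void] $sb.Append(('<U+{0:X4}>' -f [int] $ch))
        } else {
            [void] $sb.Append($ch)
        }
    }
    $sb.ToString()
}

# Simulated value; "`0" is PowerShell's escape sequence for a null character.
$sample = "Some Product`0EE-`088}"
Show-HiddenChars $sample   # Some Product<U+0000>EE-<U+0000>88}
```

The same function can be applied to a value returned by `GetValue()` or `Get-ItemPropertyValue`.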

Workaround

Remove all characters starting from the null terminator character up to the end of the string:

# Use a regex to remove the first null character and everything after it.
# (?s) lets '.' match line breaks too, in case the trailing garbage contains any.
$displayName -replace '(?s)\0.*'

# alternatively:
($displayName -split ([char] 0), 2)[0]
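
If the trimming is needed in several places, it could be wrapped in a small function. This is only a sketch: the name `Remove-AfterFirstNull` is hypothetical, and it uses `String.IndexOf(char)` instead of a regex, which sidesteps any question of how `.` treats line breaks.

```powershell
# Hypothetical helper: return the part of a string before the first null
# character, or the string unchanged if it contains none.
function Remove-AfterFirstNull {
    param([string] $Value)
    $index = $Value.IndexOf([char] 0)   # ordinal search for U+0000
    if ($index -ge 0) {
        $Value.Substring(0, $index)
    } else {
        $Value
    }
}

# Simulated registry value with trailing garbage after the terminator.
$clean = Remove-AfterFirstNull "Some Product`0EE-`088}"
$clean          # Some Product
$clean.Length   # 12
```

Usage with a real value would look like `Remove-AfterFirstNull $uninstallKey.GetValue('DisplayName')`.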

Repro

Trying to actually reproduce the problem, I've created a bogus C++ console application:

#include <windows.h>

struct Test {
    wchar_t const user[7] = L"MyUser";
    wchar_t const password[6] = L"MyPwd";
};

int main()
{
    // Create or open a registry key
    HKEY regKey = nullptr;
    ::RegCreateKeyExW( HKEY_CURRENT_USER, L"_TestKey", 0,  nullptr, 0, KEY_READ | KEY_WRITE, nullptr, &regKey, nullptr );

    // Attempt to write the string member data.user, but pass a value for cbData
    // that is twice the number of actual bytes
    Test data;
    ::RegSetValueExW( regKey, L"FooBar", 0, REG_SZ, reinterpret_cast<BYTE const*>( &data.user ), sizeof( data.user ) * 2 );

    // Release the key handle
    ::RegCloseKey( regKey );
}

By passing twice the actual number of bytes for cbData, the code unintentionally writes the value of the password member after the intended value of the user member into the registry, separated by a null character.

PowerShell code that reads the value:

$hkcu = [Microsoft.Win32.RegistryKey]::OpenBaseKey([Microsoft.Win32.RegistryHive]::CurrentUser, 0)
$regkey = $hkcu.OpenSubKey('_TestKey')
$regkey.GetValue('FooBar')

Output:

MyUserMyPwd

Note that the null character between "MyUser" and "MyPwd" is not visible in the console output, but it is still part of the string: if you read the registry value into a variable and inspect it, the null character will be there.


Bonus Code

Out of curiosity, I wrote a script that lists all registry string values that contain embedded null characters (excluding REG_MULTI_SZ values, which may contain embedded null characters by design).

Example:

.\Get-RegStringsWithEmbeddedNull.ps1 -Hive LocalMachine -View Registry64 -EA Ignore
.\Get-RegStringsWithEmbeddedNull.ps1 -Hive LocalMachine -View Registry32 -EA Ignore

On my machine, this lists over 500 values! In many cases the difference in length between the stored string and the actual string (trimmed using -replace '\0.*') is only 1 character (so only an extra null is stored), which makes it especially hard for the unsuspecting developer to diagnose problems when working with such values, because PowerShell doesn't display embedded null characters. The only way to diagnose these off-by-1 errors is by looking at the Length property of the string.
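
The off-by-1 case can be illustrated without touching the registry. In this sketch, `$stored` is a simulated value with a single extra trailing null; the two strings look identical when displayed, and only their lengths differ.

```powershell
# Simulated registry value: the intended text plus one stray null character.
$stored  = "Example`0"

# Trim at the first null character ((?s) makes '.' match line breaks as well).
$trimmed = $stored -replace '(?s)\0.*'

# Compare the lengths to reveal the hidden character.
'{0} -> {1}' -f $stored.Length, $trimmed.Length   # 8 -> 7
```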

Conclusion:

In general, it is a good idea to trim any registry string value of type REG_SZ or REG_EXPAND_SZ at the first null character. There might be cases where embedded null characters are actually intended, but these are rare and against the spec (the developer should have chosen REG_MULTI_SZ instead). Most cases seem to be caused by programmer errors. The C APIs are easy to use incorrectly: some expect you to pass the character count, others expect you to include the null terminator, and others require you to pass the buffer size (in characters or even in bytes), which might be larger than the actual string length.
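
One way to apply this rule is to make the trimming depend on the value kind, so that REG_MULTI_SZ and non-string data pass through untouched. The sketch below assumes a hypothetical function name, `ConvertTo-CleanRegValue`; `RegistryValueKind` and `RegistryKey.GetValueKind()` are the existing .NET members.

```powershell
# Hypothetical helper: trim REG_SZ / REG_EXPAND_SZ values at the first null
# character and return every other kind of value unchanged.
function ConvertTo-CleanRegValue {
    param(
        $Value,
        [Microsoft.Win32.RegistryValueKind] $Kind
    )
    $stringKinds = [Microsoft.Win32.RegistryValueKind]::String,
                   [Microsoft.Win32.RegistryValueKind]::ExpandString
    if ($Kind -in $stringKinds -and $Value -is [string]) {
        $i = $Value.IndexOf([char] 0)
        if ($i -ge 0) { return $Value.Substring(0, $i) }
    }
    # The unary comma keeps arrays (e.g. REG_MULTI_SZ) from being unrolled.
    return , $Value
}

# Simulated inputs; no registry access is needed to try the function.
ConvertTo-CleanRegValue -Value "Name`0junk" -Kind String          # Name
ConvertTo-CleanRegValue -Value @('a', 'b')  -Kind MultiString     # a, b (unchanged)
```

With an open key, the call might look like `ConvertTo-CleanRegValue $key.GetValue($name) $key.GetValueKind($name)` for each `$name` in `$key.GetValueNames()`.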

marc_s
zett42
  • @mklement0 I just learned that PowerShell automatically removes _null_ characters for equality comparison (`-eq`) and - on PS 7.x only - also for `-like`, which mitigates _some_ of the issues caused by embedded _null_ characters in registry values. E.g. `"\`0foo\`0bar\`0" -eq 'foobar'` outputs `True` on both PS 5.1 and PS 7.2.6, while `"\`0foo\`0bar\`0" -like 'foobar'` outputs `False` on PS 5.1 and `True` on PS 7.2.6! Using the `-match` operator outputs `False` on any PS version. – zett42 Oct 04 '22 at 09:07
  • Good to know, thanks. Note that with `-eq` and `-like` in PS 7.1+ / .NET 5+ it isn't just ``"`0"`` that is ignored, but _all non-printing_ control characters, such as ``"`a"``. [Backstory](https://github.com/PowerShell/PowerShell/issues/14956#issuecomment-792348700). – mklement0 Oct 04 '22 at 14:48
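
Because the comparison operators discussed in these comments may ignore embedded null characters, a check that must not ignore them can fall back to an ordinal comparison. A minimal sketch, using a simulated string rather than a real registry value:

```powershell
# Simulated value with an embedded null character.
$stored = "foo`0bar"

# Ordinal comparison looks at every UTF-16 code unit, including U+0000.
[string]::Equals($stored, 'foobar', [System.StringComparison]::Ordinal)   # False

# Explicit test for an embedded null character.
$stored.IndexOf([char] 0) -ge 0                                           # True
```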