
On recent versions of Win10 it is possible to set the Active Code Page (ACP) to a UTF-8 code page. And as discussed here, it is possible to set the System Locale (used to map between the "A" version and "W" version of the Windows API) to use the UTF-8 code page.

How does a script detect if the UTF-8 code page is in use?

As discussed here and here, it is normally possible to use WMI to get the system code page ID:

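' Connect to WMI and read the code page that the OS reports (CodeSet)
Set wmi = GetObject("winmgmts:\\.\root\cimv2")
For Each os In wmi.ExecQuery("SELECT * FROM Win32_OperatingSystem")
    cs = os.CodeSet
Next
WScript.Echo cs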

When I try that on Win10, with the 'beta' UTF-8 support for non-Unicode programs enabled and American English as the system locale, WMI continues to report that the code page is 1252, even though that is clearly not the case: code page 1252 maps the byte value 128 (0x80) to a character but has nothing at 49800, whereas UTF-8 decodes 49800 (the byte pair 0xC2 0x88) but treats a lone 128 as invalid.
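The byte-level argument above can be reproduced with .NET's encoding classes. The following is a minimal PowerShell sketch. It assumes code page 1252 is available to .NET, which is the case in Windows PowerShell; in PowerShell 7 it is assumed to work because PowerShell registers .NET's code-pages encoding provider at startup. The variable names are illustrative only.

```powershell
# Decode the same bytes with code page 1252 and with UTF-8.
$cp1252 = [System.Text.Encoding]::GetEncoding(1252)
$utf8   = [System.Text.Encoding]::UTF8

# A lone byte 0x80 (128): the euro sign in code page 1252, but invalid
# in UTF-8, so it decodes to the replacement character U+FFFD.
$cp1252.GetString([byte[]] @(0x80))
$utf8.GetString([byte[]] @(0x80))

# The byte pair 0xC2 0x88 (49800 read as a 16-bit value): a single
# character (U+0088) in UTF-8, but two separate characters in 1252.
$utf8.GetString([byte[]] @(0xC2, 0x88)).Length
$cp1252.GetString([byte[]] @(0xC2, 0x88)).Length
```

This only demonstrates why 1252 and UTF-8 cannot both be right; it does not by itself tell a script which code page is active.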

How does a script detect that the actual system locale is using the UTF-8 code page?

david
  • Does this answer your question? [Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10)](https://stackoverflow.com/q/57131654) – user692942 Nov 25 '20 at 09:54
  • @Lankymart That question seems to be asking how to force the PowerShell window to use UTF-8. This question, rather, seems to be asking how to tell what a local system is using. – TylerH Nov 25 '20 at 19:21
  • @TylerH fair enough, but it's more the fact that the question is tagged both powershell and vbscript. The answer is purely powershell and has been accepted as the solution. – user692942 Nov 26 '20 at 05:43
  • Almost anything you can do in powershell, you can do in vbscript, and vice versa. Most of powershell is just calling COM objects, most of the exceptions are available with COM shells or as executables, and most of the few remaining exceptions have equivalents. The answer below is not a PowerShell answer: it's a PowerShell example of a generic answer. – david Nov 26 '20 at 23:56

1 Answer


PowerShell (shell-based) solutions:

To determine the system locale's (system-wide) OEM code page, which is the one used by console applications, use the registry:

# $true, if the OEM code page is set to UTF-8 (code page 65001)
'65001' -eq (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage OEMCP)

Note:

  • Enabling the system-wide UTF-8 support also sets the ANSI code page (ACP) to 65001. The ACP is used by legacy GUI applications, but notably also by Windows PowerShell[1], which means that Windows PowerShell's default encoding for the Get-Content and Set-Content cmdlets, for instance, changes. To check the ACP itself, query the ACP value of the same registry key instead of OEMCP.

  • From cmd.exe, you could run
    reg.exe query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP, but you'd then have to parse its textual output to extract just the code page number.

  • Note that, regrettably, PowerShell's Get-WinSystemLocale cmdlet cannot be used as of this writing, because the [cultureinfo] instance it returns does not reflect a UTF-8 override that may be in place; see this ServerFault answer.
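Since the question is as much about the ANSI code page as the OEM one, both registry values can be read in a single call. A minimal sketch, assuming Windows (the `$env:OS` guard simply skips it elsewhere); the shape of the output object is an illustrative choice, not a standard format:

```powershell
# Read both system-wide code pages from the registry (Windows only).
# The ACP and OEMCP values are stored as strings, e.g. '1252' and '437';
# with the system-wide UTF-8 option enabled, both are '65001'.
if ($env:OS -eq 'Windows_NT') {
  $nls = Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage
  [pscustomobject] @{
    ACP       = [int] $nls.ACP
    OEMCP     = [int] $nls.OEMCP
    AcpIsUtf8 = $nls.ACP -eq '65001'
    OemIsUtf8 = $nls.OEMCP -eq '65001'
  }
}
```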


To determine the current console's active OEM code page, which may or may not reflect the system locale's (console windows can be configured to use custom code pages, and the code page could even have been changed in-session beforehand), use:

# $true, if the current console's output code page is UTF-8 (code page 65001)
65001 -eq [Console]::OutputEncoding.CodePage

Note:

  • From cmd.exe you could execute chcp (chcp.com), but you would then have to parse its textual output to extract just the code-page number.
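To see whether the current console has diverged from the system-wide default, the registry check at the top of this answer and the console check can be combined. A minimal sketch, assuming Windows (the `$env:OS` guard skips it elsewhere); the messages are illustrative:

```powershell
# Compare the console's current output code page with the system-wide
# OEM default from the registry (Windows only).
if ($env:OS -eq 'Windows_NT') {
  $systemOem = [int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage OEMCP)
  $consoleCp = [Console]::OutputEncoding.CodePage
  if ($consoleCp -ne $systemOem) {
    "Console uses code page $consoleCp; system OEM default is $systemOem."
  } else {
    "Console uses the system OEM default, code page $systemOem."
  }
}
```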

Windows API-based solution:

From a compiled application, you can use the GetACP() and GetOEMCP() Windows API functions to query the active ANSI and OEM code page, respectively.

You could even do that from PowerShell (though the fact that it requires on-demand compilation makes the registry solution at the top preferable):

# Compile a helper type that calls the WinAPI functions.
Add-Type -Namespace Util -Name WinApi -MemberDefinition @'
  [DllImport("Kernel32.dll")]
  public static extern uint GetACP();
  [DllImport("Kernel32.dll")]
  public static extern uint GetOEMCP();
'@

# Output: the OEM code page, then the ANSI code page.
[Util.WinApi]::GetOEMCP(), [Util.WinApi]::GetACP()

Note:

  • If your compiled application is a console application and you want to know the associated console's current OEM code page, which may or may not be the default page set via the system locale, use the GetConsoleOutputCP() function instead.
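The same on-demand compilation approach works for the console functions too. A minimal sketch: the helper type name `Util.ConsoleApi` is hypothetical, while `GetConsoleOutputCP()` and `GetConsoleCP()` are the documented Kernel32 functions. The type compiles on any platform, but the calls are guarded to run on Windows only:

```powershell
# Compile a hypothetical helper type wrapping the Kernel32 console functions.
Add-Type -Namespace Util -Name ConsoleApi -MemberDefinition @'
  [DllImport("Kernel32.dll")]
  public static extern uint GetConsoleOutputCP();
  [DllImport("Kernel32.dll")]
  public static extern uint GetConsoleCP();
'@

# Output code page first, then input code page, of the console attached
# to the current process (Windows only).
if ($env:OS -eq 'Windows_NT') {
  [Util.ConsoleApi]::GetConsoleOutputCP(), [Util.ConsoleApi]::GetConsoleCP()
}
```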

[1] The active ANSI code page is no longer relevant to PowerShell [Core] v6+, which consistently uses BOM-less UTF-8 for its cmdlets, but on Windows the active OEM code page, as reflected in [Console]::OutputEncoding, still matters when communicating with external programs.

mklement0
  • Where's my original comment? No, `chcp.com` is just a program: if executed i.e. via `CreateProcess()` and redirecting its output right away then no console should be involved at all. However, one could also call [`GetACP()`](https://learn.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getacp) right away. – AmigoJack Nov 22 '20 at 11:28
  • @AmigoJack: From a compiled application, you wouldn't incur the overhead of creating a child process for `chcp.com` - indeed you would use the WinAPI. Note that the question is about _scripting_ (as also reflected in the tags [powershell] and [vbscript]), so it's fair to assume that a _console_ is involved, where, as discussed, the output from `chcp` may or may not reflect the true system locale. However, given that situationally you may want to know the console's _current_ code page, I've updated the answer to address both use cases, including considerations for compiled applications. – mklement0 Nov 22 '20 at 13:56
  • Ah, now I understand about my comment. Also I'm now convinced your answer is more verbose and complete than before. I didn't know PowerShell needs on-demand compilation for DLL imports; I thought it would be "intelligent" enough to just use them. Of course, then querying the Registry should have the lowest performance impact. – AmigoJack Nov 22 '20 at 20:35
  • @AmigoJack: Not that I'd complain if PowerShell were to let you call _Windows API_ functions _directly_ (which, given PowerShell's cross-platform nature, would obviously be limited to _Windows_), but I think it's remarkable enough for it, as a _shell_, to allow near-unlimited, direct access to the _.NET_ APIs (on _all_ supported platforms). – mklement0 Nov 22 '20 at 21:41