
I'm putting together a script and need to take a file's content as input for setting a variable. I'm using Out-File to produce a text file:

$string | Out-File -FilePath C:\Full\Path\To\file.txt -NoNewLine

Then I am using that file to set a variable in batch:

set /P variablename=<C:\Full\Path\To\file.txt

The content of that file is a unique ID string that looks something like this:

1i32l54bl5b2hlthtl098

When I echo this variable, I get this:

echo %variablename%
■1

When I try a different string in the input file, I see that what gets echoed is the ■ character followed by the first character of the string. So, if my string were "apfvuu244ty0vh", it would echo "■a" instead.

Why isn't the variable being set to the content of the file? I'm using the method from this Stack Overflow post, where the chosen answer says to use this syntax with the set command. Am I doing something wrong? Is there perhaps a problem with using a full path as input to a set variable?

Dusty Vargas
  • I would assume the file is not saved as ASCII text. It is most likely some form of Unicode with a BOM. – Squashman Aug 02 '18 at 23:20
  • Notepad++ tells me it is encoded as UCS-2 LE BOM. This file is being produced by Out-File in PowerShell, so I will explore any options I have for encoding and report back. – Dusty Vargas Aug 02 '18 at 23:24
  • Yes. You can tell [Powershell to output the file as ascii](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/out-file?view=powershell-6). Could we see the Powershell code please. – Squashman Aug 02 '18 at 23:26
  • That did it. Thanks a bunch, I've added the PowerShell info (should've added it to begin with). Would you give an answer about the encoding so I can accept? – Dusty Vargas Aug 02 '18 at 23:30

1 Answer


tl;dr:

Use Out-File -Encoding oem to produce files that cmd.exe reads correctly.

This effectively limits you to the 256 characters of the legacy "ANSI" / OEM code page in effect, except NUL (0x0). See the bottom section if you need full Unicode support.
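Applied to the commands from the question, the fix is a single added flag (paths exactly as in the question):

# PowerShell: write the file using the console's OEM code page
$string | Out-File -FilePath C:\Full\Path\To\file.txt -Encoding oem -NoNewLine

rem batch: unchanged; the variable now receives the full string
set /P variablename=<C:\Full\Path\To\file.txt
echo %variablename%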


In Windows PowerShell (but not PowerShell Core), Out-File and its effective alias > default to UTF-16LE character encoding, where most characters are represented as 2-byte sequences; for characters in the ASCII range, the 2nd byte of each sequence is NUL (0x0); additionally, such files start with a BOM that indicates the type of encoding.

By contrast, cmd.exe expects input to use the legacy single-byte OEM encoding (note that starting cmd.exe with /U only controls the encoding of its output).
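To check which OEM code page a given console session uses, run chcp (437, referenced below, is the US-English default; yours may differ):

chcp
Active code page: 437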

When cmd.exe (unbeknownst to it) encounters UTF-16LE input:

  • It interprets the bytes individually as characters, even though a character in UTF-16LE is typically composed of 2 bytes or, in rare cases, of 4 (a pair of 2-byte sequences).

  • It interprets the 2 bytes that make up the BOM (0xff, 0xfe) as part of the string. With OEM code page 437 (US-English) in effect, 0xff renders like a space, whereas 0xfe renders as ■.

  • Reading stops once the first NUL (0x0) byte is encountered; that happens with the 1st character from the ASCII range, which in your sample string is 1.

Therefore, string 1i32l54bl5b2hlthtl098 encoded as UTF-16LE is read as  ■1, as you state.
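You can inspect the byte layout yourself with Format-Hex (available in Windows PowerShell 5.0+), using the path from the question:

'1i32l54bl5b2hlthtl098' | Out-File C:\Full\Path\To\file.txt -NoNewLine
Format-Hex C:\Full\Path\To\file.txt
# first bytes: FF FE 31 00 69 00 33 00 ... - the BOM, then '1', 'i', '3', each padded with a NUL byte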


If you need full Unicode support, use UTF-8 encoding:

  • Use Out-File -Encoding utf8 in PowerShell.

  • Before reading the file in cmd.exe (in a batch file), run chcp 65001 to switch to the UTF-8 code page; see the sketch below.
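A minimal sketch combining both steps (paths as in the question); note that Windows PowerShell's utf8 encoding writes a UTF-8 BOM, which set /P may pick up as stray leading characters in the variable:

# PowerShell
$string | Out-File -FilePath C:\Full\Path\To\file.txt -Encoding utf8 -NoNewLine

rem batch
chcp 65001 >nul
set /P variablename=<C:\Full\Path\To\file.txt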

Caveats:

  • Not all Unicode characters may render correctly, depending on the font used in the console window.

  • Legacy applications may malfunction with code page 65001 in effect, especially on older Windows versions.

    • A possible strategy to avoid problems is to temporarily switch to code page 65001, as needed, and then switch back.
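A sketch of that switch-and-restore approach in a batch file; the for /f parsing and the savedcp variable name are assumptions here, relying on chcp's English "Active code page: NNN" output:

rem remember the active code page, switch to UTF-8, read the file, then restore
for /f "tokens=2 delims=:" %%a in ('chcp') do set /a savedcp=%%a
chcp 65001 >nul
set /P variablename=<C:\Full\Path\To\file.txt
chcp %savedcp% >nul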

Note that the above only covers communication via files, and only in one direction (PowerShell -> cmd.exe).
To also control the character encoding used for the standard streams (stdin, stdout, stderr), both when sending strings to cmd.exe / external programs and when interpreting strings received from them, see this answer of mine.

mklement0
  • Great and informative answer. I used Out-File -Encoding ASCII as suggested by Squashman in a comment, which worked for me but I'm sure I'll come back to this answer in the future. Thanks. – Dusty Vargas Aug 03 '18 at 18:41
  • @OilyBusiness: Glad to hear it was helpful; `-Encoding ASCII` works fine, as long as you use only characters from the 7-bit ASCII range, but if you have accented characters (such as `ï`, for instance), they'll be replaced with _literal_ `?` - i.e., you'll lose information. `-Encoding oem` extends support to the code pages ("ANSI"/OEM) in effect based on the legacy system locale, so you'll get support for another 128 characters (the 8-bit range), including some accented characters. Full Unicode support requires changing the code page, as stated. – mklement0 Aug 03 '18 at 18:48