
I have a batch script which does a curl command using:

powershell -Command "curl.exe -k -S -X GET -H 'Version: 16.0' -H 'Accept: application/json' 'https://IntendedURL' > result.json"

I am then using the result.json as an input file to another program.

The issue I am facing is that running the batch script produces a result.json file encoded as UTF-16 LE with BOM. In that encoding, the .json file cannot even be opened and inspected directly in a normal browser; it fails with "SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data".

I did some searching around, and the consensus is that curl itself should not add a BOM to its output. Nonetheless, I end up with a result.json that has one. If there is something that can be done to the batch script to make it output without a BOM, help is appreciated.

For the second part of my issue: as a workaround, I can manually copy the content of result.json into a new text document and save it explicitly as UTF-8. I searched around as well and found a partial solution here: Batch script remove BOM () from file, but that script for removing the BOM seems rather complicated.

Thus, any help to remove the BOM encoding without resorting to manually copying the content and saving it in the specific desired format is appreciated as well.

Best regards, cyborg1234

    Read following : https://stackoverflow.com/questions/56724398/windows-encoding-changed-when-redirecting-output – jdweng Jul 28 '23 at 12:38
  • @jdweng, the linked post is about _Python's_ character-encoding choices, which do not apply here. Here, it is _PowerShell_'s character-encoding behavior that is the problem. – mklement0 Jul 28 '23 at 18:05
  • @mklement0 : The issue is Python and the solution in the link to save the stdout to a UTF-8 file seems reasonable. – jdweng Jul 28 '23 at 18:58
  • @jdweng, if you tried to use _PowerShell_ to save to a UTF-8 file (which is what this question is about), you'd end up with (a) a character-encoding mismatch and misinterpreted non-ASCII characters and (b) in _Windows PowerShell_ with a UTF-8 file _with a BOM_, both of which must be avoided here. Nothing in the linked post is remotely of any help with these issues. – mklement0 Jul 28 '23 at 19:03

1 Answer


Note:

  • See the bottom section for a way to avoid the problem by calling curl.exe directly from your batch file.

In PowerShell, > is (in effect) an alias of the Out-File cmdlet, whose default encoding in Windows PowerShell is UTF-16LE (in PowerShell (Core) 7+, the default is now the more sensible BOM-less UTF-8, across all cmdlets).
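You can observe this default directly. A quick sketch (demo.txt is a throwaway file name; -Encoding Byte is the Windows PowerShell 5.1 spelling, whereas PowerShell 7+ uses -AsByteStream):

```powershell
# In Windows PowerShell (powershell.exe), > writes UTF-16LE with a BOM:
'test' > demo.txt

# Read the first two raw bytes of the file:
$bytes = Get-Content demo.txt -Encoding Byte -TotalCount 2
'{0:X2} {1:X2}' -f $bytes[0], $bytes[1]   # FF FE, i.e. the UTF-16LE BOM

Remove-Item demo.txt
```

That FF FE byte pair at the start of the file is exactly what a JSON parser chokes on at "line 1 column 1".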

You have two options (spoiler alert: choose the 2nd):

  • In principle, you can use Out-File explicitly (or preferably, with text input, Set-Content), in which case you can use its -Encoding parameter to control the output character encoding. However:

    • In Windows PowerShell -Encoding UTF8 invariably creates a UTF-8 file with a BOM. Short of using .NET APIs directly, the only way to create BOM-less UTF-8 files in Windows PowerShell is to use New-Item - see this answer.

    • Even clearing that hurdle is typically not enough, and the following also affects PowerShell (Core) up to at least v7.3.x:

      • When PowerShell captures output from external programs it currently invariably decodes them into .NET strings first - even when redirecting to a file - using the character encoding stored in [Console]::OutputEncoding, and then re-encodes on output, based on the specified or default character encoding of the cmdlet used.

        • However, there's hope for PowerShell (Core): an experimental feature named PSNativeCommandPreserveBytePipe is now available in preview versions of the upcoming v7.4 release, which treats > and | when applied to external (native) programs as raw byte conduits, i.e. it bypasses the usual decoding and re-encoding cycle in favor of passing the raw data through.
          Note that, as with any experimental feature, it isn't guaranteed to become a stable feature.
      • If that encoding doesn't match the character encoding of what curl.exe actually outputs, the output will be misinterpreted. Given that [Console]::OutputEncoding defaults to the legacy system locale's OEM code page, misinterpretation will occur if the output is UTF-8-encoded. So - unless you've explicitly configured your system to use UTF-8 system-wide (see this answer) - you'll need to (temporarily) set [Console]::OutputEncoding = [Text.UTF8Encoding]::new() before the curl.exe call.

  • Preferably, let curl.exe itself write the output file, i.e. replace > result.json with -o result.json: this will save the output as-is to the target file.
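To make the two options above concrete, here is a sketch of both inside the PowerShell call (URL, headers, and file name taken from the question; the New-Item technique for BOM-less UTF-8 in Windows PowerShell is the one referenced above):

```powershell
# Option A: capture in PowerShell, decoding and re-encoding explicitly.
# Make [Console]::OutputEncoding match what curl.exe actually outputs (assumed UTF-8 here):
[Console]::OutputEncoding = [Text.UTF8Encoding]::new()
$json = curl.exe -k -S -X GET -H 'Version: 16.0' -H 'Accept: application/json' 'https://IntendedURL'
# In Windows PowerShell, New-Item writes BOM-less UTF-8;
# Out-File/Set-Content -Encoding UTF8 would add a BOM.
$null = New-Item -Force result.json -Value (($json -join "`n") + "`n")

# Option B (preferred): let curl.exe write the file itself - the raw bytes
# are saved as-is, with no decode/re-encode cycle in between.
curl.exe -k -S -X GET -H 'Version: 16.0' -H 'Accept: application/json' -o result.json 'https://IntendedURL'
```

Option B is preferable precisely because PowerShell's string pipeline never touches the data.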


Taking a step back:

  • You can make your curl.exe call directly from your batch file, which avoids the character-encoding problems.

  • cmd.exe's > operator, unlike PowerShell's, is a raw byte conduit, so no data corruption should occur - that said, you're free to use -o result.json instead.

  • When calling from cmd.exe (a batch file), only "..." quoting is supported, so the '...' strings have been transformed into "..." ones below.

:: Directly from your batch file, using "..." quoting:
curl.exe -k -S -X GET -H "Version: 16.0" -H "Accept: application/json" "https://IntendedURL" > result.json
mklement0
  • Hi @mklement0, I have tried the method of changing ">" to "-o" and the method of calling curl.exe directly without the "powershell -Command". Both methods result in result.json of the format UTF-8 which seems to be what I want to achieve. However, when I open this UTF-8 result.json file in a normal browser, the exception of "SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data" still shows. And when I manually save the content of the file into new UTF-8 text file, the exception occurs too, which differs from when I saved the content from UTF-16 LE BOM manually. – cyborg1234 Aug 01 '23 at 05:33
  • @cyborg1234, so you now have a UTF-8 file without a BOM, but the file doesn't contain valid JSON, due to characters at the beginning? If you inspect the file visually in a text editor, does it look like valid JSON? Is `ConvertFrom-Json` able to parse it? Here's an example of content that would break JSON parsing: `'x' | ConvertFrom-Json` – mklement0 Aug 01 '23 at 12:51