I am using a little processconf.js tool to build a configuration.json file from multiple .json files.

Here is the command I am using:

node processconf.js file1.json file2.json > configuration.json 

I had been using cmd for a while, but today I tried PowerShell, and somehow the same files and the same command give different results.

One file is 33 kB (cmd), the other 66 kB (PowerShell). Looking at the files, they have the exact same lines and I can't find any visual differences. Why is that?

Logan Wlv

1 Answer


PowerShell's redirection operator defaults to UTF-16LE, while cmd doesn't use Unicode by default for redirection (which may sometimes end up mangling your data as well). UTF-16LE stores every ASCII character as two bytes, which is exactly why your file doubles in size while looking identical in an editor.
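You can verify this by looking at the raw bytes of the file PowerShell's > produced (Windows PowerShell syntax; PowerShell Core 6+ uses -AsByteStream instead of -Encoding Byte):

# Show the first four bytes of the redirected file as hex:
Get-Content configuration.json -Encoding Byte -TotalCount 4 | ForEach-Object { '{0:X2}' -f $_ }
# Expected: FF FE (the UTF-16LE byte order mark), then the first
# character of your JSON followed by a 00 byte.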

If you don't use the redirection operator in PowerShell but instead Out-File, you can specify an encoding, e.g.

node processconf.js file1.json file2.json | Out-File -Encoding Utf8 configuration.json

I think -Encoding Oem would be roughly the same as the cmd behaviour, but OEM code pages usually don't support Unicode, and there's a conversion involved.
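To see which OEM code page your console is actually using (a quick check; the result depends on your system locale):

# The active console output code page, which cmd's commands write through by default:
chcp
# The same information from .NET:
[Console]::OutputEncoding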

The redirection operator of course has no provisions for specifying any options, so it's often not the best choice when you care about the exact output format. And since PowerShell, unlike Unix shells, deals in objects rather than raw byte streams, text and arbitrary binary data are very different things to it.

You'd get the same behaviour from cmd if you ran it with cmd /u, by the way.

Joey
  • It's worth noting that `Out-File -Encoding Utf8` will write a UTF-8 BOM to the file, and that other consumers of the file might choke on that. This is especially relevant to JSON files, as they are more likely to be transferred to other computers. – Tomalak Oct 22 '18 at 14:17
  • Good answer, but it's `-Encoding OEM` that's the equivalent of `cmd.exe`'s default behavior. In PSv5.1+ you can change the default encoding for `>` with something like `$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'` – mklement0 Oct 22 '18 at 14:22
  • The optimum would be a UTF-8-encoded file without BOM, but that's not straightforward to produce in PowerShell code. It involves using a BOM-free encoding instance (`$Utf8NoBOM = New-Object System.Text.UTF8Encoding $False`) and writing the file not with `Out-File` but with `[System.IO.File]::WriteAllText($path, $text, $Utf8NoBOM)`; see the full sketch after these comments. – Tomalak Oct 22 '18 at 14:23
  • @Tomalak: Yeah, that's a solution that I usually use. When using PowerShell as some sort of `sed` or other processing tool for text files it's remarkably cumbersome, because I tend to go through `[File]::ReadAllText` and `[File]::WriteAllText` (or even treat the file just as Latin1 and read it as one single string) to avoid changing too much (not nice for diffs in source control and subsequent merges). – Joey Oct 22 '18 at 14:42
  • @Tomalak `WriteAllText` will write the file as one single line, right? Won't I lose the \n and \t characters? – Logan Wlv Oct 22 '18 at 16:03
  • @LoganWlv No, `WriteAllText` will write the string *exactly* as you pass it. – Tomalak Oct 22 '18 at 16:23
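Putting Tomalak's suggestion from the comments together, a minimal sketch of a BOM-free UTF-8 write (variable names are illustrative; the absolute path matters because .NET resolves relative paths against its own working directory, which can differ from PowerShell's):

# Capture the tool's output as one string, line breaks preserved:
$text = node processconf.js file1.json file2.json | Out-String
# Build an absolute path explicitly for the .NET call:
$path = Join-Path (Get-Location).Path 'configuration.json'
# UTF8Encoding with $false means: do not emit a byte order mark.
$Utf8NoBOM = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($path, $text, $Utf8NoBOM)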