I have a file with these simple contents:

test.txt (ASCII encoded)

Baby, you can drive my :car:

Via a Windows batch file, I need to change :car: to 🚗 (https://unicode-table.com/en/1F697/)

I'd like to avoid installing new software on the client's server, so I'm trying to do it using PowerShell or something native.

So far I've tried a ton of suggestions (https://www.generacodice.com/en/articolo/30745/How-can-you-find-and-replace-text-in-a-file-using-the-Windows-command-line-environment?), but nothing works for me. Either it doesn't get replaced, or \u1F697 shows up literally. I've tried changing the inbound file's encoding to Unicode and that isn't working either.

Non-working example:

powershell -Command "(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding Unicode test.txt"

Does anyone have any tips?

Edit: I've determined how to reproduce it.

If I run this line via command line, it works:

powershell -Command "(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding utf8 test-out.txt"

If I put the same line of code inside replace.bat and then execute it, test-out.txt is corrupt.

The batch file is set to UTF-8 encoding. Should something be different?

  • "Either it doesn't get replaced, or \u1F697 shows up literally" - which one is it, with the "non-working" example? – Mathias R. Jessen Apr 09 '21 at 17:25
  • I retested the code to answer your question and magically it worked on my laptop. `:car:` is replaced with the car emoji. But on the client's server, the same command replaces the client's file with `ƒÜù`. – Anthony Gill Apr 09 '21 at 18:00
  • I assume that `ƒÜù` is due to an interpreter reading the file with a different encoding than the file actually is – Nico Nekoru Apr 09 '21 at 18:13
  • That's a good call. I'll have to take a look at what they are using and what the encoding seems to be. – Anthony Gill Apr 09 '21 at 18:47
  • I did more debugging and found a clear way to reproduce it. I've edited my post with the details. @NicoNekoru – Anthony Gill Apr 09 '21 at 19:44
  • `-replace ':car:', [char]::ConvertFromUtf32(0x1F697)` (as the Windows `.bat` script interpreter understands neither `utf-8` nor `utf-16`). – JosefZ Apr 09 '21 at 20:10
  • Like @JosefZ said, the batch interpreter does not read the file as UTF-8 but in the active OEM code page. Most of the time this is OK, since ASCII characters are the same single bytes in both. In this case the car is a 4-byte sequence in UTF-8, so its bytes come out as different characters – Nico Nekoru Apr 09 '21 at 20:14
  • @JosefZ Your answer was correct + first, so I'd like to give you credit for it. If you want to post it as an answer, I'll flag it as the winner. Thanks for the tip! – Anthony Gill Apr 12 '21 at 13:48

2 Answers

I don't think a .bat file can be saved in a non-ASCII encoding. If you're willing to add a separate file.ps1 file:

(gc test.txt) -replace ':car:', '🚗' | Out-File -encoding utf8 test-out.txt

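The file has to be saved as UTF-8 with BOM in Notepad, not just UTF-8; without the BOM, Windows PowerShell reads the script in the system's ANSI code page and misreads the emoji.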
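Whether a saved script really starts with the BOM can be checked from PowerShell itself. The following is only a minimal sketch: `Test-Utf8Bom` and the two temporary demo files are hypothetical names made up for illustration, not part of the answer.

```powershell
# Hypothetical helper (not part of the answer): $true if the file starts
# with the UTF-8 byte order mark EF BB BF.
function Test-Utf8Bom {
    param([Parameter(Mandatory)][string]$Path)
    # Resolve against PowerShell's location, since .NET keeps its own current directory.
    $full  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($full)
    return (($bytes.Length -ge 3) -and ($bytes[0] -eq 0xEF) -and
            ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF))
}

# Demo on two throwaway files: one written with a BOM, one without.
$withBom    = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-demo-with.ps1'
$withoutBom = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-demo-without.ps1'
$script     = "'placeholder script body'"
[System.IO.File]::WriteAllText($withBom,    $script, (New-Object System.Text.UTF8Encoding $true))
[System.IO.File]::WriteAllText($withoutBom, $script, (New-Object System.Text.UTF8Encoding $false))

$hasBom = Test-Utf8Bom -Path $withBom       # True
$noBom  = Test-Utf8Bom -Path $withoutBom    # False
"with BOM: $hasBom; without BOM: $noBom"
```

On the real script, the call would be `Test-Utf8Bom -Path .\file.ps1`.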

Then your .bat file would be:

powershell -file file.ps1

The PowerShell ISE is a nice way to test this:

cmd /c file.bat
type test-out.txt


js2010

The Windows .bat script interpreter does not understand any Unicode encoding (e.g. UTF-8 or UTF-16); the simplest rule is:

You have to save the batch file with OEM encoding. How to do this varies depending on your text editor. The encoding used in that case varies as well. For Western cultures it's usually CP850.
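The `ƒÜù` reported in the comments is consistent with this rule: it is what the four UTF-8 bytes of U+1F697 look like when each byte is read as a CP850 character. A minimal sketch, assuming Windows PowerShell 5.1 (where code page 850 is available through `[System.Text.Encoding]::GetEncoding`); whether the client's server actually uses CP850 is an assumption:

```powershell
# The four UTF-8 bytes that encode U+1F697 (the car emoji).
[byte[]]$bytes = 0xF0, 0x9F, 0x9A, 0x97

# Decoded as UTF-8 they form a single code point ...
$asUtf8    = [System.Text.Encoding]::UTF8.GetString($bytes)
$codePoint = 'U+{0:X}' -f [char]::ConvertToUtf32($asUtf8, 0)

# ... decoded as code page 850, each byte becomes its own character.
$asOem    = [System.Text.Encoding]::GetEncoding(850).GetString($bytes)
$oemChars = ($asOem.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '

$codePoint   # U+1F697
$oemChars    # U+00AD U+0192 U+00DC U+00F9, i.e. soft hyphen, ƒ, Ü, ù
```

The leading soft hyphen is normally invisible, which would leave just `ƒÜù` on screen.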

To use any Unicode character above the ASCII range as part of a string passed to a PowerShell command, apply the .NET method Char.ConvertFromUtf32(Int32) instead of a literal '🚗'; in PowerShell syntax, that is [char]::ConvertFromUtf32(0x1F697)

Being pure ASCII, it does not contradict the .bat encoding rule above, and PowerShell would evaluate it to the character…
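What the call evaluates to can be inspected in a PowerShell session. This is a small sketch using only ASCII source text; the variable names are chosen here for illustration:

```powershell
# ConvertFromUtf32 returns a string, not a single [char]: U+1F697 lies
# outside the Basic Multilingual Plane, so it needs a surrogate pair.
$car      = [char]::ConvertFromUtf32(0x1F697)
$typeName = $car.GetType().Name                                   # String
$length   = $car.Length                                           # 2
$units    = ($car.ToCharArray() | ForEach-Object { '{0:X4}' -f [int]$_ }) -join ' '
$units                                                            # D83D DE97

# Round trip back to the code point.
$roundTrip = [char]::ConvertToUtf32($car, 0).ToString('X')        # 1F697

# Used as the replacement operand of -replace, as in the question.
$result = 'Baby, you can drive my :car:' -replace ':car:', $car
```

Because the result is an ordinary string, it can be passed directly as the second operand of `-replace`.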

Then, your line could be as follows:

powershell -Command "(gc test.txt) -replace ':car:', [char]::ConvertFromUtf32(0x1F697) | Out-File -encoding Unicode test.txt"
JosefZ