When I use `for` to read each line from the UTF-8 encoded file 1.txt, the output is garbled. How can I get the batch file to read UTF-8 encoded files correctly?

for /F "tokens=*" %%f in (1.txt) do echo %%f
pause
user69485
  • Run `chcp 65001` at the top of the script. – Gerhard Nov 07 '19 at 07:07
  • Then keep your fingers crossed that the font used for the console window supports the Unicode characters; see [Using another language (code page) in a batch file made for others](https://stackoverflow.com/a/48982681/3074564). Further, I recommend using `delims=` (turns off line splitting) instead of `tokens=*` (line splitting is still done, which removes leading spaces/tabs). I also recommend not using `f` as the loop variable, although it is possible; use for example `L` or `I` or `#`, which are characters not used for the modifiers explained in the help output by running `for /?` in a cmd window. – Mofi Nov 07 '19 at 07:27
  • BTW: Is there any reason not to use the command `type`? Get help on this command by running `type /?` in a cmd window. – Mofi Nov 07 '19 at 07:29
  • In addition to adding `chcp 65001`, you must also set the CMD font; otherwise it reports "The system cannot write to the specified device." – user69485 Nov 07 '19 at 09:35
  • In the actual batch file, I need to use `for` to read each line from the file and pass it as a parameter to another CLI, so `type` does not apply. – user69485 Nov 07 '19 at 09:39
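
The suggestions from the comments above can be combined into a minimal sketch. It assumes that 1.txt is saved as UTF-8 without a byte order mark and that the console font can display the characters; neither is confirmed in the thread. The `echo` stands in for whatever command actually receives each line.

```bat
@echo off
rem Switch the console code page to UTF-8 before the file is read.
chcp 65001 >nul

rem "delims=" disables line splitting, so leading spaces and tabs are kept.
rem "usebackq" allows the file name to be enclosed in double quotes.
rem Note: for /F still skips empty lines and lines starting with a semicolon.
for /F "usebackq delims=" %%I in ("1.txt") do echo %%I

pause
```

The loop variable is `I` rather than `f` so that it does not collide with the modifier letters listed by `for /?`.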

1 Answer

Use this:

for /F "tokens=* delims= " %%f in ('type 1.txt') do echo %%f

This works because the `type` command reads lines from a text file regardless of its encoding.

Biffen
Wasif
  • This does not work if the file `1.txt` contains UTF-8 encoded non-ASCII characters that should be displayed correctly in the console window. If `1.txt` contains only ASCII characters, there is no binary difference between an OEM/ANSI/ASCII encoded text file and a UTF-8 encoded text file without a byte order mark (BOM). So using `type` is irrelevant to the issue of non-ASCII characters being displayed incorrectly in the console window. – Mofi Nov 07 '19 at 09:01
  • Also, if that worked, it would be enough to use just `type 1.txt`. Using `for` instead starts a new command process in the background with `%ComSpec% /c type 1.txt`, in which `type` outputs the text to handle __STDOUT__ with conversion from UTF-8 to the OEM code page according to the country configured for the used account. The `cmd.exe` executing `for` captures that output, ignores empty lines and lines starting with a semicolon, removes all leading spaces from the remaining lines, and outputs them in the console window of the command process that is processing the batch file. – Mofi Nov 07 '19 at 09:06