I'm trying to get a listing of the folders and files on a drive through cmd. Some folder names are written in the Cyrillic alphabet, so I only get ??? symbols in their place.

My command:

tree /f /a |clip

or

tree /f /a >output.txt

Result:

\---???????????
    \---2017 - ????? ??????? ????
            01. ?????.mp3
            02. ? ???????.mp3
            03. ????.mp3
            04. ?????? ? ???.mp3
            05. ?????.mp3
            06. ???? ?????.mp3
            07. ???????? ????.mp3
            08. ??? ?? ?????.mp3
            Cover.jpg

Any idea?

blfuentes
  • File and folder names are stored internally in Windows NTFS in 16-bit Unicode encoding, and they are converted to an 8-bit OEM encoding when they are output by console applications such as tree.com. The conversion fails when Cyrillic support is not installed (via ControlPanel/RegionalSettings/Russian). – vitsoft May 28 '17 at 10:27
  • @eryksun, you may summarise your comments as an answer since you hit the spot, so the OP gets the chance to accept it... – aschipfl May 29 '17 at 06:13
  • @eryksun please post it as an answer so I can accept it. I tested with `cmd /u /c "dir /s /b" | clip` and it worked. – blfuentes May 29 '17 at 06:22

1 Answer

tree.com uses the native UTF-16 encoding when writing to the console, just like cmd.exe and powershell.exe. So at first you'd expect redirecting the output to a file or pipe to also use Unicode. But tree.com, like most command-line utilities, encodes output to a pipe or disk file using a legacy codepage. (Speaking of legacy, the ".com" in the filename here is historical. In 64-bit Windows it's a regular 64-bit executable, not 16-bit DOS code.)

When writing to a pipe or disk file, some programs hard-code the system ANSI codepage (e.g. 1252 in Western Europe) or OEM codepage (e.g. 850 in Western Europe), while others use the console's current output codepage (if attached to a console), which defaults to OEM. The latter would be great because you can change the console's output codepage to UTF-8 via chcp.com 65001. Unfortunately tree.com always uses the OEM codepage, with no option to use anything else. Any character that the OEM codepage cannot represent, such as a Cyrillic letter in codepage 850, is written as "?".
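
To see why the names turn into question marks, here is a minimal Python sketch of that kind of lossy conversion. It is only an illustration: Python's codecs stand in for the conversion Windows performs, the helper name `to_legacy` and the sample file name are made up for the example, and cp866 is shown only as an example of an OEM codepage that does contain Cyrillic letters.

```python
def to_legacy(name: str, codepage: str) -> str:
    """Round-trip a name through a legacy codepage.

    Characters the codepage cannot represent are replaced with '?',
    which approximates what ends up in the redirected output.
    """
    return name.encode(codepage, errors="replace").decode(codepage)


# Hypothetical file name, chosen only for the demonstration.
SAMPLE = "01. Песня.mp3"

# cp850 (Western European OEM) has no Cyrillic letters, so they are lost.
lossy = to_legacy(SAMPLE, "cp850")

# cp866 (a Cyrillic OEM codepage) can represent them, so the name survives.
intact = to_legacy(SAMPLE, "cp866")
```

With these inputs, `lossy` is `01. ?????.mp3`, the same pattern as in the question, while `intact` equals the original name.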

cmd.exe, on the other hand, at least provides a /u option to output its built-in commands as UTF-16. So, if you don't really need tree-formatted output, you could simply use cmd's dir command. For example:

cmd /u /c "dir /s /b" | clip
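
If you redirect that output to a file instead of the clipboard, whatever reads the file has to decode it as UTF-16. A small Python sketch, assuming the bytes are little-endian UTF-16 (the helper name `decode_cmd_unicode` is made up for this example, and a leading byte order mark is stripped just in case one is present):

```python
from typing import List


def decode_cmd_unicode(raw: bytes) -> List[str]:
    """Decode bytes captured from `cmd /u` output into a list of lines.

    Assumes little-endian UTF-16; a leading BOM, if any, is dropped.
    """
    text = raw.decode("utf-16-le")
    return text.lstrip("\ufeff").splitlines()
```

For example, after `cmd /u /c "dir /s /b" > listing.txt` (the file name is arbitrary), `decode_cmd_unicode(open("listing.txt", "rb").read())` should give one path per list element.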

If you do need tree-formatted output, one workaround would be to read the output from tree.com directly from a console screen buffer, which can be done relatively easily for up to 9,999 lines. But that's not generally practical.

Otherwise PowerShell is probably your best option. For example, you could modify the Show-Tree script to output files in addition to directories.
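
If a scripting language other than PowerShell is an option, the same idea is easy to sketch. The Python below is not the Show-Tree script and not a reimplementation of tree.com; it is a rough approximation of the `tree /f /a` layout that writes UTF-8, so Cyrillic names survive. The function names `render_tree` and `write_tree` are made up for this example.

```python
import os


def render_tree(root, prefix=""):
    """Return lines that roughly resemble `tree /f /a` output for `root`.

    Files are listed first, then subfolders, using ASCII branch characters.
    """
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name.lower())
    files = [e for e in entries if not e.is_dir(follow_symlinks=False)]
    dirs = [e for e in entries if e.is_dir(follow_symlinks=False)]

    lines = []
    # Files are indented one level; a '|' is drawn if subfolders follow.
    file_prefix = prefix + ("|   " if dirs else "    ")
    for f in files:
        lines.append(file_prefix + f.name)
    for i, d in enumerate(dirs):
        last = i == len(dirs) - 1
        lines.append(prefix + ("\\---" if last else "+---") + d.name)
        # Recurse with the indentation that matches the branch just drawn.
        lines.extend(render_tree(d.path, prefix + ("    " if last else "|   ")))
    return lines


def write_tree(root, out_path):
    """Write the rendered tree to `out_path` as UTF-8 text."""
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(render_tree(root)) + "\n")
```

Calling `write_tree(".", "output.txt")` would then play the role of `tree /f /a >output.txt`, with the difference that the file is UTF-8 rather than OEM-encoded.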

Eryk Sun