3

From time to time I need to make a list of the files on a CD I receive. The file names frequently contain characters from other writing systems, such as Cyrillic Russian (Добродошли.doc) or simplified/traditional Chinese (孟子譯注.pdf). My computer (running Polish Windows 7) displays the file names correctly, opens the files, saves them to other locations, lets them be edited by various programs, and so on. Yet when I make a directory listing (with the dir command) I always get question marks and other strange characters in place of the Chinese or Russian characters. The output of the dir command seems to be written in ANSI by default instead of Unicode / UTF-8.

Example:

02.03.09 21:13 15˙584˙500 ??????(??????).pdf     = these three files were in Chinese
02.03.09 03:11 18˙638˙982 ????(???).pdf
24.03.08 17:25 61˙141˙454 ???®????Ż(???).pdf 
18.03.13 16:00 1˙088 ????.txt                    = this file's name was in Russian 
02.03.09 21:20 26˙083˙641 Transformations-of-Ming.pdf

(obtained with the Windows Right-Click Context Menu "Print Directory Listing")

I have searched for advice on this problem, and none of the solutions offered has solved it. Alternatively, I have found (seemingly dated) answers saying that, for the time being, the problem cannot be solved in various environments. Maybe something has changed and there is now a solution, either straight away or after changing something in the Windows registry? Or, if there is no simple batch programming solution, maybe there is some ready-made software that I could download (free or paid) to solve my problem?

Ken White
  • 123,280
  • 14
  • 225
  • 444
Kasia Luiss
  • 31
  • 1
  • 3
  • The Command Prompt does not correctly display Unicode characters out of the box. Rest assured your app is working correctly. – Cody Gray - on strike Apr 16 '13 at 05:51
  • See: [What encoding/code page is cmd.exe using](http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using) – Endoro Apr 16 '13 at 05:53
  • @Endoro - I've seen it and checked all possible code pages. Originally I had 852, but even changing it to Cyrillic-specific code pages (855, 860, 1251) or to Unicode (65001) did not change the situation for Russian, not to mention Han. Trying the UTF-16 little- or big-endian code pages (1200 or 1201, numbers taken from http://en.wikipedia.org/wiki/Code_page) resulted in the error message "Incorrect code page". Which code page should I use? Maybe I need to change it permanently so that the machine boots with it, since "programs (except Cmd.exe) that you started before assigning the new code page use the original code page"? But dir is cmd.exe – Kasia Luiss Apr 16 '13 at 07:04
  • @CodyGray - I'm not sure I follow. Which app do you mean - cmd.exe? It is the application that runs the "dir" command, and it is the only one I have a problem with. And by "working correctly" you mean "not correctly displaying Unicode characters out of the box", right? So you mean that there is no solution whatsoever to this problem? – Kasia Luiss Apr 16 '13 at 07:10
  • Can't answer your question. Have no experience with chinese, only russian :( – Endoro Apr 16 '13 at 07:43
  • @Endoro - So for Russian, but also including Polish (which is my mother tongue), what would you recommend? a) Which code page? b) Changing the code page while running, or changing it "forever" with some configuration? How I miss the old DOS config.sys file! ;( – Kasia Luiss Apr 16 '13 at 07:52
  • What I mean is, if I find one solution for Russian, then I will only have to look for another one for Chinese. Half of my problems solved (even if only 1/3, because there are some Polish files, but usually the Polish special letters are converted into basic Latin ones). It's obviously less time-consuming to run "dir" twice (or three times), changing the code page in between (even if that means rebooting the machine), and then just compare the files (with Word?), than to enter all those names by hand (even copying and pasting from Windows Explorer to Notepad or Word has to be done one file at a time). – Kasia Luiss Apr 16 '13 at 07:59
  • I was doing it like that before, and with a few files on a CD or flash drive it works well, but recently I received a CD with 384 Chinese files in 8 Chinese folders with several subfolders and sub-subfolders (all in Chinese), and that has made me desperate to find a solution. I went through a similar thing a year ago with a Russian CD, only about 200 files. – Kasia Luiss Apr 16 '13 at 08:02
  • 1
    The console runs in a legacy OEM code page, so it won't be able to display these glyphs. The only shot you have is "CHCP 65001" to switch to UTF-8 and pick another font, like Consolas or Lucida Console. Chinese is still going to be a problem, because these fonts don't have the glyphs for it. Windows Explorer of course won't have this problem, recommended. – Hans Passant Apr 16 '13 at 11:41
  • I removed a bunch of irrelevant content from your post. This is not a chatroom or forum, so your life's history is not needed. If you want to share it, put it in your profile. This site is for technical questions only, and anything not directly related to information needed to ask that question is unnecessary noise. – Ken White Apr 05 '22 at 01:14

4 Answers

0

First, you need a TrueType font (TTF) that supports the foreign characters. Install the font:

  • right mouse click on the title bar
  • choose Properties/Fonts
  • if your font is NOT in the list, you must first add it in the registry under the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont\

After installing the font, the console displays all foreign characters supported by this font. Commands like dir, or redirection to a file with dir > dir.txt, also work. And there is no need to change the code page with chcp.
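If the redirected file still shows question marks, as reported in the comments below, one thing that may be worth trying is starting the listing through `cmd /u`, which makes internal commands such as dir write UTF-16LE to a redirected file or pipe. The Python sketch below is only an illustration of how such a file could then be re-encoded as UTF-8. It assumes Python 3 is installed; the function name and the file names are hypothetical and not part of the original answer.

```python
def utf16_listing_to_utf8(src: str, dst: str) -> str:
    """Re-encode a UTF-16LE listing (e.g. from `cmd /u /c dir`) as UTF-8."""
    with open(src, "rb") as f:
        raw = f.read()
    # cmd /u writes no byte-order mark, so decode explicitly as little-endian.
    text = raw.decode("utf-16-le")
    # Drop a BOM if some other tool happened to add one.
    if text.startswith("\ufeff"):
        text = text[1:]
    # "utf-8-sig" prepends a BOM, which helps Notepad detect the encoding;
    # newline="" keeps the original CR LF line endings untouched.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
    return text
```

A possible use, with hypothetical file names: run `cmd /u /c dir /s > dir_utf16.txt` in the target folder, then call `utf16_listing_to_utf8("dir_utf16.txt", "dir_utf8.txt")`.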

Endoro
  • 37,015
  • 8
  • 50
  • 63
  • Thank you for your advice. I was changing the code page because you directed me there and it seemed logical given my DOS experience. I have checked that I have Lucida Console and Lucidas as two TTF fonts installed in the cmd console. Both have Cyrillic letters in them, yet neither the display nor the export with dir > dir.txt works (both show question marks instead of bukvy). I don't know why. I have added a third font, DejaVu Sans Mono, as described in more detail at http://cristianadam.blogspot.tw/2009/11/windows-console-and-true-type-fonts.html, which also has Cyrillic, but it didn't help – Kasia Luiss Apr 16 '13 at 11:01
  • I don't know which TTF font to use; I have hundreds of them on my computer. Which font have you used? I have read "Necessary criteria for fonts to be available in a command window" (http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q247815) and I stopped before adding an Asian font (too many details to check first), yet my experience with Cyrillic makes me doubt that it will work. Cyrillic, Chinese and other characters (as with files arriving with spam - Japanese, Arabic...) are displayed in Windows, yet not in the command box. I will write again when I have done it, maybe tomorrow. – Kasia Luiss Apr 16 '13 at 11:06
  • I had two fonts, "Lucida Console" and "Consolas" - I made an error. Now I have removed them both from the registry and set DejaVu Sans Mono as the default, and the Cyrillic letters appeared in the command box, yet the file dir.txt contains ??? instead of Russian letters (checked with Notepad, WordPad and MS Word; changing the font makes no difference). Helpless! – Kasia Luiss Apr 16 '13 at 11:31
  • I took "Lucida Console", it works with Russian. I can create files with Cyrillic file names, list them with `dir`, save the file names in a file and display this file with the `type` command. Please believe me, otherwise I can take a screenshot. – Endoro Apr 16 '13 at 11:38
  • Why does the console code page affect the data sent to the pipe (redirection)? Is there a dir listing utility that disregards the console code page when using a pipe? – n611x007 Dec 02 '14 at 10:23
0

Setting chcp 65001 solved the issue for me. The problem was the wrong code page, at least in my case.
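Why the code page matters can be illustrated with a small Python sketch. It is not what cmd.exe does internally; it only mimics the effect of forcing a Unicode file name through a single legacy code page, using Python's own codecs. The function name is hypothetical, and the sample names are the ones from the question.

```python
def show_in_code_page(name: str, code_page: str) -> str:
    """Return `name` as it survives a round trip through `code_page`."""
    # Characters the code page cannot represent are replaced by "?".
    return name.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    for cp in ("cp852", "cp866", "cp1251", "utf-8"):
        print(cp,
              show_in_code_page("Добродошли.doc", cp),
              show_in_code_page("孟子譯注.pdf", cp))
```

Under these assumptions, cp852 (the code page mentioned in the comments) loses every Cyrillic and Chinese character, cp866 and cp1251 keep the Cyrillic name but not the Chinese one, and only UTF-8 keeps both.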

Moonwalker
  • 2,180
  • 1
  • 29
  • 48
0

Use a CMD command prompt opened in the target folder:

Method: Hold the SHIFT key and right-click the target folder, then choose "open command prompt here" from the context menu. The prompt opens with that folder as its working directory (this does not by itself grant Administrator access, and none is needed for dir).

In the command prompt window, at the prompt where the text cursor is blinking:

type "chcp 1251" (without the quotes) and press the ENTER key to change how the command prompt displays the contents of the directory. This is the Change Code Page command, an internal Windows setting; you will not see any Unicode Cyrillic or Chinese text yet.

Now type "dir/w" and press the ENTER key. This displays all the contents of the target directory.

Right-click and choose "Select All" from the drop-down menu, then press the ENTER key again. This copies all the text now displayed in the command window to the Windows clipboard. Do not close the command prompt window, as you still have some Windows housekeeping to complete.

Launch Notepad.exe (the basic Windows text editor) or Notepad++, the programmer's favorite text editor (you can Google it and download a copy for free), and paste the clipboard contents into the editor of your choice. You will see the Cyrillic and Chinese text of the directory contents displayed in the correct fonts. Make sure to save the text file with the Unicode encoding option.

To return to the old code page native to Windows, which is Western European Latin:

type "chcp 1252" at the command prompt and press the ENTER key. Close the command prompt window.
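The manual copy-and-paste steps above could also be scripted so that the console and its code page are not involved at all. The following is a minimal sketch, assuming Python 3 is installed (on Windows it reads file names as Unicode); the function name, the output layout and the paths in the usage note are hypothetical and not part of the original answer.

```python
import os
from datetime import datetime


def write_listing(root: str, out_path: str) -> int:
    """Write one line per file under `root` to a UTF-8 text file.

    Returns the number of files listed.
    """
    count = 0
    # "utf-8-sig" adds a BOM, which helps Notepad detect the encoding.
    with open(out_path, "w", encoding="utf-8-sig") as out:
        # os.walk also descends into subfolders and sub-subfolders.
        for folder, _subdirs, files in os.walk(root):
            for name in sorted(files):
                path = os.path.join(folder, name)
                info = os.stat(path)
                stamp = datetime.fromtimestamp(info.st_mtime)
                # Date, time, size in bytes, then the full path.
                out.write(f"{stamp:%d.%m.%y %H:%M} {info.st_size:>12} {path}\n")
                count += 1
    return count
```

For example, `write_listing("D:\\", "C:\\Temp\\listing.txt")` would list a CD in drive D: (both paths are placeholders). The output file should live outside the folder being listed, otherwise it shows up in its own listing.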

cigien
  • 57,834
  • 11
  • 73
  • 112
-1

On Windows 95, run File Manager (I think it is the same one used with Windows 3.1); it is in the Windows folder. Then go to the option to change the typeface and choose a Cyrillic one. Now one can see the Russian file names. On Windows XP, at the DOS prompt, use CHCP 866.