Using batch, how to write unicode into a file?

Question

I want to drag and drop folders or files onto a batch script and have it write all directories and files, including those in subfolders (recursively), into a file.

@echo off
REM chcp 1250
REM chcp 65001

if [%1]==[] goto :eof
:loop
  echo %1 >> aText.txt
  for /f "tokens=* delims=" %%a in ('dir %1  /s /b') do (
    echo %%a >> aText.txt
  )
shift
if not [%1]==[] goto loop

aText.txt

@pause

That works fine, but it doesn't support Unicode filenames. It also doesn't work if I save the .bat file itself as UTF-8 or Unicode. I have looked at this question: "Unicode characters in Windows command line - how?"

But that doesn't make it work. My guess is that chcp makes it possible to use Unicode inside the batch file itself, but not in the file it creates. How do I get the Unicode filenames written into the file it creates?

EDIT:

To rephrase my question more precisely: I want the Unicode output to be readable by a browser (mostly Chrome). What I have now is this:

@echo off
chcp 65001

if [%1]==[] goto :eof
:loop
  echo %1 > aText.txt
  for /f "tokens=* delims=" %%a in ('dir %1  /s /b') do (
    echo   ^<br^>^<img src='%%a'^> >> aText.txt
    REM echo %%a >> aText.txt
  )
shift
if not [%1]==[] goto loop

aText.txt

@pause

When I open the result in Notepad, the Unicode shows up fine (just as MC ND describes in the answer). This gives me:

D:\Downloads\unicodes 
  <br><img src='D:\Downloads\unicodes\sdsdsd.html'> 
  <br><img src='D:\Downloads\unicodes\ŽŽŽŽŽ.png'> 
  <br><img src='D:\Downloads\unicodes\中文.png'> 
  <br><img src='D:\Downloads\unicodes\文言.png'> 
  <br><img src='D:\Downloads\unicodes\日本語.png'> 
  <br><img src='D:\Downloads\unicodes\日本語.txt'> 
  <br><img src='D:\Downloads\unicodes\粵語.png'> 
  <br><img src='D:\Downloads\unicodes\한국어.png'>

However, when I open this with Chrome, I get:

D:\Downloads\unicodes 
  <br><img src='D:\Downloads\unicodes\sdsdsd - Kopie.txt'> 
  <br><img src='D:\Downloads\unicodes\sdsdsd.html'> 
  <br><img src='D:\Downloads\unicodes\Å½Å½Å½Å½Å½.png'> 
  <br><img src='D:\Downloads\unicodes\ä¸æ–‡.png'> 
  <br><img src='D:\Downloads\unicodes\æ–‡è¨€.png'> 
  <br><img src='D:\Downloads\unicodes\æ—¥æœ¬èªž.png'> 
  <br><img src='D:\Downloads\unicodes\æ—¥æœ¬èªž.txt'> 
  <br><img src='D:\Downloads\unicodes\ç²µèªž.png'> 
  <br><img src='D:\Downloads\unicodes\í•œêµì–´.png'>

Obviously, when I rename the txt file to an html file, there is just a bunch of broken images, even for the png files.

When I manually open the txt file in Notepad and re-save it under a different name, without changing the encoding setting (UTF-8), everything works the way I want. But I need to get rid of this manual saving step.
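The Chrome output above is consistent with a file that contains valid UTF-8 bytes but is decoded by the browser as a single-byte Western code page. The sketch below reproduces that effect. It is written in Python rather than batch, purely to make the byte-level behaviour visible. The function name `misread_utf8_as_cp1252` is made up for this illustration, and Windows-1252 as the browser's fallback encoding is an assumption, not something the question confirms.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252.

    Assumption: the browser falls back to Windows-1252 when the file
    carries no encoding declaration.
    """
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    # Filenames taken from the listing in the question.
    for name in ["ŽŽŽŽŽ.png", "中文.png", "日本語.png", "한국어.png"]:
        print(name, "->", misread_utf8_as_cp1252(name))
```

If this is what happens, the bytes written by the batch file are already correct and only the encoding declaration is missing. A plausible, unverified explanation for why the manual re-save helps is that Notepad adds a UTF-8 byte order mark, which browsers use to detect the encoding.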

With npocmaka's cmd /u suggestion I was getting output with spaces in between each character. Unfortunately, after some fruitless experimenting I can no longer reproduce that. Instead, with this:

@echo off
chcp 65001

cmd /u /c for /f "tokens=* delims=" %%a in ('dir %1 /s /b') do ( echo %%a >> aText.txt )

aText.txt

I get

D:\Downloads>(echo D:\Downloads\unicodes\sdsdsd.html   ) 
D:\Downloads\unicodes\sdsdsd.html  

D:\Downloads>(echo D:\Downloads\unicodes\ŽŽŽŽŽ.png   ) 
D:\Downloads\unicodes\ŽŽŽŽŽ.png  

D:\Downloads>(echo D:\Downloads\unicodes\中文.png   ) 
D:\Downloads\unicodes\中文.png  

D:\Downloads>(echo D:\Downloads\unicodes\文言.png   ) 
D:\Downloads\unicodes\文言.png  

D:\Downloads>(echo D:\Downloads\unicodes\日本語.png   ) 
D:\Downloads\unicodes\日本語.png  

D:\Downloads>(echo D:\Downloads\unicodes\日本語.txt   ) 
D:\Downloads\unicodes\日本語.txt  

D:\Downloads>(echo D:\Downloads\unicodes\粵語.png   ) 
D:\Downloads\unicodes\粵語.png  

D:\Downloads>(echo D:\Downloads\unicodes\한국어.png   ) 
D:\Downloads\unicodes\한국어.png

The doubled lines, despite the echo off, are weird enough in themselves. At any rate, Notepad shows the Unicode filenames, but Chrome won't even open the txt file, and after renaming the extension to html it shows "garbage" as follows:

D:\Downloads>(echo D:\Downloads\unicodes\sdsdsd.html ) D:\Downloads\unicodes\sdsdsd.html D:\Downloads>(echo D:\Downloads\unicodes\}}}}}.png ) D:\Downloads\unicodes\}}}}}.png D:\Downloads>(echo D:\Downloads\unicodes\-N‡e.png ) D:\Downloads\unicodes\-N‡e.png D:\Downloads>(echo D:\Downloads\unicodes\‡eŠ.png ) D:\Downloads\unicodes\‡eŠ.png D:\Downloads>(echo D:\Downloads\unicodes\åe,gžŠ.png ) D:\Downloads\unicodes\åe,gžŠ.png D:\Downloads>(echo D:\Downloads\unicodes\åe,gžŠ.txt ) D:\Downloads\unicodes\åe,gžŠ.txt D:\Downloads>(echo D:\Downloads\unicodes\µ|žŠ.png ) D:\Downloads\unicodes\µ|žŠ.png D:\Downloads>(echo D:\Downloads\unicodes\\Õm´Å.png ) D:\Downloads\unicodes\\Õm´Å.png

which is not what I need...

Assuming you have tried without the `rem` before the `chcp 65001`: exactly how does it not work? What do you get? What should you get? — MC ND, Oct 30 '13 at 09:29
`cmd /u /c for /f "tokens=* delims=" %%a in ('dir %1 /s /b') do echo %%a )` -? — npocmaka, Oct 30 '13 at 09:35
@MC ND It writes some unreadable mess: D:\Downloads\unicode D:\Downloads\unicode\sdsdsd.txt D:\Downloads\unicode\Å½Å½Å½Å½Å½.png D:\Downloads\unicode\ä¸æ–‡.png D:\Downloads\unicode\æ–‡è¨€.png D:\Downloads\unicode\æ—¥æœ¬èªž.png D:\Downloads\unicode\æ—¥æœ¬èªž.txt D:\Downloads\unicode\ç²µèªž.png D:\Downloads\unicode\í•œêµì–´.png I even tried writing an img tag to show the images in html, but I only get broken image symbols, so it's probably not just a display problem. — kumoyadori, Oct 30 '13 at 10:18

score 2 · Answer 1 · answered Sep 12 '16 at 21:58

I was having this problem with certain wmic commands that want to write Unicode characters to the file. Here is how I resolved it:

echo %%a |more>> aText.txt

This also works on WinPE, for those who may be interested.

Chef Pharaoh

score 0 · Answer 2 · answered Oct 30 '13 at 11:48

Test setup: a directory with a file containing Unicode characters in its filename (∏∏∏∏.txt).

With code page 850, the dir command shows the correct filename, but redirecting dir to a file just generates an ANSI file containing ????.txt, whether it is viewed with type or in Notepad.

With code page 65001, the dir command shows the correct filename, and redirecting to a file generates a UTF-8 file. The type command displays it correctly under code page 65001 and as "garbage" under code page 850. Notepad shows the correct values.

With cmd /u (Unicode), under code page 850 or 65001, the dir command shows the correct information, but redirection generates a Unicode file (two bytes per character). The type command displays "spaces" between characters under any code page. Notepad handles the file without problems.
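The "two bytes per character" behaviour can be made concrete with a short Python sketch (Python instead of batch, for illustration only). It assumes the Unicode file written under cmd /u is UTF-16 little-endian and that the reader interprets it as Windows-1252. Both function names are hypothetical.

```python
def misread_utf16le_as_cp1252(text: str) -> str:
    """Encode text as UTF-16LE, then decode the bytes as Windows-1252.

    Assumption: cmd /u redirection produces UTF-16LE, and the program
    reading the file treats it as a single-byte code page.
    """
    return text.encode("utf-16-le").decode("cp1252")


def printable(text: str) -> str:
    """Drop non-printable characters (NUL and other controls) to
    approximate what a browser actually renders."""
    return "".join(ch for ch in text if ch.isprintable())


if __name__ == "__main__":
    # Each ASCII character is followed by a NUL byte: the "spaces".
    print(repr(misread_utf16le_as_cp1252("ab")))
    # Filenames from the question's listing.
    for name in ["ŽŽŽŽŽ.png", "中文.png"]:
        print(name, "->", printable(misread_utf16le_as_cp1252(name)))
```

Under these assumptions, ŽŽŽŽŽ.png comes out as }}}}}.png and 中文.png as -N‡e.png, which matches the "garbage" reported in the question for the cmd /u attempt.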

Solution? There's no simple solution. Each program, system, and display understands different things. Determine what the final output of the information will be, and make sure that all the elements involved allow you to generate that output, regardless of how the data looks in the intermediate stages.

To answer your question: to get Unicode characters inside the file, npocmaka's comment gives you what you need. Start a new cmd instance with /u as a parameter to obtain a Unicode command line.
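If the final consumer is a browser, as the question's edit says, one way to follow the advice above is to make the generated file declare its own encoding. The sketch below is in Python, not batch, and swaps in a different technique from the one in the question: it writes the listing as UTF-8 and puts a `<meta charset="utf-8">` tag in the page. The function names (`build_listing_html`, `list_tree`, `write_listing`) and the page layout are assumptions made for this example.

```python
import html
import os


def build_listing_html(paths):
    """Return an HTML page with one <img> line per path and an explicit
    UTF-8 declaration, so the browser does not have to guess."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        '<head><meta charset="utf-8"></head>',
        "<body>",
    ]
    for path in paths:
        safe = html.escape(path, quote=True)  # escape &, <, >, and quotes
        lines.append(f'<br><img src="{safe}" alt="{safe}">')
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


def list_tree(root):
    """Collect every directory and file under root, similar in spirit
    to `dir /s /b` (the ordering may differ)."""
    found = []
    for folder, subdirs, files in os.walk(root):
        for name in sorted(subdirs) + sorted(files):
            found.append(os.path.join(folder, name))
    return found


def write_listing(paths, out_path):
    """Write the page as UTF-8 bytes, matching the declared charset."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_listing_html(paths))
```

A call such as `write_listing(list_tree(r"D:\Downloads\unicodes"), "listing.html")` would produce the page. The same idea carries over to the batch version: echoing a charset meta tag as the first line of the output file should be enough for a browser to decode UTF-8 written under chcp 65001, though that has not been tested here.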

The end product is supposed to be an html file, so the Unicode has to work in the browser (e.g. Chrome). Notepad displays it correctly, but the browser (in which I had the file open all the while) still only shows "garbage", and any html tag wouldn't work either. Adding the tip from npocmaka's comment, cmd /u before the for loop, inserts the spaces (not good), as you say, but this time the Unicode filename shows up as "garbage" even in Notepad. The browser won't even open the txt file, and changing the extension to html displays the file, but the filename is still "garbage". — kumoyadori, Oct 30 '13 at 13:38
