Code page 65001 must be used to change the character encoding that the Windows Command Processor cmd.exe applies when interpreting the bytes in the file List.txt from the default OEM code page, which depends on the country/region configured for the account in use, to the Unicode encoding UTF-8.
The characters in the batch file are then also interpreted as UTF-8 for all lines read and parsed by the Windows command interpreter after the command that changes the code page, and with it the interpretation of bytes in text files of any kind, has been executed. That does not matter if the batch file contains only ASCII characters, because these characters are encoded identically in UTF-8 and in Windows-1252. Windows-1252 is usually used for a batch file edited in a Windows graphical text editor in North American and Western European countries, instead of the OEM code page that cmd.exe actually expects when processing the batch file.
65001 is not really a code page, because UTF-8 is not a single-byte-per-character encoding that uses a code page for the 256 possible code points. It is a variable-length character encoding in which a character is encoded with one to four bytes, depending on the character.
See the Microsoft documentation page Code Page Identifiers for a list of "code page" numbers that define the character encoding used when processing the bytes of a text file, such as a list file or a batch file, and when writing characters into a text file, as is also done here for the file ArtistCount.txt.
The batch file could be as follows:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
if exist "%USERPROFILE%\Desktop\List.txt" goto ProcessList
echo ERROR: File "%USERPROFILE%\Desktop\List.txt" does not exist.
echo(
pause
exit /B
:ProcessList
for /F "tokens=*" %%G in ('%SystemRoot%\System32\chcp.com') do for %%H in (%%G) do set /A "CodePage=%%H" 2>nul
%SystemRoot%\System32\chcp.com 65001 >nul 2>nul
(for /F "usebackq eol=| delims=" %%I in ("%USERPROFILE%\Desktop\List.txt") do (
    if exist "M:\Artists\%%I" (
        echo FOUND %%I
        dir "M:\Artists\%%I" /A-D /B /S | %SystemRoot%\System32\find.exe /C /V ""
    ) else (
        echo MISSING %%I
        echo 0
    )
))>"%USERPROFILE%\Desktop\ArtistCount.txt"
%SystemRoot%\System32\chcp.com %CodePage% >nul 2>nul
endlocal
First, the required execution environment is defined completely:
- command echo mode turned off,
- command extensions enabled,
- delayed variable expansion disabled.
Delayed variable expansion is disabled because it is not needed here. File names with one or more exclamation marks would not be processed correctly if delayed variable expansion were enabled.
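The effect on exclamation marks can be seen with a minimal sketch like the following. The name Wham! is only an example string and does not come from the list file. With delayed expansion enabled, the exclamation mark is expected to be missing from the second output line.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem "Wham!" is a made-up example name containing an exclamation mark.
rem Delayed expansion disabled: the name is output unchanged.
for %%I in ("Wham!") do echo Disabled: %%~I
setlocal EnableDelayedExpansion
rem Delayed expansion enabled: cmd.exe interprets the exclamation mark
rem as the beginning of a delayed variable reference and removes it.
for %%I in ("Wham!") do echo Enabled:  %%~I
endlocal
endlocal
```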
If the list file exists as expected, the current code page number is determined next and stored in the environment variable CodePage so that the code page can be changed back to the original one at the end. That is recommended for a batch file used by others, perhaps from within a command prompt window in which the user runs other commands next, or for a batch file called from another batch file. See Compo's DosTips forum post Saving current codepage for more details about the command line used to determine the current code page number.
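The save and restore steps can be tried in isolation with a sketch like the one below. The check for the value 0 is an addition for illustration and not part of the batch file above. It relies on the documented behavior of set /A, which evaluates a word that is not a defined environment variable name as 0, so a result of 0 is taken here to mean that no number was found in the output of chcp.com.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Start undefined so that a completely failed detection is recognized.
set "CodePage="
for /F "tokens=*" %%G in ('%SystemRoot%\System32\chcp.com') do for %%H in (%%G) do set /A "CodePage=%%H" 2>nul
rem Assumption for this sketch: 0 is used as the marker for "unknown".
if not defined CodePage set "CodePage=0"
if %CodePage% == 0 (
    echo Could not determine the active code page.
) else (
    echo Active code page number: %CodePage%
)
%SystemRoot%\System32\chcp.com 65001 >nul 2>nul
rem Commands that need UTF-8 would be placed here.
rem Restore the original code page only if it was determined before.
if not %CodePage% == 0 %SystemRoot%\System32\chcp.com %CodePage% >nul 2>nul
endlocal
```

Run from a command prompt window, the sketch should leave the window with the same code page it had before, which can be checked by running chcp before and after.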
The next command line changes the character encoding to UTF-8. From this point on, cmd.exe reads the remaining lines of the batch file as UTF-8, as well as the lines in the list file.
Then a command block is defined, and everything output by echo or find.exe in that command block is redirected into the file ArtistCount.txt, which is always newly created. For that reason the Windows Command Processor creates the text file before running the FOR loop. All lines output to STDOUT (standard output stream) are appended to the text file, which is kept open and finally closed after the FOR loop has finished. This method of redirecting the output of a FOR loop into a text file is much better than using >> multiple times inside the FOR loop, because the latter results in opening the text file, seeking to its end, appending the text, and closing the text file for each output written into the file. It therefore causes lots of file system accesses, and cmd.exe could even fail from time to time to append more text to the output file because another application, like an anti-virus application, opens the modified file for scanning between a file close and the next file open by cmd.exe, which prevents cmd.exe from opening the file to append the next output.
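The two redirection styles can be compared with a small sketch like this one. The file name RedirectDemo.txt in the directory for temporary files and the loop from 1 to 5 are assumptions chosen only for the demonstration. Both variants write the same five lines, but only the second one reopens the file on each loop iteration.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Hypothetical output file used only for this demonstration.
set "OutFile=%TEMP%\RedirectDemo.txt"

rem Variant 1: the file is created and opened once before the loop runs
rem and closed once after the loop has finished.
(for /L %%I in (1,1,5) do echo Line %%I)>"%OutFile%"
type "%OutFile%"

rem Variant 2: the file is opened, appended to and closed on every
rem iteration. The redirection is written first so that a digit at the
rem end of the text cannot be mistaken for a handle number.
del "%OutFile%"
for /L %%I in (1,1,5) do >>"%OutFile%" echo Line %%I
type "%OutFile%"

del "%OutFile%"
endlocal
```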
There is one more advantage of opening the output file before FOR loop processing starts and closing it after FOR loop processing has finished: the output file name appears just once in the batch file.
The disadvantage is that the text file is created even if nothing at all is written into it, resulting in an empty output file. The following command line could be added to delete the output file if it ends up empty.
if exist "%USERPROFILE%\Desktop\ArtistCount.txt" for %%I in ("%USERPROFILE%\Desktop\ArtistCount.txt") do if %%~zI == 0 del "%USERPROFILE%\Desktop\ArtistCount.txt"
The original code page is restored after the FOR loop.
The initial execution environment as defined outside of the batch file is explicitly restored with the command endlocal, although the Windows Command Processor does that implicitly whenever batch file processing ends, such as on execution of the command exit /B when the list file does not exist.
To understand the commands used and how they work, open a command prompt window, execute the following commands there, and read the displayed help pages for each command entirely and carefully.
chcp /?
del /?
dir /?
echo /?
endlocal /?
exit /?
find /?
for /?
goto /?
if /?
pause /?
set /?
setlocal /?