I have a Windows Perl (5.16.1, 32-bit) program that opens a media file and uses ffmpeg to extract segments of audio. Its purpose is to split a single album track (containing multiple songs) into individual song files.
When the name of the media file to be processed is all ASCII characters, this all works rather well. I recently tried this program against a filename that includes Russian characters, and the program fails miserably in several areas.
This must have something to do with Unicode, but I have never needed to work with Unicode before. I am confused by the different failures I am seeing, and I do not know how to fix them.
I have distilled this down to the minimum to demonstrate the problems.
If I open a cmd window, and type 'chcp', the return value is 437.
If I do a 'dir' command, this is what is shown for me:
04/01/2019 11:46 AM 71,982,427 IC3PEAK альбом Сладкая.mkv
06/10/2020 10:42 PM 275 test.pl
(Note how in my cmd window, the Russian characters do display as Russian characters.)
My 'test.pl' Perl script is here:
use open ":std", ":encoding(UTF-8)";
$media = "IC3PEAK альбом Сладкая.mkv";
if (-e $media) {
    print "Media file does exist\n";
} else {
    print "Media file does NOT exist\n";
}
open(IN, $media) || die "Media file ($media) can not be opened!\n";
When this Perl script runs with the default chcp value of 437, I get this output:
Media file does NOT exist
Media file (IC3PEAK альбом Сладкая.mkv) can not be opened!
If I run 'chcp 1250' in my cmd window and re-run the Perl script, I get this output:
Media file does NOT exist
Media file (IC3PEAK Ă°Ă»ÑŒĂ±ĂÂľĂÂĽ Ă¡Ă»Ă°Ă´ĂÂşĂ°Ñ.mkv) can not be opened!
Problem 1: I am told the media file does not exist.
Problem 2: When I print the media file name to STDOUT, the displayed name no longer matches what the 'dir' command shows.
Can anyone suggest how to fix these two problems?
PS: When I rename the file on disk to the pure-ASCII 'IC3PEAK.mkv' and set the $media variable to 'IC3PEAK.mkv' as well, running the modified Perl script gives:
Media file does exist