I have some Perl code that deletes folders using File::Path::rmtree. The function works when the folder structure contains only files and folders with ASCII names, but fails when it contains files or folders with Unicode names. The Perl version I am using is "This is perl 5, version 12, subversion 4 (v5.12.4) built for MSWin32-x86-multi-thread".

I have also tried the latest Perl version, but the issue persists. Here is sample code:

use strict 'vars';
require File::Path;

sub Rmdir($)
{
   my ($Arena) = "D:\\tmp\\TestUnicodeRm";

   if (-d $Arena) {
      print "Dir to Rmtree $Arena\n";
      File::Path::rmtree($Arena, 0, 0);
   }

   if (-d $Arena) {
      print "Failed to clean up test area $Arena.\n";
   }
}

Rmdir $0;

1;

If the directory 'D:\tmp\TestUnicodeRm' contains a file named, say, 'chinese_trad_我的文件.txt', then I get the error "cannot remove directory for XXX: Directory not empty at D:\tmp\rmtree.pm line XX".

Thanks in advance!

2 Answers

Filenames are always bytes. Unfortunately there is no indication or requirement for unicode characters in filenames to be represented in a certain encoding, and every OS has different conventions. In most Unix-like systems the filenames are encoded to UTF-8 and interacted with as bytes. However in Windows the filenames are stored as UTF-16, but interacted with as decoded characters. It sounds like a bug in File::Path that it doesn't properly deal with these filenames as it finds them - as you are not providing the filenames, it can't be a bug in your code.

I would first suggest making sure your File::Path is the latest version (2.16). If this doesn't work, all I can suggest is to report a bug, and either manually recursively use opendir and readdir to remove files and subdirectories, or shell out to rd /s.

my $rc = system 'rd', '/s', '/q', $dir; # /q suppresses the confirmation prompt; check for errors as in system() docs
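
To make the "check for errors" part concrete, here is a minimal sketch that decodes the status left in $? after system(), following the pattern in perldoc -f system. The helper names describe_status and remove_with_rd are hypothetical, and the /q switch is my assumption that you do not want rd to ask for confirmation.

```perl
use strict;
use warnings;

# Turn the value of $? after system() into a readable message,
# following the pattern shown in perldoc -f system.
sub describe_status {
    my ($status) = @_;
    return 'failed to execute' if $status == -1;
    return sprintf('died with signal %d', $status & 127) if $status & 127;
    return sprintf('exited with value %d', $status >> 8);
}

# Hypothetical wrapper around the cmd.exe built-in rd (Windows only).
# It is defined here but not called, so the sketch is safe to run anywhere.
sub remove_with_rd {
    my ($dir) = @_;
    system('rd', '/s', '/q', $dir);
    return describe_status($?);
}

# Show how a few sample status values are decoded.
for my $status (0, 256, -1) {
    printf "%6d => %s\n", $status, describe_status($status);
}
```

On Windows you would call remove_with_rd($dir) and then still test -d $dir afterwards, since a zero exit value from rd alone may not prove that everything was removed.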
Grinnz
  • Re "*Filenames are always bytes*", Not in Windows. They are strings of 16-bit values there. /// Re "*there is no indication or requirement for unicode characters in filenames to be represented in a certain encoding*", In Windows, file names are encoded using UTF-16le, and must therefore be encodable using that encoding. /// Re "*but interacted with as decoded characters*", Quite the opposite. Perl uses calls that expect/return the file name encoded using the system's ACP. /// Re "*It sounds like a bug in File::Path*", It's a known bug/limit of Perl funcs that's merely inherited by File::Path. – ikegami Apr 11 '19 at 02:34
  • @Grinnz Thanks for your comment. I tried with File::Path version 2.15 and it was still failing. We are using 'rd'/'rmdir' as the alternative solution, but we wanted to avoid it and instead use some lib/module implementation. – user3406792 Apr 12 '19 at 10:43
You can use the subs provided by Win32::Unicode::File and Win32::Unicode::Dir to do what you want.


Windows provides two versions of each API call that accepts or returns text.

  • The versions with the "A" (ANSI) suffix expect and return text encoded using the system's Active Code Page. ("cp".Win32::GetACP() provides an encoding name you can use with the subs provided by Encode.)

    For example, the DeleteFileA system call is used to delete a file, and it expects a path encoded using the system's Active Code Page.

  • The versions with the "W" (Wide) suffix expect and return text encoded using UTF-16le.

    For example, the DeleteFileW system call is used to delete a file, and it expects a path encoded using UTF-16le.

Perl uses the "A" version of all system calls. The "W" version is required here.
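
As a rough diagnostic, you can check whether a given file name can be encoded in the Active Code Page at all. The following is a minimal sketch, not part of any module: representable_in is a hypothetical helper, and the cp1252 fallback is only an assumption so that the script also runs on non-Windows machines.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Return 1 if every character of $name can be encoded in $encoding, else 0.
sub representable_in {
    my ($name, $encoding) = @_;
    my $copy  = $name;    # encode() may modify its argument when CHECK is set
    my $bytes = eval { encode($encoding, $copy, Encode::FB_CROAK()) };
    return defined $bytes ? 1 : 0;
}

# On Windows, ask the system for its Active Code Page;
# elsewhere fall back to cp1252 purely for illustration.
my $acp = $^O eq 'MSWin32'
    ? do { require Win32; 'cp' . Win32::GetACP() }
    : 'cp1252';

my $name = "chinese_trad_\x{6211}\x{7684}\x{6587}\x{4EF6}.txt";
printf "representable in %s: %s\n", $acp,
    representable_in($name, $acp) ? 'yes' : 'no';
```

If the answer is "no", the "A" calls cannot receive that name intact, which would be consistent with rmtree leaving the file behind and then reporting "Directory not empty".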

The modules mentioned above provide access to the "W" version of calls you need.
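
A minimal sketch of what that could look like, assuming Win32::Unicode::Dir is installed from CPAN and that its rmtreeW behaves as its documentation describes (like File::Path::rmtree, returning true on success). The wrapper rmtree_wide is a hypothetical name, and the path is the one from the question.

```perl
use strict;
use warnings;

# Path taken from the question; adjust as needed.
my $arena = 'D:\\tmp\\TestUnicodeRm';

# Win32::Unicode::Dir is a CPAN module that only builds on Windows,
# so load it at run time and return 0 if it is unavailable.
sub rmtree_wide {
    my ($dir) = @_;
    return 0 unless $^O eq 'MSWin32';
    return 0 unless eval { require Win32::Unicode::Dir; 1 };
    # rmtreeW is documented as the wide-character counterpart of rmtree.
    return Win32::Unicode::Dir::rmtreeW($dir) ? 1 : 0;
}

if (rmtree_wide($arena)) {
    print "Removed $arena\n";
}
else {
    print "Could not remove $arena (module missing, not Windows, or removal failed)\n";
}
```

The run-time require is a design choice so the script still compiles where the module is missing; if the module is always present, a plain use Win32::Unicode::Dir; followed by rmtreeW($arena) should be enough.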

ikegami
  • Accepted as answer, as this looks to be most accurate for my requirement. I however failed to build the module on my machine, so could not actually test it. I will try it out again once I have some free time. – user3406792 Apr 13 '19 at 16:27