
The answer provided in "wide char and win32::api" works for passing UTF-16 to the Win API. But how do I convert UTF-16 strings returned by the Win API? (I am trying to use GetCommandLineW.)

I have tried both Unicode::String and Encode::decode without success. I'm guessing that perhaps the data needs to be packed or unpacked first, but how?

After that, the next problem is how to deal with a pointer-to-pointer-to-utf16 like the one returned by CommandLineToArgvW.

Thanks for any help.

Zaid
  • Please post code of what you tried, what you expected, and what you got instead. – andlabs Jun 11 '17 at 21:55
  • @andlabs, I understand the question, and I know there's really not much more the OP could provide. I'm in the middle of writing an answer. – ikegami Jun 11 '17 at 21:56
  • *"I have tried both `Unicode::String` and `Encode::decode`"* Please include the code to show exactly what you tried and describe the problems you had. It will help us to write more accurate answers, and your question's primary value is to the many other people who may be looking for a solution to a similar problem. "Without success" isn't much of a problem statement and it will be impossible to tell whether your situation is a match with just that. – Borodin Jun 12 '17 at 10:34

1 Answer


When you specify that the return value is a string, Win32::API assumes it's terminated by a byte with value 0, but bytes with that value are common in UTF-16le text.
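To make the problem concrete, here is a minimal sketch that uses only Encode on an in-memory string (no Win32::API call is involved, and the variable names are illustrative). It shows where a byte-oriented reader would stop when handed UTF-16le data:

```perl
use strict;
use warnings;
use Encode qw( encode );

# "abc" in UTF-16le: every ASCII character is followed by a zero byte.
my $bytes = encode('UTF-16LE', 'abc');

# Offset of the first zero byte, which is where a byte-oriented reader stops.
my $first_nul = index($bytes, "\0");

# What survives if the buffer is treated as an ordinary NUL-terminated string.
my $truncated = substr($bytes, 0, $first_nul);

printf "%d bytes total, first zero byte at offset %d\n",
   length($bytes), $first_nul;
```

For plain ASCII text, only the first character's low byte comes before the first zero byte, so the rest of the string is lost.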

As Win32::API suggests, you should use the N type (or Q on 64-bit builds) to get the pointer as a number, then read the pointed-to memory yourself. Win32::API provides ReadMemory to read memory, but it requires knowing how much memory to read, so on its own it isn't useful for NUL-terminated strings or wide NUL-terminated strings.
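The length that has to be found is the number of 16-bit code units before the first code unit equal to 0, not the offset of the first zero byte. As a rough illustration, the sketch below does that scan on a Perl byte buffer rather than on a raw pointer; decode_wide_cstring_from_buffer is a hypothetical helper written for this example and is not part of Win32::API:

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: decode a wide NUL-terminated string that is already
# held in a Perl byte buffer (not behind a pointer).
sub decode_wide_cstring_from_buffer {
   my ($buf) = @_;

   # Walk the buffer one 16-bit little-endian code unit at a time.
   # The terminator is a whole code unit equal to 0, not a single zero byte.
   my $num_units = 0;
   for my $unit (unpack('v*', $buf)) {
      last if $unit == 0;
      ++$num_units;
   }

   return decode('UTF-16LE', substr($buf, 0, $num_units * 2));
}

# A buffer holding "café", a wide NUL, then unrelated data.
my $buf = encode('UTF-16LE', "caf\x{E9}\0trailing junk");
my $str = decode_wide_cstring_from_buffer($buf);
```

With a real pointer the buffer isn't available up front, which is why something has to report the length before ReadMemory can be called.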

For wide NUL-terminated strings, Win32::API provides SafeReadWideCString. But SafeReadWideCString can return a string unrelated to the input on error, so I use my own decode_LPCWSTR instead.

use strict;
use warnings;
use feature qw( say state );

use open ':std', ':encoding('.do { require Win32; "cp".Win32::GetConsoleOutputCP() }.')';

use Config     qw( %Config );
use Encode     qw( decode encode );
use Win32::API qw( ReadMemory );

use constant PTR_SIZE => $Config{ptrsize};

use constant PTR_PACK_FORMAT =>
     PTR_SIZE == 8 ? 'Q'
   : PTR_SIZE == 4 ? 'L'
   : die("Unrecognized ptrsize\n");

use constant PTR_WIN32API_TYPE =>
     PTR_SIZE == 8 ? 'Q'
   : PTR_SIZE == 4 ? 'N'
   : die("Unrecognized ptrsize\n");

    
sub lstrlenW {
   my ($ptr) = @_;

   state $lstrlenW = Win32::API->new('kernel32', 'lstrlenW', PTR_WIN32API_TYPE, 'i')
      or die($^E);

   return $lstrlenW->Call($ptr);
}


sub decode_LPCWSTR {
   my ($ptr) = @_;
   return undef if !$ptr;

   my $num_chars = lstrlenW($ptr)
      or return '';

   return decode('UTF-16le', ReadMemory($ptr, $num_chars * 2));
}


# Returns true on success. Returns false and sets $^E on error.
sub LocalFree {
   my ($ptr) = @_;

   state $LocalFree = Win32::API->new('kernel32', 'LocalFree', PTR_WIN32API_TYPE, PTR_WIN32API_TYPE)
      or die($^E);

   return $LocalFree->Call($ptr) == 0;
}


sub GetCommandLine {
   state $GetCommandLine = Win32::API->new('kernel32', 'GetCommandLineW', '', PTR_WIN32API_TYPE)
      or die($^E);

   return decode_LPCWSTR($GetCommandLine->Call());
}


# Returns a reference to an array on success. Returns undef and sets $^E on error.
sub CommandLineToArgv {
   my ($cmd_line) = @_;

   state $CommandLineToArgv = Win32::API->new('shell32', 'CommandLineToArgvW', 'PP', PTR_WIN32API_TYPE)
      or die($^E);

   my $cmd_line_encoded = encode('UTF-16le', $cmd_line."\0");
   my $num_args_buf = pack('i', 0);  # Allocate space for an "int".

   my $arg_ptrs_ptr = $CommandLineToArgv->Call($cmd_line_encoded, $num_args_buf)
      or return undef;

   my $num_args = unpack('i', $num_args_buf);
   my @args =
      map { decode_LPCWSTR($_) }
         unpack PTR_PACK_FORMAT.'*',
            ReadMemory($arg_ptrs_ptr, PTR_SIZE * $num_args);

   LocalFree($arg_ptrs_ptr);
   return \@args;
}


{
   my $cmd_line = GetCommandLine();

   say $cmd_line;

   my $args = CommandLineToArgv($cmd_line)
      or die("CommandLineToArgv: $^E\n");

   for my $arg (@$args) {
      say "<$arg>";
   }
}
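The pointer-to-pointer step in CommandLineToArgv can be tried out without Windows. This is only a sketch: the addresses are made up, and pack stands in for the block of memory that ReadMemory would return for the argument array.

```perl
use strict;
use warnings;
use Config qw( %Config );

# Pick the pack/unpack format that matches this perl's pointer size.
my $ptr_format = $Config{ptrsize} == 8 ? 'Q' : 'L';

# Made-up addresses standing in for the array of string pointers.
my @fake_ptrs = (0x1000, 0x1020, 0x1048);

# The raw bytes as they would come back from reading
# ptrsize * number-of-arguments bytes at the array's address.
my $raw = pack($ptr_format.'*', @fake_ptrs);

# Split the block back into one integer per pointer.
my @ptrs = unpack($ptr_format.'*', $raw);

printf "0x%X\n", $_ for @ptrs;
```

Each recovered number is then an address that can be handed to decode_LPCWSTR, which is what the map in CommandLineToArgv does.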
ikegami
  • Fixed so it's correct for both 32-bit and 64-bit builds of Perl. – ikegami Jun 12 '17 at 06:22
  • Thank you so much for such a clear and useful implementation. It shows nicely the concepts needed to effectively use Win32::API. I wrote a replacement for decode_LPCWSTR() that is probably efficient enough for most purposes: `code` sub decode_LPCWSTR { state $lstrlenW = Win32::API->new('kernel32', 'lstrlenW', PTR_WIN32API_TYPE, 'N') or die($^E); my ($ptr) = @_; return undef if !$ptr; my $nchars = $lstrlenW->Call($ptr); return '' if $nchars == 0; my $sW = ReadMemory($ptr, $nchars * 2); return decode('UTF-16le', $sW); } – Freon Sandoz Jun 14 '17 at 17:35
  • Indeed. I shall replace the one in my answer with that! – ikegami Jun 14 '17 at 17:39
  • (Can someone please format my response and then delete this? I couldn't get "`code`" to work. Thanks.) – Freon Sandoz Jun 14 '17 at 17:42
  • 1) We can't edit comments, 2) Can't format code in comments. 3) It's been incorporated into my answer, so its readability in the comments is moot. – ikegami Jun 14 '17 at 17:44