
I'm working on a script to batch rename and copy images based on a CSV file. The CSV has column 1: old name and column 2: new name. I want to use the CSV file as input for a Perl script that looks up each old name and copies that file, under the new name, into a new folder. I think the problem has to do with the image names, which contain UTF-8 characters such as ß. When I run the script it prints `Barfu├ƒg├ñsschen` where it should print `Barfußgässchen`, followed by these errors:

```
Unsuccessful stat on filename containing newline at C:/Perl64/lib/File/Copy.pm line 148, <$INFILE> line 1.
Copy failed: No such file or directory at X:\Script directory\correction.pl line 26, <$INFILE> line 1.
```

I know it has something to do with `binmode` and UTF-8. Even so, the following simple script does not work for me (I took it from "How can I output UTF-8 from Perl?"):

```perl
use strict;
use utf8;
my $str = 'Çirçös';
binmode(STDOUT, ":utf8");
print "$str\n";
```

It prints `Ãirþ÷s`.

This is my entire script. Can someone explain where I'm going wrong? (It's not the cleanest code because I was testing things out.)

```perl
use strict;
use warnings;
use File::Copy;
use utf8;

my $inputfile  = shift || die "give input!\n";
#my $outputfile = shift || die "Give output!\n";

open my $INFILE,  '<', $inputfile   or die "In use / not found :$!\n";
#open my $OUTFILE, '>', $outputfile  or die "In use / not found :$!\n";

binmode($INFILE, ":encoding(utf8)");

#binmode($OUTFILE, ":encoding(utf8)");

while (<$INFILE>) {
    s/"//g;
    my @elements = split /;/, $_;

    my $old = $elements[1];
    my $new = "new/$elements[3]";
    binmode STDOUT, ':utf8';
    print "$old | $new\n";

    copy("$old","$new") or die "Copy failed: $!";
    #copy("Copy.pm",\*STDOUT);

    #   my $output_line = join(";", @elements);
    #    print $OUTFILE $output_line;
    #print "\n"
}

close $INFILE;
#close $OUTFILE;

exit 0;
```
Asked by Jan
  • About your first snippet: Is the .pl file itself encoded in UTF-8? The `use utf8` pragma tells Perl that your source code is written in UTF-8. It doesn't concern the data. – simbabque Nov 23 '12 at 12:15
  • Where are you printing the output to? A Linux shell? Also, how are you creating the file? – Alastair McCormack Nov 23 '12 at 12:31
  • I can confirm that your first snippet works fine on my Linux shell with the LANG set to `en_GB.UTF-8` and Putty set to UTF-8. I created the file using VIM in the same shell. – Alastair McCormack Nov 23 '12 at 12:48
  • @Fuzzyfelt I'm on a Windows system, if that's what you mean. I created the CSV file manually: I ran a `dir` command on the directory and opened the result in Excel. I also added the new names in the same file. – Jan Nov 23 '12 at 12:50
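simbabque's point that `use utf8` concerns only the source code, not the data, can be illustrated with a small sketch. The CSV line below is simulated as raw bytes, and the variable names are made up. Those bytes stay undecoded whether or not `use utf8` is in effect. Only an explicit decoding step, such as `Encode::decode` or an `:encoding(UTF-8)` layer on the filehandle, turns them into characters.

```perl
use strict;
use warnings;
use utf8;                 # affects only literals written in this file; it does not touch $raw
use Encode qw(decode);

# Simulated data as read from a UTF-8 file WITHOUT an :encoding layer:
# these are the on-disk bytes of "Çirçös" (\x escapes keep the demo ASCII-only).
my $raw = "\xC3\x87ir\xC3\xA7\xC3\xB6s";

# Explicit decoding step: bytes -> characters.
my $decoded = decode('UTF-8', $raw);

printf "raw: %d chars, decoded: %d chars\n", length $raw, length $decoded;
```

The raw string counts each byte as a character (9). The decoded string has the 6 characters you would expect.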

1 Answer


You need to ensure every step of the process is using UTF-8.

When you create the input CSV, make sure it's saved as UTF-8, preferably without a BOM. Windows Notepad adds a BOM, so try Notepad++ instead, which gives you more control over the encoding.
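If a BOM does slip in, it shows up as a leading U+FEFF character on the first decoded line and becomes part of the first old filename. The following is a minimal sketch of reading the CSV defensively. The helper name `read_csv_lines` is hypothetical, and stripping `\r` is an extra precaution for files with Windows line endings read on a non-Windows perl.

```perl
use strict;
use warnings;

# Hypothetical helper: read a UTF-8 CSV and return its lines as character
# strings, without line endings and without a leading byte order mark.
sub read_csv_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!\n";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;                            # remove the trailing "\n"
        $line =~ s/\r\z//;                      # tolerate CRLF line endings
        $line =~ s/\A\x{FEFF}// if $. == 1;     # drop a BOM on the first line only
        push @lines, $line;
    }
    close $fh;
    return @lines;
}
```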

There is a second problem: by default the Windows console does not display UTF-8. See "Unicode characters in Windows command line - how?". Either set the code page with `chcp 65001` or don't change the encoding of STDOUT.
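The garbled output in the question fits this explanation. The sketch below assumes the console uses code page 850, the usual OEM code page on Western European Windows; code page 437 happens to give the same result for these particular bytes. It encodes the name as UTF-8, which is what `binmode(STDOUT, ':utf8')` sends, and then decodes those bytes as cp850, which is how such a console would display them.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $name = "Barfu\x{DF}g\x{E4}sschen";        # Barfußgässchen (\x{DF} = ß, \x{E4} = ä)

my $bytes_sent   = encode('UTF-8', $name);    # what a :utf8 STDOUT layer writes
my $what_you_see = decode('cp850', $bytes_sent);  # assumed console code page 850

# Printed as UTF-8 so the demo itself is readable in a UTF-8 terminal.
binmode STDOUT, ':encoding(UTF-8)';
print "$what_you_see\n";
```

Each two-byte UTF-8 sequence (`C3 9F` for ß, `C3 A4` for ä) is displayed as two separate cp850 characters. That is exactly the `├ƒ` and `├ñ` in the question's output.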

In your code, the first error (the one about a newline) is probably caused by the trailing newline on each line read from the CSV. Add `chomp;` right after `while (<$INFILE>) {`.
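Here is a sketch of the question's loop body with `chomp` added. It is wrapped in a hypothetical `parse_line` helper so it can be tried in isolation. The column indices 1 and 3 come straight from the question's script. Without `chomp`, the last field would keep its `"\n"`, and File::Copy would then be asked for a filename ending in a newline.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the question's loop body, plus chomp.
sub parse_line {
    my ($line) = @_;
    chomp $line;                      # drop the trailing "\n"
    $line =~ s/"//g;                  # strip quotes, as in the original
    my @elements = split /;/, $line;
    return ($elements[1], "new/$elements[3]");
}

my ($old, $new) = parse_line(qq{"1";"photo.jpg";"2";"renamed.jpg"\n});
print "$old | $new\n";                # photo.jpg | new/renamed.jpg
```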

Update:

To "address" the files, you need to encode your filenames in the correct locale (the Windows code page). See "How do you create unicode file names in Windows using Perl" and "What is the universal way to use file I/O API with unicode filenames?". Assuming you're using Western 1252 / Latin, your copy command will look like this:

```perl
use Encode qw(encode);
copy(encode("cp1252", $old), encode("cp1252", $new)) or die "Copy failed: $!";
```
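To see why the encoding matters, compare the bytes of the same name in both encodings. This is a sketch that assumes, as the answer does, that the filesystem calls expect cp1252 bytes. The UTF-8 form is 16 bytes and the cp1252 form is 14, so a UTF-8 byte string cannot match the name stored on disk.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $name = "Barfu\x{DF}g\x{E4}sschen";          # Barfußgässchen

my $utf8_bytes   = encode('UTF-8',  $name);     # what a UTF-8 CSV contains
my $cp1252_bytes = encode('cp1252', $name);     # what the file API expects (assumed cp1252)

printf "UTF-8 : %2d bytes, %s\n", length $utf8_bytes,   unpack('H*', $utf8_bytes);
printf "cp1252: %2d bytes, %s\n", length $cp1252_bytes, unpack('H*', $cp1252_bytes);
```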

Your `open` should encode the filename as well:

```perl
open my $INFILE,  '<', encode("cp1252", $inputfile) or die "In use / not found :$!\n";
```
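Putting the answer's suggestions together, here is a sketch of the whole copy loop under stated assumptions:

- The CSV is UTF-8 and `;`-separated, with the old and new names at column indices 1 and 3, as in the question's script.
- The filesystem code page is cp1252.
- The path arguments are character strings.

The function name `copy_from_csv` and the `$src_dir`/`$dest_dir` parameters are hypothetical. The design keeps everything as decoded character strings inside Perl and converts to cp1252 bytes only at the moment a name is handed to `open` or `copy`.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Copy qw(copy);
use File::Spec;

# ASSUMPTION: the Windows ANSI code page is cp1252 (Western European).
my $FS_ENCODING = 'cp1252';

# Hypothetical helper: copy $src_dir/<column 1> to $dest_dir/<column 3>
# for every line of a UTF-8 CSV. Returns the number of files copied.
sub copy_from_csv {
    my ($csv_path, $src_dir, $dest_dir) = @_;

    # Decode the contents as UTF-8; encode the path for the file API.
    open my $in, '<:encoding(UTF-8)', encode($FS_ENCODING, $csv_path)
        or die "Cannot open $csv_path: $!\n";

    my $copied = 0;
    while (my $line = <$in>) {
        chomp $line;                 # fixes "filename containing newline"
        $line =~ s/\r\z//;           # tolerate CRLF endings on non-Windows perls
        $line =~ s/"//g;             # strip quotes, as in the question
        next if $line eq '';

        my @fields = split /;/, $line;
        my $old = File::Spec->catfile($src_dir,  $fields[1]);
        my $new = File::Spec->catfile($dest_dir, $fields[3]);

        # Character strings become cp1252 bytes only when they reach the OS.
        copy(encode($FS_ENCODING, $old), encode($FS_ENCODING, $new))
            or die "Copy failed for $old: $!\n";
        $copied++;
    }
    close $in;
    return $copied;
}
```

On a system whose locale is not Western European, only `$FS_ENCODING` should need to change.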

Update 2:

As you're running in a DOS window (the Windows console), remove `binmode(STDOUT, ":utf8");` and leave the default code page in place.

Answered by Alastair McCormack
  • I created the CSV in Notepad++ as UTF-8 without a BOM, so that shouldn't be it. I'm checking the other two suggestions you gave me right now. – Jan Nov 23 '12 at 13:39
  • Adding `chomp;` after `while (<$INFILE>) {` did the trick for the first error. Setting the code page first and running the script again doesn't change anything; I still get the "Copy failed" message. I print the old and new names, and you can see that the old name is incorrect. That's why it doesn't match the actual file and the copy fails. – Jan Nov 23 '12 at 14:02