0

Hi, I am having a problem making my CSV file readable. I am currently trying to do it using Perl. Here's my code:

#!/usr/bin/perl

$infile = @ARGV[0];
$outfile = @ARGV[1];

open(INFILE,"$infile") || die "cannot open input file : $infile : ";

open(OUTFILE,">$outfile") || die "cannot open output file";

$/="undef";

while(<INFILE>)

{

  $temp=$_;

}

close(INFILE);

  print OUTFILE "\x{feff}".$temp;

close(OUTFILE);

However, the CSV file is still unreadable. Is there anything I can do to insert a BOM? Thanks!

Lee Duhem
maihani

4 Answers

2

Before we do this, let me tell you that BOMs are an incredible pain in most cases and should be avoided wherever possible. They are only technically necessary with UTF-16 encodings. The BOM is the Unicode character U+FEFF. It is encoded in UTF-8 as EF BB BF, in UTF-16LE as FF FE, and in UTF-16BE as FE FF. It seems you are assuming that your input is UTF-16BE; in that case you could write the bytes directly:

open my $in,  "<:raw", $ARGV[0] or die "Can't open $ARGV[0]: $!";
open my $out, ">:raw", $ARGV[1] or die "Can't open $ARGV[1]: $!";

print $out "\xFE\xFF";
while (<$in>) {
    print $out $_;
}

But it would probably be better to decode the input and then encode the output again, explicitly specifying the BOM as a character:

open my $in,  "<:encoding(UTF-16BE)", $ARGV[0] or die "Can't open $ARGV[0]: $!";
open my $out, ">:encoding(UTF-16BE)", $ARGV[1] or die "Can't open $ARGV[1]: $!";

print $out "\N{U+FEFF}";
while (<$in>) {
    print $out $_;
}
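
To double-check the byte sequences quoted above, here is a minimal sketch that encodes U+FEFF with the core Encode module and prints the resulting bytes in hex. The helper name bom_hex is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes of U+FEFF in the given
# encoding as space-separated uppercase hex pairs.
sub bom_hex {
    my ($encoding) = @_;
    my $bytes = encode($encoding, "\x{FEFF}");
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Print one line per encoding mentioned in the answer.
for my $encoding ('UTF-8', 'UTF-16LE', 'UTF-16BE') {
    printf "%-8s %s\n", $encoding, bom_hex($encoding);
}
```

The three lines it prints should match the sequences listed above, which is a quick way to confirm which BOM a given target encoding expects.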
amon
1

What you probably want to do, rather than manually inserting a BOM, is set the output file encoding to whatever it is you need.

Also:

  • You are setting the input record separator to the literal string "undef", which is almost certainly not what you want (although it happens to read the whole file in one go as long as the string undef doesn't appear in the input). Remove the quotes so that $/ is actually undefined.
  • Add use warnings; and use strict; at the top of the script.
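
Putting those points together, a rough sketch could look like the following. The sub name convert_with_bom and the default encodings (UTF-8 in, UTF-16LE out) are assumptions made for the example; the question does not say which encodings are actually involved, so adjust them to your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $infile to $outfile, re-encoding the text
# and writing a leading BOM. The default encodings are assumptions.
sub convert_with_bom {
    my ($infile, $outfile, $in_enc, $out_enc) = @_;
    $in_enc  //= 'UTF-8';
    $out_enc //= 'UTF-16LE';

    open my $in, "<:encoding($in_enc)", $infile
        or die "Can't open $infile: $!";
    open my $out, ">:encoding($out_enc)", $outfile
        or die "Can't open $outfile: $!";

    # undef without quotes enables slurp mode; local keeps the change
    # confined to this do block.
    my $text = do { local $/ = undef; <$in> };
    $text = '' unless defined $text;

    # The BOM is written as a character; the output layer encodes it.
    print {$out} "\x{FEFF}", $text;

    close $in;
    close $out or die "Can't close $outfile: $!";
    return;
}

# Usage: perl script.pl input.csv output.csv
convert_with_bom(@ARGV[0, 1]) if @ARGV == 2;
```

Because the output handle has an encoding layer, there is no "Wide character" warning, and the BOM comes out in whatever byte order the layer names.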
0

I think you need something like this at the top of your code:

use open OUT => ':encoding(UTF-16)';
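
As far as I know, the plain UTF-16 encoding (without an LE or BE suffix) writes a big-endian BOM by itself on output, so with this pragma you should not print one manually. A small sketch to check that on your own system, where the sub name first_two_bytes and the sample line are made up for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use open OUT => ':encoding(UTF-16)';   # default layer for output handles opened below

# Hypothetical helper: write one sample line to $path through the
# default output layer, then return the first two raw bytes as hex.
sub first_two_bytes {
    my ($path) = @_;

    open my $out, '>', $path or die "Can't open $path: $!";
    print {$out} "a,b\n";              # no BOM printed by hand
    close $out or die "Can't close $path: $!";

    # Read back without any decoding to see what was really written.
    open my $raw, '<:raw', $path or die "Can't open $path: $!";
    read $raw, my $head, 2;
    close $raw;
    return uc unpack('H*', $head);
}

# Usage: perl script.pl scratch-file
print first_two_bytes($ARGV[0]), "\n" if @ARGV;
```

If the layer adds the BOM as described, the script prints FEFF for the scratch file you pass it.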
Mark Setchell
0

You've got a few answers about your BOM. But here's your code written in more idiomatic Perl.

#!/usr/bin/perl

use strict;
use warnings;

my ($infile, $outfile) = @ARGV;

open my $in_fh, '<', $infile or die "cannot open input file: $infile: $!";
open my $out_fh, '>', $outfile or die "cannot open output file: $!";

print $out_fh "\x{feff}";
print $out_fh while <$in_fh>;
Dave Cross