3

I'm experiencing a rather odd problem while using Data::Dumper to try and check on my importing of a large list of data into a hash.

My data looks like this, in a separate file:

##Product ID => Market for product
ABC => Euro
XYZ => USA
PQR => India

Then in my script, I'm trying to read in my list of data into a hash like so:

open(CONFIG_DAT_H, "<", $config_data);       
while(my $line = <CONFIG_DAT_H>) {
    if($line !~ /^\#/) {
        chomp($line);
        my @words = split(/\s*\=\>\s/, $line);
        %product_names->{$words[0]} = $words[1];
    }
}
close(CONFIG_DAT_H);
print Dumper (%product_names);

My parsing mostly works, in that I can find all of my data in the hash, but when I print it with Data::Dumper the output is garbled. This is my output:

$VAR1 = 'ABC';
';AR2 = 'Euro
$VAR3 = 'XYZ';
';AR4 = 'USA
$VAR5 = 'PQR';
';AR6 = 'India

Does anybody know why the Dumper is printing the '; characters over the first two letters on my second column of data?

Jon Bot
  • Looks like carriage returns (`\r`) in the input. `chomp` won't get rid of them; try something stronger like `$line =~ s/\s+$//` instead. – mob Sep 30 '16 at 20:49
  • You need to pass a reference, `Dumper(\%product_names)`, and not use the `%h->{key}` syntax (use `$h{key}` instead) -- OR -- if `*product_names` is in fact a hashref then you need to use the correct sigil, `$product_names->{key}` and `Dumper($product_names)`. – zdim Sep 30 '16 at 20:56
  • Looks like that regex worked. Just to make sure I understand it, it is trimming any type of whitespace: \n, \r, \t, etc... from the end of the line? I didn't know that chomp() didn't kill other types of whitespace. – Jon Bot Sep 30 '16 at 20:57
  • Also try setting `$Data::Dumper::Useqq = 1;`. – melpomene Sep 30 '16 at 21:01
  • `chomp` trims whatever [`$/`](https://metacpan.org/pod/perlvar#INPUT_RECORD_SEPARATOR) is set to, typically `\n` on POSIXy systems. – mob Sep 30 '16 at 21:09
  • @mob, It's `"\n"` on all systems (unless you assign something else to it) – ikegami Sep 30 '16 at 23:05

3 Answers

1

There is one unclear thing in the code: is *product_names a hash or a hashref?

  • If it is a hash, you should use the $product_names{key} syntax, not %product_names->{key}, and you need to pass a reference to Data::Dumper, so Dumper(\%product_names).

  • If it is a hashref, then it should be written with the correct sigil, so $product_names->{key} and Dumper($product_names).
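
The two options above can be sketched side by side. This is only an illustration, not the asker's actual code: the variable name `$product_ref` and the single `ABC => Euro` entry are assumptions made for the example.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order, easier to read

# Option 1: a plain hash. The whole hash uses %, one element uses $.
my %product_names;
$product_names{ABC} = 'Euro';
my $dump_hash = Dumper(\%product_names);   # pass a reference to the hash

# Option 2: a hash reference (hypothetical name). It is a scalar,
# so it uses $ and is dereferenced with ->.
my $product_ref = {};
$product_ref->{ABC} = 'Euro';
my $dump_ref = Dumper($product_ref);       # already a reference

print $dump_hash;
print "Both forms dump identically\n" if $dump_hash eq $dump_ref;
```

Either form stores the same key-value association; the point is to pick one and use its syntax consistently.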

As noted by mob, if your input lines end with anything other than \n, they need to be cleaned up more explicitly, say with s/\s*$// per the comment. See the answer by ikegami.

I'd also like to add that the loop can be simplified by losing the if branch:

open my $config_dat_h, "<", $config_data  or die "Can't open $config_data: $!";

while (my $line = <$config_dat_h>) 
{
    next if $line =~ /^\#/;  # or /^\s*\#/ to account for possible spaces

    # ...
}

I have changed to the lexical filehandle, the recommended practice with many advantages. I have also added a check for open, which should always be in place.

zdim
  • Unfortunately, I don't think I know how to answer if it is a hashref or a hash. What is the big difference between the two? The way I've been treating it, is to associate the first column "product ID" with my second column "market". The big difference I saw between using dumper to print the hash versus using its reference, is the format change from print every piece of data to printing them in a proper list. – Jon Bot Sep 30 '16 at 21:09
  • @JonBot Those are different data structures. If this is all your code, then you can decide whether you want it to be a hashref or a hash -- and then use it consistently. A "hashref" is a _reference_ to a hash, it's a scalar that must start with `$`. When you use it you _dereference_ it, so `$h->{key}`,etc. A hash starts with `%`. You access it directly, `$h{key}`. See [perlreftut](http://perldoc.perl.org/perlreftut.html) for starters. As it stands now the code is incorrect. – zdim Sep 30 '16 at 21:11
  • @JonBot Both of these allow you to use the key-value association. One better thing about a hashref is that it is a scalar, a single value, so much better for passing around to functions and such. It's like a pointer in C. But then a bit more syntax is needed to use it. One better thing about a hash is that you use the data structure directly, it's easier conceptually, and on the eyes (and hands). And then there's a bit more of course :) – zdim Sep 30 '16 at 21:23
  • I did some research on the difference between the two, and I found a pretty good article on SO here [link](http://stackoverflow.com/questions/1817394/whats-the-difference-between-a-hash-and-hash-reference-in-perl). I believe I want to make use of a real hash not a hashref. Thanks @zdim. – Jon Bot Sep 30 '16 at 21:23
  • @JonBot Oh, that looks like a good one! (Didn't get to read it right now.) Good if you know which you want! I'd suggest that you do get informed and practiced with references -- one can't really do serious work without them. – zdim Sep 30 '16 at 21:26
  • @JonBot I updated the answer with two more coding recommendations, which slipped by at first. Also -- I looked through the post you link, it is excellent. Note though that `@_` aliases arguments, so if they are changed _directly_ (like `$_[0] = 1` in a sub) the caller's variables _change_. We don't want to do that of course, just a warning. A notable exception is keys in a hash, they cannot be changed from the sub. – zdim Oct 01 '16 at 07:19
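
The `@_` aliasing warning in the last comment can be seen in a small sketch. The sub names `set_first` and `set_copy` are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Assigns through the alias in @_, so the caller's variable changes.
sub set_first { $_[0] = 1 }

# Copies the argument first, so the caller's variable is untouched.
sub set_copy { my ($value) = @_; $value = 1 }

my $x = 5;
set_copy($x);
my $after_copy = $x;    # still 5

set_first($x);
my $after_alias = $x;   # now 1

print "after copy: $after_copy, after alias: $after_alias\n";
```

Copying arguments into lexicals with `my (...) = @_;` at the top of a sub is the usual way to avoid this surprise.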
1

Hmm... this looks wrong to me, even if you're using Perl 6:

%product_names->{$words[0]} = $words[1];

I don't know Perl 6 very well, but in Perl 5 the element access should look like the following, assuming that %product_names is declared:

$product_names{...} = ... ;

If you could post the full code, I can help solve this problem.

0

The file uses CR LF as line endings. This would become evident by adding the following to your code:

local $Data::Dumper::Useqq = 1;
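
As a rough illustration of what that setting reveals, here is a minimal sketch. The string `"Euro\r"` is an assumed stand-in for a value left behind after `chomp` on a CR LF line; it is not taken from the asker's real file.

```perl
use strict;
use warnings;
use Data::Dumper;

# Useqq makes Dumper emit double-quoted strings with escapes,
# so control characters become visible instead of moving the cursor.
local $Data::Dumper::Useqq = 1;

my $value  = "Euro\r";          # hypothetical value with a trailing CR
my $dumped = Dumper($value);

print $dumped;
```

With the default single-quoted output, the raw carriage return is sent to the terminal as is, which is why the closing `';` appears to overwrite the start of the line.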

You could convert the file to use unix line endings (seeing as you are on a unix system). This can be achieved using the dos2unix utility.

dos2unix config.dat

Alternatively, replace

chomp($line);

with the more flexible

$line =~ s/\s+\z//;
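
The difference between the two can be checked on a single line. This is a small sketch; the string in `$raw` is a made-up example of a line read from a CR LF file.

```perl
use strict;
use warnings;

my $raw = "ABC => Euro\r\n";    # hypothetical line with a CR LF ending

my $chomped = $raw;
chomp($chomped);                # removes only the trailing "\n" (the default $/)

my $trimmed = $raw;
$trimmed =~ s/\s+\z//;          # removes all trailing whitespace, including "\r"

printf "chomp leaves a trailing CR: %s\n", ($chomped =~ /\r\z/ ? 'yes' : 'no');
printf "s/// leaves a trailing CR:  %s\n", ($trimmed =~ /\r\z/ ? 'yes' : 'no');
```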

  • Note: %product_names->{$words[0]} makes no sense. It happens to do what you want in old versions of Perl, but it rightfully throws an error in newer versions. $product_names{$words[0]} is the proper syntax for accessing the value of an element of a hash.
  • Tip: You should be using print Dumper(\%product_names); instead of print Dumper(%product_names);.
  • Tip: You might also find local $Data::Dumper::Sortkeys = 1; useful. Data::Dumper has such bad defaults :(
  • Tip: Using split(/\s*=>\s*/, $line, 2) instead of split(/\s*=>\s*/, $line) would permit the value to contain =>.
  • Tip: You shouldn't use global variables without reason. Use open(my $CONFIG_DAT_H, ...) instead of open(CONFIG_DAT_H, ...), and replace other instances of CONFIG_DAT_H with $CONFIG_DAT_H.
  • Tip: Using next if $line =~ /^#/; would avoid a lot of indenting.
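
Putting the note and tips together, the reading loop might look roughly like the sketch below. To keep it self-contained, it reads from an in-memory string (`$config_text`, a hypothetical stand-in for the real config file) rather than from `$config_data`, and the names `$config_fh`, `$id` and `$market` are assumptions made for the example.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the config file, with CR LF line endings.
my $config_text = "##Product ID => Market for product\r\n"
                . "ABC => Euro\r\n"
                . "XYZ => USA\r\n"
                . "PQR => India\r\n";

# Lexical filehandle, opened on the in-memory string, with an error check.
open(my $config_fh, '<', \$config_text)
    or die "Can't open in-memory config: $!";

my %product_names;
while (my $line = <$config_fh>) {
    next if $line =~ /^#/;      # skip comment lines
    $line =~ s/\s+\z//;         # strip "\n", "\r" and other trailing whitespace
    my ($id, $market) = split(/\s*=>\s*/, $line, 2);   # limit 2: value may contain =>
    $product_names{$id} = $market;
}
close($config_fh);

{
    local $Data::Dumper::Sortkeys = 1;   # predictable key order
    local $Data::Dumper::Useqq    = 1;   # show any stray control characters
    print Dumper(\%product_names);       # pass a reference, not the hash itself
}
```

For a real file, the `open` call would take the file name instead of the string reference; the rest of the loop would stay the same.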
ikegami