I have a hash of hashes:

my %change;

while ( <DATA> ) {
    chomp;
    my ($gene, $condition, $change) = split;
    $change{$gene}{$condition} = $change;
}

print Dumper \%change;

__DATA__
gene1   condition1  10
gene2   condition1  0.5
gene3   condition1  1.5
gene1   condition2  2
gene2   condition2  13.5
gene3   condition2  0.25

And I want to sort it by value:

gene2   condition2  13.5
gene1   condition1  10
gene1   condition2  2
gene3   condition1  1.5
gene2   condition1  0.5
gene3   condition2  0.25

I'm using:

for my $g (keys %change){
    for my $con (keys $change{$g}){
        for my $ch (sort { $change{$g}{$a} <=> $change{$g}{$b} } keys $change{$g}{$con} ) {
            print "$g\t$con\t$ch\n";
        }

    }
}

But this doesn't work, and generates the error

Type of argument to keys on reference must be unblessed hashref or arrayref at untitled.pl line 23, <DATA> line 6.

Line 23 is

for my $ch (sort { $change{$g}{$a} <=> $change{$g}{$b} } keys $change{$g}{$con}){

Can anyone point me in the right direction?

Borodin
fugu
  • That error means that the value contained in -- I assume -- `$change{$g}` is not a hashref or arrayref. Show the data structure that you are using, how you assign to it, or equivalent, runnable code. – TLP Aug 13 '15 at 13:09
  • Yeah, in a later Perl it will warn that keys on a hashref are experimental. But that is not the only problem. If you fix that, it will complain more. – simbabque Aug 13 '15 at 13:10
  • Which line is line 6? – Jens Aug 13 '15 at 13:11
  • I see your problem now. You cannot sort by checking one value at the time, you need a complete list. And also, you are using a three level hash, where you only have two. – TLP Aug 13 '15 at 13:14
  • You won't be able to sort the first dimension of that structure on the 3rd one. They will only be sorted inside of the conditions like this. The genes will be in the random hash order they came out of `keys` in. – simbabque Aug 13 '15 at 13:14
  • @TLP - I've changed the sorting on the three level hash in my question as that's clearly wrong - thanks – fugu Aug 13 '15 at 13:17
  • @fugu If your only concern is sorting based on the numeric value in each line of a text, it is probably best to prepare for that when collecting the data, not by transforming the hash afterwards. – TLP Aug 13 '15 at 13:18
  • You might be interested in a pure bash solution: http://stackoverflow.com/q/17430470/725418 – TLP Aug 13 '15 at 14:04

4 Answers


I think it's very unlikely that you need the data in a hash structure like that. Certainly for the purposes of this task you would be better off with an array of arrays:

use strict;
use warnings;

my @change;

while ( <DATA> ) {
    push @change, [ split ];
}

print "@$_\n" for sort { $b->[2] <=> $a->[2] } @change;


__DATA__
gene1   condition1  10
gene2   condition1  0.5
gene3   condition1  1.5
gene1   condition2  2
gene2   condition2  13.5
gene3   condition2  0.25

output

gene2 condition2 13.5
gene1 condition1 10
gene1 condition2 2
gene3 condition1 1.5
gene2 condition1 0.5
gene3 condition2 0.25

If you explain what sort of access you need to the data, then I am sure there is something better. For instance, I would suggest %gene and %condition hashes that map a gene or condition ID to a list of the array elements that use that gene or condition. Then you could access the data when you know either the gene or the condition.
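
A minimal sketch of that index idea, assuming the rows are stored as [ gene, condition, change ] array refs as in the code above. The way %gene and %condition are filled here is only one possible reading of the suggestion, and the data is inlined rather than read from DATA so the snippet stands alone:

```perl
use strict;
use warnings;

# Rows in the same shape as above: [ gene, condition, change ]
my @change = (
    [ 'gene1', 'condition1', 10   ],
    [ 'gene2', 'condition1', 0.5  ],
    [ 'gene3', 'condition1', 1.5  ],
    [ 'gene1', 'condition2', 2    ],
    [ 'gene2', 'condition2', 13.5 ],
    [ 'gene3', 'condition2', 0.25 ],
);

# Assumed index hashes: each maps an ID to the rows that mention it.
# The rows are shared references, not copies.
my ( %gene, %condition );
for my $row (@change) {
    push @{ $gene{ $row->[0] } },      $row;
    push @{ $condition{ $row->[1] } }, $row;
}

# All rows for one gene, largest change first
print "@$_\n" for sort { $b->[2] <=> $a->[2] } @{ $gene{gene2} };
```

With the sample data this prints the two gene2 rows, the 13.5 row first. The same lookup works by condition through $condition{condition1}.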

Borodin

As I mentioned in the comments, the simplest solution is not to parse the text input into a hash and then sort the hash, but to collect the data in a more suitable form and sort that.

Also, note that you cannot sort while iterating over the values one at a time. You need to build the complete list first and then sort it in one go, because sort works on a whole list.

Below I show first my preferred method for the given input, then how to sort the existing hash.

use strict;
use warnings;
use feature 'say';    # say is not enabled by default

my %change;
my @sort;
while(<DATA>) {
    chomp;
    my ($gene, $condition, $change) = split;
    $change{$gene}{$condition} = $change;
    push @sort, [ $change, $_ ];
}

@sort = sort { $a->[0] <=> $b->[0] } @sort;
say $_->[1] for @sort;

# Using the hash:

my @values;
for my $gene (keys %change) {
    for my $con (keys %{ $change{$gene} }) {
        my $num = $change{$gene}{$con};
        push @values, [ $num, "$gene\t$con\t$num" ];
    }
}
@values = sort { $a->[0] <=> $b->[0] } @values;
say $_->[1] for @values;

__DATA__
gene1   condition1  10
gene2   condition1  0.5
gene3   condition1  1.5
gene1   condition2  2
gene2   condition2  13.5
gene3   condition2  0.25

As you can see, I am using a sort of cache to access the value more easily. For example push @sort, [ $change, $_ ] stores an array ref with the numeric value, along with the original string from the input. These values can then be accessed with $a->[0] when sorting, and $_->[1] when printing.

I find this method to be simple and robust, though if your input file is very large it may cause some memory issues due to the duplication of data. Anything smaller than gigabytes should be fine on a modern system.
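
Note that both sorts above compare $a->[0] with $b->[0], which gives ascending order, whereas the listing in the question runs from largest to smallest. A minimal sketch of the descending variant, with a few [ value, line ] pairs inlined as an assumption so that it runs on its own:

```perl
use strict;
use warnings;

# [ numeric value, original line ] pairs, in the shape built by the loop above
my @sort = (
    [ 10,   "gene1\tcondition1\t10"   ],
    [ 0.5,  "gene2\tcondition1\t0.5"  ],
    [ 13.5, "gene2\tcondition2\t13.5" ],
);

# Putting $b before $a in the comparison gives descending numeric order
my @descending = sort { $b->[0] <=> $a->[0] } @sort;
print "$_->[1]\n" for @descending;
```

Only the order of $a and $b in the comparison changes; the cached pairs stay the same.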

TLP

You can flatten your hash structure into an array of arrays, and then sort numerically by value (the last element of each inner array):

my $VAR1 = {
      'gene1' => {
                   'condition1' => '10',
                   'condition2' => '2'
                 },
      'gene2' => {
                   'condition1' => '0.5',
                   'condition2' => '13.5'
                 },
      'gene3' => {
                   'condition1' => '1.5',
                   'condition2' => '0.25'
                 }
    };

my @sorted = sort {
    $b->[2] <=> $a->[2]
  }
  map {
    my $k = $_;
    my $h = $VAR1->{$k};
    map [ $k, $_, $h->{$_} ], keys %$h;
  }
  keys %$VAR1;

print "@$_\n" for @sorted;

output

gene2 condition2 13.5
gene1 condition1 10
gene1 condition2 2
gene3 condition1 1.5
gene2 condition1 0.5
gene3 condition2 0.25

Using foreach instead of map:

my @arr;
for my $k (keys %$VAR1) {
  my $h = $VAR1->{$k};
  for (keys %$h) {
    push @arr, [ $k, $_, $h->{$_} ];
  }
}
my @sorted = sort { $b->[2] <=> $a->[2] } @arr;
mpapec
  • Thanks for the answer but is there way to do this without using `map` (I try to avoid it if possible) – fugu Aug 13 '15 at 13:18
  • @fugu You should not be scared of `map`. It is much like `for` and `sort`, just another way to iterate over a list. – TLP Aug 13 '15 at 13:46
  • I'm still wary of `map` because it's a road to hard-to-understand code, so I tend to avoid it when trying to explain things. But it's also pretty cool if used right. – Sobrique Aug 13 '15 at 13:59
  • @Sobrique It's a tool, like any tool. It is simpler to use `my @data = map ... ` than to use a regular loop and `push`. It is also essential when doing Schwartzian transforms. – TLP Aug 13 '15 at 14:11
  • I don't dispute it's a powerful tool. However I also think it's a great way to create the type of "write only" code that `perl` is famous for - if you aren't careful about how you use it. And thus a tool that's best not dropped into the hands of a beginner until they properly "grok" it - whilst you can make some really concise solutions, I don't feel they 'add value' if the person reading it has a headache trying to understand what's going on. – Sobrique Aug 13 '15 at 14:22

Your structure is only two hashes deep, so

  • %change is a hash.
  • $change{$g} is a reference to a hash
  • %{ $change{$g} } is a hash.
  • $change{$g}{$con} is a number.
  • %{ $change{$g}{$con} } is an error, as reported.
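
The levels listed above can be checked directly with ref. This is a small sketch using a one-gene hash made up for illustration:

```perl
use strict;
use warnings;

# A cut-down version of the structure from the question
my %change = (
    gene1 => { condition1 => 10, condition2 => 2 },
);

# ref returns 'HASH' for a hash reference and '' for a plain scalar
my $outer_type = ref $change{gene1};                # 'HASH'
my $inner_type = ref $change{gene1}{condition1};    # '' (just a number)
print "outer: '$outer_type', inner: '$inner_type'\n";

# keys needs a hash, so dereference the inner hashref explicitly
my @conditions = sort keys %{ $change{gene1} };
print "@conditions\n";
```

Writing the dereference explicitly, as in %{ $change{gene1} }, also avoids relying on keys being applied to a reference, which was an experimental feature and was later removed from Perl.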

The fix is... Well, there is no fix. The approach you took can't be used to solve your problem.


You can't sort a hash. You can sort the keys of a hash, but that's not what you want here. You want to sort key pairs, so first you have to create those key pairs.

map {
   my $outer_key = $_;
   map {
      my $inner_key = $_;
      [ $outer_key, $inner_key ]
   } keys %{ $change{$_} }
} keys(%change)

This creates

[
   [ 'gene1', 'condition1' ],
   [ 'gene1', 'condition2' ],
   [ 'gene2', 'condition1' ],
   [ 'gene2', 'condition2' ],
   [ 'gene3', 'condition1' ],
   [ 'gene3', 'condition2' ],
]

We then sort those pairs by the value each pair refers to:

sort { $change{ $a->[0] }{ $a->[1] } <=> $change{ $b->[0] }{ $b->[1] } }

All together:

for (
   sort { $change{ $a->[0] }{ $a->[1] } <=> $change{ $b->[0] }{ $b->[1] } }
   map {
      my $gene = $_;
      map {
         my $con = $_;
         [ $gene, $con ]
      } keys %{ $change{$_} }
   } keys(%change)
) {
   my ($gene, $con) = @$_;
   print("$gene\t$con\t$change{$gene}{$con}\n");
}

But what if we created the following flattened structure instead?

[
   [ 'gene1', 'condition1', 10    ],
   [ 'gene1', 'condition2',  2    ],
   [ 'gene2', 'condition1',  0.5  ],
   [ 'gene2', 'condition2', 13.5  ],
   [ 'gene3', 'condition1',  1.5  ],
   [ 'gene3', 'condition2',  0.25 ],
]

This would allow us to simplify things a little.

for (
   sort { $a->[2] <=> $b->[2] }
   map {
      my $gene = $_;
      map {
         my $con = $_;
         [ $gene, $con, $change{$gene}{$con} ]
      } keys %{ $change{$_} }
   } keys(%change)
) {
   my ($gene, $con, $ch) = @$_;
   print("$gene\t$con\t$ch\n");
}
ikegami