
I'm trying to organise a load of data into a Hash of Hashes of Arrays (HoHoA). The following works fine when I declare the values manually:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;


my %experiment = (
    'gene1' => {
        'condition1' => ['XLOC_000157', '90', '0.001'],
        'condition2' => ['XLOC_000347', '80', '0.5'],
        'condition3' => ['XLOC_000100', '50', '0.2']
    },
    'gene2' => {
        'condition1' => ['XLOC_025437', '100', '0.018'],
        'condition2' => ['XLOC_000322', '77', '0.22'],
        'condition3' => ['XLOC_001000', '43', '0.002']
    }
);

And then print out key/values:

for my $gene (sort keys %experiment) {
    for my $condition (sort keys %{ $experiment{$gene} }) {
        print "$gene\t$condition\t";
        for my $value (@{ $experiment{$gene}{$condition} }) {
            print "[$value]\t";
        }
        print "\n";
    }
}

Output:

gene1   condition1  [XLOC_000157]   [90]    [0.001] 
gene1   condition2  [XLOC_000347]   [80]    [0.5]   
gene1   condition3  [XLOC_000100]   [50]    [0.2]   
gene2   condition1  [XLOC_025437]   [100]   [0.018] 
gene2   condition2  [XLOC_000322]   [77]    [0.22]  
gene2   condition3  [XLOC_001000]   [43]    [0.002] 

However, the real data I'm working on is too large to manually declare, so I want to be able to achieve the same result as above, but starting from arrays containing each field, for example:

Example input:

condition1    XLOC_000157    1.04564    0.999592      99.66   gene1
condition1    XLOC_000159    0.890436    0.999592    99.47   gene2
condition2    XLOC_000561    -1.05905    0.999592      91.57   gene1
condition2    XLOC_00076    -0.755473    0.999592      99.04   gene2

Split input into arrays:

my (@gene, @condition, @percent_id, @Xloc, @change, @q_value, @split, %experiment);
while (<$list>) {
    chomp;
    @split = split /\t/;
    push @condition, $split[0];
    push @Xloc, $split[1];
    push @change, $split[2];
    push @q_value, $split[3];
    push @percent_id, $split[4];
    push @gene, $split[5];
}   

So far I've been storing this in a Hash of Arrays (HoA), like so:

push @{ $results{ $gene[$_] } }, [ $Xloc[$_], $change[$_], $q_value[$_], $percent_id[$_] ] for 0 .. $#gene;

But I'm now trying to integrate 'condition' information for each HoA, and thus build a HoHoA. Ideally I want to do this within the while loop (hence 'dynamically') in a similar fashion as above, to achieve the following data structure:

$VAR1 = {
          'gene1' => {
                       'condition1' => [
                                         'XLOC_000157',
                                         '1.04564',
                                         '0.999592',
                                         '99.66'
                                       ],
                       'condition2' => [
                                         'XLOC_000561',
                                         '-1.05905',
                                         '0.999592',
                                         '91.57'
                                       ]
                     },
          'gene2' => {
                       'condition1' => [
                                         'XLOC_000159',
                                         '0.890436',
                                         '0.999592',
                                         '99.47'
                                       ],
                       'condition2' => [
                                         'XLOC_00076',
                                         '-0.755473',
                                         '0.999592',
                                         '99.04'
                                       ]
                     }
        };
fugu
  • ok, you want to build `%experiment` like hash? how does look input data? – mpapec Sep 11 '13 at 12:13
  • are you reading input data from the file?? – rams0610 Sep 11 '13 at 12:17
  • I've re-read this question three times and still don't understand what you're after. To what extent are you manually declaring this data structure? Do you want to generate the entire HoHoA dynamically or just avoid typing out the contents of `@array`s? I suggest you take a look at the canonical reference on how to grow your own complex data structure: [`perldoc perldsc`](http://perldoc.perl.org/perldsc.html) – Zaid Sep 11 '13 at 12:28
  • This question brings back memories of my first ever SO question: http://stackoverflow.com/q/1089530/133939 . It amazes me to this day how much I have learnt since then! – Zaid Sep 11 '13 at 12:36
  • By manual I mean literally typing in key names etc. For the above example every key/value would be stored in an array - I'll amend the question to reflect this... – fugu Sep 11 '13 at 12:46
  • Where does the string `condition1` come from? It doesn't appear in your input. – ikegami Sep 11 '13 at 13:46
  • In one place, the arrays have three fields (`['XLOC_001000', '43', '0.002']`). In another, two (`[ $Xloc[$_], $percent_id[$_] ]`). The actual values aren't consistent throughout your answer either. Please be more careful next time. – ikegami Sep 11 '13 at 14:27

1 Answer

my %experiment;
while (<$list>) {
    chomp;
    my ($condition, $xloc, $percent_id, $gene) = split /\t/;
    $experiment{$gene}{$condition} = [ $xloc, $percent_id ];
}
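
The snippet above unpacks four fields, while the example input in the question has six columns. A minimal sketch of the same idea for the six-column layout might look like the following. It assumes the fields are tab-separated and in the order condition, XLOC id, change, q-value, percent identity, gene. The `@lines` array is a hypothetical stand-in for the `$list` filehandle, used only so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

# Sample rows copied from the question (assumed to be tab-separated).
# @lines is a stand-in for reading from the $list filehandle.
my @lines = (
    "condition1\tXLOC_000157\t1.04564\t0.999592\t99.66\tgene1",
    "condition1\tXLOC_000159\t0.890436\t0.999592\t99.47\tgene2",
    "condition2\tXLOC_000561\t-1.05905\t0.999592\t91.57\tgene1",
    "condition2\tXLOC_00076\t-0.755473\t0.999592\t99.04\tgene2",
);

my %experiment;
for my $line (@lines) {
    # Unpack all six columns in the order they appear in the input.
    my ($condition, $xloc, $change, $q_value, $percent_id, $gene)
        = split /\t/, $line;

    # Autovivification creates the inner hash the first time a gene is seen.
    $experiment{$gene}{$condition} = [ $xloc, $change, $q_value, $percent_id ];
}

print Dumper(\%experiment);
```

Note that a plain assignment keeps only the last row seen for a given gene/condition pair; if the real data can repeat a pair, the earlier row would be overwritten.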
ikegami