This question is quite similar to this one How can I get the average and standard deviations grouped by key? but I don't manage to modify it to fit my problem.
I have a lot of files (.csv) with 7 columns, the last three columns look like this:
col5,col6,col7
1408,1,123
1408,2,234
1408,3,345
1408,4,456
1408,5,567
1408,6,678
1409,0,123
1409,1,234
1409,2,345
1409,3,456
1409,4,567
1409,5,678
1409,6,789
...
N,0,123
N,1,234
N,2,345
N,3,456
N,4,567
N,5,678
N,6,789
What I want to do is to calculate the average of the last column (col7) for all the values that have the same value in column 5 (col5), so 1408, 1409, 1410, ... until N and I don't know N. I want to print this average value next to the line (in col8) which contains a 3 in column 6 (col6). Do note that the value in column 6 (col6) goes from 0 to 6, but the first number of the file is not always 0. So what I want is:
col1,col2,col3,col4,col5,col6,col7,col8
bla,bla,bla,bla,1408,3,345,400.5
bla,bla,bla,bla,1409,3,456,456
...
bla,bla,bla,bla,N,3,456,456
I have some script I can use to calculate the average, but I have to be able to put my values into an array for that. Below is what I tried to do, but it doesn't work. Also, I'm just trying to learn Perl on my own, so if it looks like crap, I'm just trying!
open (FILE, "<", $dir.$file) or die;
my @lines = <FILE>;
foreach my $line(@lines) {
my ($col1,$col2,$col3,$col4,$col5,$col6,$col7) = split(/\,/, $line);
push @arrays5, $col5;
}
foreach my $array5(@arrays5) {
foreach my $line(@lines) {
my ($col1,$col2,$col3,$col4,$col5,$col6,$col7) = split(/\,/, $line);
if ($array5 == $col5) {
push @arrays7, $col7;
}
}
}
close(FILE);