I have a large pipe-delimited input file of approximately 6 million lines, like this:
24|BBG000SJFVB0|EQ0000000009296012|OI SA-ADR|OIBR/C|US|ADR|Equity
16|BBG002PHVB83|EQ0000000022353186|BLOOM SELECT INCOME FUND|BLB-U|CT|Closed-End Fund|Equity
-50|BBG000V0TN75|EQ0000000010271114|MECHEL-PREF SPON ADR|MTL/P|US|ADR|Equity
20|BBG002S0ZR60|EQ0000000022739316|DIVIDEND 15 SPLIT CORP II-RT|DF-R|CT|Closed-End Fund|Equity
-20|BBG001R3LGM8|EQ0000000017879513|ING FLOATING RATE SENIOR LOA|ISL/U|CT|Closed-End Fund|Equity
0|BBG006M6SXL2|EQ0000000006846232|AA PLC|AA/|LN|Common Stock|Equity
The requirements are:
1. Sort the input file by the 1st column, then by the 2nd column, then by the second-to-last column, in that order.
2. Display the percentage of sort completion in the terminal/console, e.g. "column 2 75% sort done".
3. Finally, write the output to a separate file.
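To make the intended ordering concrete, here is a minimal Python sketch of the three-key sort using only the standard library. It assumes the first column is always an integer and that the other two keys compare as plain strings. The names `sort_key`, `sort_file`, and `report_every` are hypothetical. The progress message only reports how many lines have been written, since the built-in sort does not expose per-column progress.

```python
import sys


def sort_key(line):
    """Build the key (1st column, 2nd column, second-to-last column)."""
    fields = line.rstrip("\n").split("|")
    # Assumption: column 1 is an integer; the other keys are strings.
    return (int(fields[0]), fields[1], fields[-2])


def sort_file(in_path, out_path, report_every=1_000_000):
    """Read all lines, sort them in memory, and write them out."""
    with open(in_path, encoding="utf-8") as src:
        lines = src.readlines()

    # One pass over a tuple key replaces three separate per-column sorts.
    lines.sort(key=sort_key)

    total = len(lines)
    with open(out_path, "w", encoding="utf-8") as dst:
        for count, line in enumerate(lines, start=1):
            dst.write(line)
            # Progress covers the write phase only, not the sort itself.
            if count % report_every == 0 or count == total:
                print(f"{100 * count // total}% written", file=sys.stderr)
```

Something like `sort_file("datascope_input.txt", "sort_file.txt")` would then produce the sorted file. Holding 6 million lines in memory at once is a further assumption of this sketch.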
I have written the program below, which sorts by the 1st column correctly. How can I incorporate the other conditions? It also takes somewhat longer to run than I would like. Is there a more efficient and cleaner way to do this? The only restriction is that we can't use any additional packages from CPAN. Unix solutions using sed/awk are OK, but Perl is preferable. I have just learned that the built-in Python is also available, so a Python solution is welcome too.
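my (%link_strength);
{
    my $data = "datascope_input.txt";
    my $out  = "sort_file.txt";
    open (my $indata , '<', $data) || die "could not open $data :\n$!";
    open (my $outdata , '>', $out) || die "could not open $out :\n$!";
    # Send all subsequent print output to the output file.
    select $outdata;
    # Read the whole input file into memory.
    my @array = (<$indata>);
    for (@array) {
        # Key each line by its 1st column; a later line with the same
        # 1st-column value overwrites the earlier one in the hash.
        $link_strength{$1} = $_ if /(?:[^|]+\|){0}([^|]+)/;
    }
    # Print the stored lines in numeric order of the 1st column.
    print $link_strength{$_} for (sort {$a <=> $b} keys %link_strength);
    close ($outdata);
    close ($indata);
}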