I have two big tab-delimited files.
first file ->
Col1    Col2  Col3  Col4  Col5  Col6  Col7  Col8
101_#2  1     2     F0    263   278   2     1.5
102_#1  1     6     F1    766   781   1     1.0
103_#1  2     15    V1    526   581   1     0.0
103_#1  2     9     V2    124   134   1     1.3
104_#1  1     12    V3    137   172   1     1.0
105_#1  1     17    F2    766   771   1     1.0
second file ->
Col1    Col2  Col3  Col4
97486   9     262   279
67486   9     118   119
87486   9     183   185
248233  9     124   134
I want to compare Col5 and Col6 of file 1 (treated as a range) against Col3 and Col4 of file 2. If a file-1 row's range falls within any range in file 2, print that row from file 1.
Expected output ->
Col1    Col2  Col3  Col4  Col5  Col6  Col7  Col8
101_#2  1     2     F0    263   278   2     1.5
103_#1  2     9     V2    124   134   1     1.3
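To make the matching rule explicit: a file-1 row should be printed when its [Col5, Col6] interval lies inside some file-2 [Col3, Col4] interval, i.e. Col5 >= Col3 and Col6 <= Col4. A minimal sketch of just that test (in Python for brevity; `contained` is an illustrative name of mine):

```python
# Containment test: range (p1, p2) from file 1 matches a file-2 range (s, e)
# when s <= p1 and p2 <= e.
def contained(p1, p2, ranges):
    return any(s <= p1 and p2 <= e for s, e in ranges)

# The four (Col3, Col4) ranges from file 2 above:
file2_ranges = [(262, 279), (118, 119), (183, 185), (124, 134)]

print(contained(263, 278, file2_ranges))  # row 101_#2    -> True
print(contained(526, 581, file2_ranges))  # row 103_#1 V1 -> False
```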
So far I have tried ->
@ARGV or die "No input file specified";
open my $first,  '<', $ARGV[0] or die "Unable to open input file: $!";
open my $second, '<', $ARGV[1] or die "Unable to open input file: $!";
print scalar <$first>;    # print the header line of file 1
while (<$first>) {
    @cols = split /\s+/;
    $p1 = $cols[4];
    $p2 = $cols[5];
    while (<$second>) {
        @sec = split /\s+/;
        print join("\t", @cols), "\n" if ($p1 >= $sec[2] && $p2 <= $sec[3]);
    }
}
But this works only for the first row of file 1. Also, the files are very big (around 6 GB).
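I think the reason only the first row works is that the inner `while` exhausts the second filehandle on the first pass of the outer loop, so later rows have nothing left to read. Loading file 2's ranges into memory once and scanning them per file-1 row avoids that. A sketch of that structure (in Python, which I used to prototype; the function names are mine):

```python
import sys

def load_ranges(path):
    """Read file 2 once, keeping only its (Col3, Col4) range pairs."""
    ranges = []
    with open(path) as fh:
        next(fh)                      # skip the header line
        for line in fh:
            cols = line.split()
            ranges.append((int(cols[2]), int(cols[3])))
    return ranges

def filter_file1(path, ranges, out=sys.stdout):
    """Print file-1 rows whose (Col5, Col6) range is contained in a file-2 range."""
    with open(path) as fh:
        print(next(fh), end="", file=out)   # pass the header through
        for line in fh:
            cols = line.split()
            p1, p2 = int(cols[4]), int(cols[5])
            if any(s <= p1 and p2 <= e for s, e in ranges):
                print("\t".join(cols), file=out)

# Usage: filter_file1("file1.txt", load_ranges("file2.txt"))
```

This still scans every file-2 range per row, but it only reads each file once from disk.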
I also tried an approach with hashes:
@ARGV or die "No input file specified";
open my $first,  '<', $ARGV[0] or die "Unable to open input file: $!";
open my $second, '<', $ARGV[1] or die "Unable to open input file: $!";
print scalar <$first>;    # print the header line of file 1
while (<$second>) {
    chomp;
    @line = split /\s+/;
    $hash{$line[2]} = $line[3];
}
while (<$first>) {
    @cols = split /\s+/;
    $p1 = $cols[4];
    $p2 = $cols[5];
    foreach $key (sort keys %hash) {
        if ($p1 >= $key) {
            if ($p2 <= $hash{$key}) {
                print join("\t", @cols), "\n";
            }
        }
        else { next; }
    }
}
But this also takes a lot of time and memory. Can anybody suggest how I can make it fast using hashes? Thanks a lot.
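For what it's worth, one standard way to beat the per-row scan (sketched in Python; helper names are mine): sort file 2's ranges by start and keep a running maximum of the range ends. For a query (p1, p2), binary-search for the last start <= p1; if the running max end up to that index is >= p2, then some range both starts at or before p1 and ends at or after p2, so the row matches. Each lookup is then O(log n) after a one-off O(n log n) sort:

```python
from bisect import bisect_right

def build_index(ranges):
    """Sort ranges by start; prefix_max_end[i] = max end among ranges[0..i]."""
    ranges = sorted(ranges)
    starts = [s for s, _ in ranges]
    prefix_max_end = []
    best = float("-inf")
    for _, e in ranges:
        best = max(best, e)
        prefix_max_end.append(best)
    return starts, prefix_max_end

def is_contained(p1, p2, starts, prefix_max_end):
    """True if some file-2 range (s, e) has s <= p1 and e >= p2."""
    i = bisect_right(starts, p1) - 1   # last range starting at or before p1
    return i >= 0 and prefix_max_end[i] >= p2

starts, pme = build_index([(262, 279), (118, 119), (183, 185), (124, 134)])
print(is_contained(263, 278, starts, pme))  # True  (row 101_#2)
print(is_contained(526, 581, starts, pme))  # False (row 103_#1 V1)
```

This still assumes file 2's ranges fit in memory; unlike the hash attempt above it also handles duplicate start values, which a hash keyed on the start column silently overwrites.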