
With reference to the question Calculating the distance between atomic coordinates, where the input is

ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C 
ATOM    929  CA  HIS A 204      38.546 -15.963  14.792  1.00 29.53           C
ATOM    939  CA  ASN A 205      39.443 -17.018  11.206  1.00 54.49           C  
ATOM    947  CA  GLU A 206      41.454 -13.901  10.155  1.00 26.32           C
ATOM    956  CA  VAL A 207      43.664 -14.041  13.279  1.00 40.65           C 
.
.
.

ATOM    963  CA  GLU A 208      45.403 -17.443  13.188  1.00 40.25           C  

there is an answer reported as

use strict;
use warnings;

my @line;
while (<>) {
    push @line, $_;            # add line to buffer
    next if @line < 2;         # skip unless buffer is full
    print proc(@line), "\n";   # process and print 
    shift @line;               # remove used line 
}

sub proc {
    my @a = split ' ', shift;   # line 1
    my @b = split ' ', shift;   # line 2
    my $x = ($a[6]-$b[6]);      # calculate the diffs
    my $y = ($a[7]-$b[7]);
    my $z = ($a[8]-$b[8]);
    my $dist = sprintf "%.1f",                # format the number
                   sqrt($x**2+$y**2+$z**2);   # do the calculation
    return "$a[3]-$b[3]\t$dist"; # return the string for printing
}

The above code outputs the distance from the first CA to the second, from the second to the third, and so on.

How can I modify this code to find the distance from the first CA to all the remaining CAs (2, 3, ...), from the second CA to the remaining CAs (3, 4, ...), and so on, printing only those distances that are less than 5 Angstrom? I gather that the push @line, $_; statement should be altered to increase the array size, but it is not clear to me how to do that.

pradeep pant
  • what is your expected output? – ssr1012 Dec 15 '16 at 14:16
  • GLN-HIS "distance value" GLN-ASN "distance value" GLN-GLU "distance value" ... HIS-ASN "distance value" HIS-GLU "distance value" HIS-VAL "distance value" ... ASN-GLU "distance value" ASN-VAL "distance value" ... so on... @ssr1012 – pradeep pant Dec 16 '16 at 06:45

2 Answers


To get the pairs, read the file into an array, @data_array. Then loop over the entries.

Update: Added the code that opens the file and loads @data_array.

open my $fh, '<', 'atom_file.pdb' or die $!;

my @data_array = <$fh>;

close $fh or die $!;

for my $i (0 .. $#data_array) {
    for my $j ($i+1 .. $#data_array) {
        process(@data_array[$i,$j]);    
    }   
}
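
Putting the pieces together, a complete script might look like the following. This is only a sketch under stated assumptions: the field positions (3 for the residue name, 6 to 8 for x, y, z) are taken from the question's proc sub, the names parse_atom, distance, close_pairs and $CUTOFF are made up for illustration, and the six CA records from the question are hard-coded so the example runs on its own. It also keeps the question's whitespace split, which can break on real PDB files where fixed-width columns run together.

```perl
use strict;
use warnings;

# Assumption: only distances strictly below this cutoff (in Angstrom) are reported.
my $CUTOFF = 5;

# Parse one whitespace-separated ATOM line into [residue name, x, y, z].
# Field positions follow the question's proc sub (3 = residue, 6..8 = x, y, z).
sub parse_atom {
    my @f = split ' ', shift;
    return [ @f[3, 6, 7, 8] ];
}

# Euclidean distance between two parsed atoms.
sub distance {
    my ($p, $q) = @_;
    return sqrt( ($p->[1] - $q->[1])**2
               + ($p->[2] - $q->[2])**2
               + ($p->[3] - $q->[3])**2 );
}

# Return "RES1-RES2\tdistance" for every pair (i < j) closer than the cutoff.
sub close_pairs {
    my ($cutoff, @lines) = @_;
    my @atoms = map { parse_atom($_) } grep { /^ATOM\b/ } @lines;
    my @out;
    for my $i (0 .. $#atoms - 1) {
        for my $j ($i + 1 .. $#atoms) {
            my $d = distance($atoms[$i], $atoms[$j]);
            push @out, sprintf("%s-%s\t%.1f", $atoms[$i][0], $atoms[$j][0], $d)
                if $d < $cutoff;
        }
    }
    return @out;
}

# Demo on the six CA records shown in the question.
my @lines = (
    "ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C",
    "ATOM    929  CA  HIS A 204      38.546 -15.963  14.792  1.00 29.53           C",
    "ATOM    939  CA  ASN A 205      39.443 -17.018  11.206  1.00 54.49           C",
    "ATOM    947  CA  GLU A 206      41.454 -13.901  10.155  1.00 26.32           C",
    "ATOM    956  CA  VAL A 207      43.664 -14.041  13.279  1.00 40.65           C",
    "ATOM    963  CA  GLU A 208      45.403 -17.443  13.188  1.00 40.25           C",
);
print "$_\n" for close_pairs($CUTOFF, @lines);
```

To run it on a file instead, replace the hard-coded list with the lines read from the file handle, as in the snippet above. On the six sample records, only neighbouring residues fall under 5 Angstrom, so the filter leaves the five consecutive pairs.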
Chris Charley
  • Should I insert these lines in place of "push @line, $_; " keeping other parts of the code as it is? Please post the full code after adding these lines. – pradeep pant Dec 16 '16 at 06:39
  • @pradeep pant I added the code preceding the 2 for loops. sub `process` is your sub `proc` (you'll need to edit your `proc` sub to either print the distances or return a suitable string to print) – Chris Charley Dec 16 '16 at 17:28

Maybe try this:

use strict;
use warnings;

my @alllines = <DATA>;   # read every line of the DATA section
chomp @alllines;

# Each current line (the last line has no following line to pair with)
for my $i (0 .. $#alllines - 1)
{
    # Each following line
    for my $j ($i + 1 .. $#alllines)
    {
        # Split the current line on whitespace (tabs or spaces)
        my ($line1_tb_1, $line1_tb_2, $line1_tb_3) = split ' ', $alllines[$i];
        print "Main_Line: $line1_tb_1\t$line1_tb_2\t$line1_tb_3\n";

        # Split the following line the same way
        my ($line_nxt_tb1, $line_nxt_tb2, $line_nxt_tb3) = split ' ', $alllines[$j];
        print "Next_Line: $line_nxt_tb1\t$line_nxt_tb2\t$line_nxt_tb3\n";

        # Do your coding/regex here
    }
}

__DATA__
tab1    123 456
tab2    789 012
tab3    345 678
tab4    901 234
tab5    567 890

I hope this will help you.

ssr1012