Determining the ratio of matches to non-matches of 2 primary strands?

Question

Possible Duplicate:
How to plot a gene graph for a DNA sequence say ATGCCGCTGCGC?

I'm trying to write a Perl script that compares two aligned DNA sequences (say, 60 characters each) and then shows the ratio of matches to non-matches between them, but I'm not having much luck. If it helps I can upload my code, but it's no use. Here's an example of what I'm trying to achieve:

e.g.

A T C G T A C
| | | | | | |
T A C G A A C

So in the above example there are 4 matches and 3 non-matches, giving a ratio of 4.3.

Any help would be much appreciated. Thanks.

score 0 · Answer 1 · answered Aug 14 '12 at 02:02

In general, please do post your code; it does help. In any case, something like this should do what you are asking:

#!/usr/bin/perl -w
use strict;
my $d1='ATCGTAC';
my $d2='TACGAAC';

my @dna1=split(//,$d1);
my @dna2=split(//,$d2);

my $matches=0;
for (my $i=0; $i<=$#dna1; $i++) {
    $matches++ if $dna1[$i] eq $dna2[$i];
}
my $mis=scalar(@dna1)-$matches;
print "Matches/Mismatches: $matches/$mis\n";

Bear in mind though that the ratio of 4 to 3 is most certainly not 4.3 but ~1.3. If you post some information on your input file format I will update my answer to include lines for parsing the sequence from your file.
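To turn the two counts into a single number, a minimal sketch along these lines may help. The helper name match_stats is made up for illustration and is not part of the answer above; it assumes both sequences have the same length, and it leaves the ratio undefined when there are no mismatches rather than dividing by zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (illustration only): returns the match count, the
# mismatch count, the matches/mismatches ratio and the percentage of
# matching positions. Assumes equal-length sequences.
sub match_stats {
    my ($d1, $d2) = @_;
    my $len     = length $d1;
    my $matches = 0;
    for my $i (0 .. $len - 1) {
        $matches++ if substr($d1, $i, 1) eq substr($d2, $i, 1);
    }
    my $mis = $len - $matches;
    # The ratio is undefined when there are no mismatches.
    my $ratio   = $mis ? $matches / $mis : undef;
    my $percent = $len ? 100 * $matches / $len : 0;
    return ($matches, $mis, $ratio, $percent);
}

my ($matches, $mis, $ratio, $percent) = match_stats('ATCGTAC', 'TACGAAC');
printf "Matches/Mismatches: %d/%d\n", $matches, $mis;
printf "Ratio: %s\n",
    defined $ratio ? sprintf('%.2f', $ratio) : 'undefined (no mismatches)';
printf "Identity: %.1f%%\n", $percent;
```

For the example sequences this reports 4 matches, 3 mismatches, a ratio of 1.33 and an identity of 57.1%.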

terdon


Is it not? Oops. Thanks so much, man. I've been on this for a while. :) Can I also ask, how would I calculate the ratio of the results? – Conor C Aug 14 '12 at 02:10
@Conor-c Well, depends what you mean by ratio. Generally x/y = ratio of x to y. Do you mean a percentage? – terdon Aug 14 '12 at 02:34

DavidO · Answer 2 · 2012-08-14T02:17:42.853

Just grab the length of one of the strings (we're assuming string lengths are equal, right?), and then iterate using substr.

my @strings = ( 'ATCGTAC', 'TACGAAC' );

my $matched = 0;    # start at 0 so the output is defined even with no matches
foreach my $ix ( 0 .. length( $strings[0] ) - 1 ) {
  $matched++
    if   substr( $strings[0], $ix, 1 ) eq substr( $strings[1], $ix, 1 );
}

print "Matches: $matched\n";
print "Mismatches: ", length( $strings[0] ) - $matched, "\n";
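The equal-length assumption can be made explicit instead of implied. The sketch below wraps the same substr loop in a subroutine that refuses to compare sequences of different lengths; the name count_matches and the error message are illustrative choices, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: counts positions where the two strings agree,
# and dies if the strings are not the same length.
sub count_matches {
    my ($s1, $s2) = @_;
    die "Sequences differ in length\n"
        unless length($s1) == length($s2);

    my $matched = 0;
    for my $ix (0 .. length($s1) - 1) {
        $matched++ if substr($s1, $ix, 1) eq substr($s2, $ix, 1);
    }
    return $matched;
}

print "Matches: ", count_matches('ATCGTAC', 'TACGAAC'), "\n";
```

Whether to die, warn, or compare only the overlapping prefix depends on how the alignment was produced, so treat the die as one possible policy.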

score 0 · Answer 3 · answered Aug 14 '12 at 02:09

Normally I'd say "What have you tried" and "upload your code first" because it doesn't seem to be a very difficult problem. But let's give this a shot:

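Create two arrays, one to hold each sequence:

use strict;
use warnings;

# One array per sequence, one base per element.
my @sequenceOne = ("A", "T", "C", "G", "T", "A", "C");
my @sequenceTwo = ("T", "A", "C", "G", "A", "A", "C");
my $myMatch = 0;
my $myMissMatch = 0;

# Walk both arrays in step and compare position by position.
for (my $i = 0; $i < @sequenceOne; $i++) {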
    my $output = "Comparing " . $sequenceOne[$i] . " <=> " . $sequenceTwo[$i];
    if ($sequenceOne[$i] eq $sequenceTwo[$i]) {
        $output .= " MATCH\n";
        $myMatch++;
    } else {
        $myMissMatch++;
        $output .= "\n";
    }
    print $output;
}

print "You have " . $myMatch . " matches.\n";
print "You have " . $myMissMatch . " mismatches\n";
print "The ratio of hits to misses is " . $myMatch . ":" . $myMissMatch . ".\n";

Of course, you'd probably want to read the sequence from something else on the fly instead of hard-coding the array. But you get the idea. With the above code your output will be:

torgis-MacBook-Pro:platform-tools torgis$ ./dna.pl 
Comparing A <=> T
Comparing T <=> A
Comparing C <=> C MATCH
Comparing G <=> G MATCH
Comparing T <=> A
Comparing A <=> A MATCH
Comparing C <=> C MATCH
You have 4 matches.
You have 3 mismatches
The ratio of hits to misses is 4:3.

score 0 · Answer 4 · answered Aug 14 '12 at 02:10

So many ways to do this. Here's one.

use strict;
use warnings;

my $seq1 = "ATCGTAC";
my $seq2 = "TACGAAC";

my $len = length $seq1;
my $matches = 0;

for my $i (0..$len-1) {
    $matches++ if substr($seq1, $i, 1) eq substr($seq2, $i, 1);
}

printf "Length: %d  Matches: %d  Ratio: %5.3f\n", $len, $matches, $matches/$len;

exit 0;

score 0 · Answer 5 · answered Aug 14 '12 at 03:11

I think substr is the way to go, rather than splitting the strings into arrays.

This is probably most convenient if presented as a subroutine:

use strict;
use warnings;

print ratio(qw/ ATCGTAC TACGAAC /);

sub ratio {

  my ($aa, $bb) = @_;
  my $total = length $aa;
  my $matches = 0;
  for (0 .. $total-1) {
    $matches++ if substr($aa, $_, 1) eq substr($bb, $_, 1);
  }

  $matches / ($total - $matches);
}

Output:

1.33333333333333

Kenosis · Answer 6 · 2012-08-14T11:47:20.197

Bill Ruppert's right that there are many ways to do this. Here's another:

use Modern::Perl;

say compDNAseq( 'ATCGTAC', 'TACGAAC' );

sub compDNAseq {
    my $total = my $i = 0;
    $total += substr( $_[1], $i++, 1 ) eq $1 while $_[0] =~ /(.)/g;
    sprintf '%.2f', $total / ( $i - $total );
}

Output:

1.33

edited Aug 14 '12 at 11:47

answered Aug 14 '12 at 05:51


Chris Charley · Answer 7 · 2012-08-14T16:50:30.363

Here is an approach that XORs the two strings together, which yields a NUL byte (\0) at every position where they match, and then counts those NUL bytes.

#!/usr/bin/perl
use strict;
use warnings;

my $d1='ATCGTAC'; 
my $d2='TACGAAC'; 

my $len = length $d1; # assumes $d1 and $d2 are the same length

my $matches = () = ($d1 ^ $d2) =~ /\0/g;

printf "ratio of %f", $matches / ($len - $matches);

Output: ratio of 1.333333
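The trick works because XOR-ing a byte with itself gives zero, so every position where the two strings agree becomes \0 in the result. As a sketch of the same idea, the NUL bytes can also be counted with tr instead of a regex match in list context. The sub name xor_matches is an illustrative assumption, and the code expects plain single-byte characters such as A, C, G and T.

```perl
use strict;
use warnings;

# Hypothetical helper: counts matching positions via string XOR.
# Note: under "use feature 'bitwise'" the string XOR operator is
# spelled ^. instead of ^; this sketch does not enable that feature.
sub xor_matches {
    my ($d1, $d2) = @_;
    # tr with an empty replacement list only counts the listed
    # characters; it does not modify the XOR result.
    return ( ($d1 ^ $d2) =~ tr/\0// );
}

my ($d1, $d2) = ('ATCGTAC', 'TACGAAC');
my $matches = xor_matches($d1, $d2);
printf "Matches: %d  Mismatches: %d\n", $matches, length($d1) - $matches;
```

If one string is longer than the other, the extra characters are XOR-ed against nothing and stay non-zero, so they are never counted as matches.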

edited Aug 14 '12 at 16:50

answered Aug 14 '12 at 13:49

