I am trying to solve bioinformatics problems from rosalind.info and I am stuck on this problem: http://rosalind.info/problems/mrna/

To solve it you have to calculate the number of different RNA strings from which the protein could have been translated, modulo 1,000,000.

Biological background: A protein is a string built from 20 amino acids, each represented by a different letter. Each amino acid can be encoded by one or more RNA strings (codons) of 3 letters each.

This problem teaches you how to manage large numbers when programming, a common situation in bioinformatics. I have tried different things but I always get INF or a negative value, so I must be doing something wrong.

The problem itself suggests that I should find a way of manipulating large numbers without having to store them. How is this possible? How can I achieve that with PHP?

This is my best attempt so far:

<?php
function protein_reverse($sec) {
    $sec_arr = str_split($sec);
    $aa = array(
        'F' => '2',
        'L' => '6',
        'S' => '6',
        'Y' => '2',
        'C' => '2',
        'W' => '1',
        'P' => '4',
        'H' => '2',
        'Q' => '2',
        'R' => '4',
        'I' => '3',
        'M' => '1',
        'T' => '4',
        'N' => '2',
        'K' => '2',
        'V' => '4',
        'A' => '4',
        'D' => '2',
        'E' => '2',
        'G' => '4',
    );
    $r = 1;
    foreach ( $sec_arr as $base ) {
        $r *= $aa[$base] % 1000000;
    }
    return $r;
}
?>
ThemesCreator
  • http://stackoverflow.com/questions/211345/working-with-large-numbers-in-php might help you – Gavriel Jan 24 '16 at 23:44
  • hint: as a warm-up you could try to find the number of ways to make numeric operation precedence explicit with parentheses within your program – tomc Jan 26 '16 at 08:13
  • @tomc I don't understand your comment. What does it mean to make "numeric operation precedence explicit"? – ThemesCreator Jan 26 '16 at 19:32
  • you had `$r *= $aa[$base] % 1000000;`, which is parsed as `$r *= ($aa[$base] % 1000000);`. Since `$aa[$base]` is always much smaller than `1000000`, you effectively have `$r *= $aa[$base];`. Writing the precedence explicitly gets it right: `$r = ($r * $aa[$base]) % 1000000;` – tomc Jan 27 '16 at 03:05
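
The fix described in the last comment can be sketched as follows. This is a minimal sketch, not the asker's final code: the function name `count_rna_strings_mod` is hypothetical, the codon counts are written as integers, R is taken as 6 (arginine has six codons), and the result is seeded with 3 to account for the three stop codons. Because the running product is reduced after every multiplication, it never exceeds 6 × 999,999, so native PHP integers suffice.

```php
<?php
// Hypothetical helper (name not from the original post): number of RNA
// strings that translate to $protein, modulo $modulus.
function count_rna_strings_mod(string $protein, int $modulus = 1000000): int
{
    // Number of codons for each amino acid in the standard genetic code.
    $codons = [
        'F' => 2, 'L' => 6, 'S' => 6, 'Y' => 2, 'C' => 2,
        'W' => 1, 'P' => 4, 'H' => 2, 'Q' => 2, 'R' => 6,
        'I' => 3, 'M' => 1, 'T' => 4, 'N' => 2, 'K' => 2,
        'V' => 4, 'A' => 4, 'D' => 2, 'E' => 2, 'G' => 4,
    ];

    // Start from the 3 possible stop codons that end the RNA string.
    $result = 3 % $modulus;
    if ($protein === '') {
        return $result;
    }

    foreach (str_split($protein) as $aminoAcid) {
        if (!isset($codons[$aminoAcid])) {
            throw new InvalidArgumentException("Unknown amino acid: $aminoAcid");
        }
        // Parentheses force the multiplication first, then the reduction,
        // so the intermediate value stays small at every step.
        $result = ($result * $codons[$aminoAcid]) % $modulus;
    }

    return $result;
}
```

For example, `count_rna_strings_mod('MA')` multiplies 3 (stop) by 1 (M) by 4 (A). This works because (a × b) mod m equals ((a mod m) × (b mod m)) mod m, so reducing early does not change the final remainder.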

1 Answer

I was finally able to solve the problem. First, as the question @Gavriel linked in the comments suggests, I used the GMP library for these big-number operations. Second, I had forgotten to multiply by 3 at the end. This is necessary because a finished protein must end with a termination (stop) codon, and there are three of them.

/*
    Reverse translation of protein
    @return number of possible RNA strings modulo 1000000
 */
function protein_reverse($sec) {
    $sec_arr = str_split($sec);
    $aa = array(
        'F' => '2',
        'L' => '6',
        'S' => '6',
        'Y' => '2',
        'C' => '2',
        'W' => '1',
        'P' => '4',
        'H' => '2',
        'Q' => '2',
        'R' => '6',
        'I' => '3',
        'M' => '1',
        'T' => '4',
        'N' => '2',
        'K' => '2',
        'V' => '4',
        'A' => '4',
        'D' => '2',
        'E' => '2',
        'G' => '4',
    );
    $r = 1;
    foreach ( $sec_arr as $base ) {
        $r = gmp_mul($r, $aa[$base]);
    }
    $r = gmp_mul($r, 3);
    $r = gmp_mod($r, 1000000);
    return $r;
}
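
One entry of the hand-typed table differs between the question and this answer (R is 4 in the question and 6 here). As a hedged sketch, assuming the standard genetic code, the counts can be derived rather than typed in. `codon_counts` is a hypothetical helper name that does not appear in the original code.

```php
<?php
// Hypothetical helper: number of codons per amino acid letter under the
// standard genetic code. The key '*' holds the number of stop codons.
function codon_counts(): array
{
    // Amino acid for each of the 64 codons, with the bases ordered
    // U, C, A, G at every position (UUU, UUC, UUA, UUG, UCU, ...).
    $aminoAcids = 'FFLLSSSSYY**CC*W'
                . 'LLLLPPPPHHQQRRRR'
                . 'IIIMTTTTNNKKSSRR'
                . 'VVVVAAAADDEEGGGG';

    // Count how many times each letter occurs in the 64-entry table.
    return array_count_values(str_split($aminoAcids));
}
```

Here `codon_counts()['R']` gives the multiplier for arginine and `codon_counts()['*']` gives the final factor for the stop codon, so both can be passed to `gmp_mul` in place of the literals above.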
ThemesCreator