0

I am working on a script to assess the similarity between OTU sequences using the function pairwiseAlignment from the package BioStrings.

I established a matrix with NA values, where the following loop is supposed to write over the matrix.

    mat <- nucleotideSubstitutionMatrix(match = 2, mismatch = -3, baseOnly = TRUE)
    x = 1
    y = x + 1
    while(x < length(sequence)){
      #Converts sequences in vector to DNAString
      seq1 <- DNAString(sequence[x])  
      seq2 <- DNAString(sequence[y])
      #Performs pairwise alignment on DNAStrings
      palign1 <- pairwiseAlignment(seq1, seq2, substitutionMatrix = mat, gapOpening = 5, gapExtension = 2)  
      #Converts pairwise alignment score to percentage to store in matrix
      perc_mat[x,(y-1)] <- pid(palign1, type = "PID1")   

      if(y == length(sequence)){
        x <- x + 1
        y <- x + 1 
        print(x)
      }

      else {
        y <- y + 1 
        print(y)
      }


    }

So an OTU is an operational taxanomic unit we associate with DNA sequences of varying lengths. So for an example we have two OTUs: OTU_1100972 v OTU_182893. OTU_1100972 has the following sequence:

GACGAACGCTGGCGGCGTGCCTAATACATGCAAGTTGAGCGCTGAAGGTTGGTACTTGTACCAACTGGATGAGCAGCGAACGGGTGAGTAACGCGTGGGGAATCTGCCTTTGAGCGGGGGACAACATTTGGAAACGAATGCTAATACCGCATAAAAACTTTAAACACAAGTTTTAAGTTTGAAAGATGCAATTGCATCACTCAAAGATGATCCCGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGATGATACATAGCCGACCTGAGAGGGTGATCGGCCACATTGGGACTGAGACACGGCCCAAACTCCTACGGGAGGCAGCAGTAGGGAATCTTCGGCAATGGACGAAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGGTAGAGAAGAACGTTGGTGAGAGTGGAAAGCTCATCAAGTGACGGTAACTACCCAGAAAGGGACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTCCCGAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGTGGTTTATTAAGTCTGGTGTAAAAGGCAGTGGCTCAACCATTGTATGCATTGGAAACTGGTAGACTTGAGTGCAGGAGAGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCGGTGGCGAAAGCGGCTCTCTGGCCTGTAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAGATGTAGGGAGCTATAAGTTCTCTGTATCGCAGCTAACGCAATAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCCTCTGACCGCTCTAGAGATAGAGTTTTCCTTCGGGACAGAGGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTATTGTTAGTTGCCATCATTCAGTTGGGCACTCTAGCGAGACTGCCGGTAATAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGCTGGTACAACGAGTCGCAAGCCGGTGACGGCAAGCTAATCTCTTAAAGCCAGTCTCAGTTCGGATTGTAGGCTGCAACTCGCCTACATGAAGTCGGAATCGCTAGTAATCGCGGATCAGCACGCCGCGGTGAATACGTTCCCGGGCCT

OTU_182893 has the following sequence: AGAGTTTGATCCTGGCTCAGGATAAACGCTGGCGGCGCACATAAGACATGCAAGTCGAACGGACTTAACTCATTCTTTTAGATTGAGAGCGGTTAGTGGCGGACTGGTGAGTAACACGTAAGCAACCTGCCTATCAGAGGGGAATAACAGTGAGAAATCATTGCTAATACCGCATAAGCTAGTAGTATCGCATGATATAGCTAGAAAAGGAGCAATCCGCTGATAGATGGGCTTGCGTCTGATTAGCTAGTTGGTGGGGTAACGGCATACCAAGGCGACGATCAGTAGCCGGACTGAGAGGTTGAACGGCCACATTGGAACTGAGACACGGTCCAAACTCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTCGCGTGAGGGAAGACGGTCCTATGGATTGTAAACCTCTTTTGTCAGGGAGCAAAAACTGCCACGTGTGGCAGGATGAGAGTACCTGAAGAAAAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGACAGTTAAGTCAGCGGTAAAATTGAGAGGCTCAACCTCTTCCCGCCGTTGAAACTGATTGTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCAACTGACGCTGAAGCACGAAAGCGTGGGTATCGAACAGGATTAGATACCCTGGTAGTCCACGCAGTAAACGATGAATACTAATTGTTCGGGTCAAATGAGATCTGAGTGATACAGCGAAAGCGTTAAGTATTCCACCTGGGGAGTACGCCGGCAACGGTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGAGGAACATGTGGTTTAATTTGATGATACGCGAGGAACCTTACCCGGGCTCAAACGTAGCAGGACGTGCGCCGAAAAGCGCATTTCACTTGTGACCTGTTACGAGGTGCTGCATGGTTGTCGTCAGCTCGTGCCGTGAGGTGTCGGCTTAAGTGCCATAACGAGCGCAACCCCTGCCGCCAGTTGCTAACGCATAAAGGCGAGGACTCTGGCGGGACTGCCGGCGCAAGCCGTGAGGAAGGCGGGGATGACGTCAAATCAGCACGGCCCTTACGTCCGGGGCGACACACGTGTTACAATGGGGGGCACAGCGGGAAGCCACTTGGTGACAAGGAGCAAAACCCCCAAGCCTCTCTCAGTTCGGATTGGAGTCTGCAACCCGACTCCATGAAGCTGGATTCGCTAGTAATCGCGCATCAGCCATGGCGCGGTGAATACGTTCCCGGGCCTTGCACACACCGCCCGTCA

NCBI has a tool that conducts Needleman-Wunsch global alignment. Using their tool, I concluded that the the above sequences recieved a score of 769, where they are 71% similar.

Can someone help me navigate this error? Thank you!

knd
  • 41
  • 4
  • 1
    What is `sequence`? (i.e. can you include it in your example for testing purposes) – Eric Jul 07 '17 at 17:08
  • 2
    It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Jul 07 '17 at 17:09
  • I edited the question to provide an example! Thank you! – knd Jul 07 '17 at 17:15
  • Using your example sequences, and initializing `perc_mat <- matrix()`, I am not able to reproduce the error. `perc_mat` is `73.4`. – Eric Jul 07 '17 at 17:21
  • Do you think the error could come from writing over the NA matrix? – knd Jul 07 '17 at 17:33
  • 1
    Check to make sure that you have no non-nucleotide characters in your sequences. I can reproduce this error by adding something other than c("A","C","T","G") to the sequence vector – emilliman5 Jul 07 '17 at 17:56

0 Answers0