
I'm helping my girlfriend with an assignment. Part of this assignment is the calculation of the number of possible combinations in a DNA sequence that contains certain wildcard characters.

I'm using the following Python script:

from Bio import Seq
from Bio import SeqIO

short = SeqIO.parse('short.fasta', 'fasta')
long = SeqIO.parse('long.fasta', 'fasta')

# IUPAC dictionary

IUPAC = {
    'R': 2,
    'Y': 2,
    'S': 2,
    'W': 2,
    'K': 2,
    'M': 2,
    'B': 3,
    'D': 3,
    'H': 3,
    'V': 3,
    'N': 4
}

# Define method to count number of possible sequences
def pos_seq(seqs):
    d = {}

    for record in seqs:
        pos = 1    
        name = record.id
        seq = record.seq

        for ltr in seq:
            if ltr in IUPAC.keys():
                pos = pos * IUPAC[ltr]

        d.update({name : pos})
        print(name + ": " + str(pos) + " possibilities")
        print("")

    print("end of file")
    print("")

    return d



print(pos_seq(short))
print(pos_seq(long))

The function pos_seq takes in a collection of sequences and returns the number of possibilities for each sequence.

The script works fine and the function prints the correct answer on each iteration. However, I wanted to save the sequence's name and number of possibilities into a dictionary and return it.

Here is the problem: It always returns an empty dictionary (as defined at the start of the method).

Defining the dictionary globally (outside the function) works; the dictionary DOES get updated properly, so the problem may be that I defined the dictionary within the function. Maybe I need to change the .update line to specify the dictionary I want to update isn't a global dictionary?

Long story to ask a simple question: I can't seem to use a function to create a dictionary, update it several times and then return it. It stays empty.

I'm not very experienced in Python, but I couldn't find an answer to this question online.

Thanks for the help!

  • I don't know if this is the direct cause of your problem, but it's certainly worth correcting: some of your lines are indented with spaces, and some are indented with tabs. In particular, `d = {}`, `d.update({name : pos})`, and `return d` are all tabbed. Mixing indentation styles can confuse the language parser and cause strange behavior. I recommend re-indenting each of those lines using only spaces. – Kevin Dec 27 '16 at 14:07
  • If `short` or `long` is empty (whatever they are) the `for` loop in `pos_seq` won't be executed and `d` will remain an empty dict. – DeepSpace Dec 27 '16 at 14:19
  • Totally unrelated but you want to replace `if ltr in IUPAC.keys()` with `if ltr in IUPAC` - that's _much_ faster. Also, you don't need to call `d.update()` at all, you can just use the subscript syntax `d[name] = pos`. Not that any of this will fix your problem actually, but well... – bruno desthuilliers Dec 27 '16 at 14:20
  • Oh and yes: there's of course no problem with creating a dict and populating it within a function and returning it. As Kevin said, fix your code to use only spaces for indentation (and fix your editor's configuration), it might as well fix the problem. – bruno desthuilliers Dec 27 '16 at 14:23
  • Could you provide sample input and output data to test with? Also, do note that there is a difference between a method and a function. A method is a function associated with an object. See http://stackoverflow.com/questions/155609/difference-between-a-method-and-a-function – Moon Cheesez Dec 27 '16 at 15:41
  • @Kevin Thanks! This fixed my problem. I didn't realise the parser could get confused by this. The lines with spaces were auto-indented and I manually used a tab for the lines I added afterwards. – Simon Van Rompaey Dec 27 '16 at 15:41
  • @Kevin good catch—how did this fail to throw an `IndentationError` ?? – jez Dec 27 '16 at 15:54
  • You should change the settings in your editor to insert spaces when you press the Tab key. – Code-Apprentice Dec 27 '16 at 16:41

1 Answer


After a bit of research into the puzzling question that jez raised (no IndentationError despite the inconsistent indentation), I found the reason why your code runs without raising one.

You were mixing tabs and runs of 8 spaces for indentation. This is allowed in Python 2 but not in Python 3. As http://python3porting.com/differences.html#indentation explains:

In Python 2 a tab will be equal to eight spaces as indentation, so you can indent one line with a tab, and the next line with eight spaces.

In Python 3 a tab is only equal to another tab. This means that each indentation level has to be consistent in its use of tabs and spaces.

Hence, if you try this code:

print("Hello world!")
def my_func(x):
        print(x)  # indented with eight spaces
    print(x)  # indented with one tab character (rendered here as four spaces)

It will work in Python 2 but will raise an indentation error in Python 3.
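
Because the tab is invisible once the code is displayed, here is a small sketch that builds the same function from a string with the tab written explicitly as "\t" and asks Python 3 to compile it. The helper name check_indentation is made up for this illustration. On Python 3 the mixed version should be rejected with a TabError (a subclass of IndentationError), while an all-spaces version compiles.

```python
# The same function as above, with the tab spelled out as "\t".
mixed_source = (
    "def my_func(x):\n"
    "        print(x)\n"  # eight spaces
    "\tprint(x)\n"        # one tab
)

# A spaces-only version for comparison.
spaces_source = (
    "def my_func(x):\n"
    "    print(x)\n"
    "    print(x)\n"
)


def check_indentation(src):
    """Return 'ok' if src compiles, otherwise the name of the error raised."""
    try:
        compile(src, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of this
        return type(exc).__name__
    return "ok"


print("mixed tabs and spaces:", check_indentation(mixed_source))
print("spaces only:", check_indentation(spaces_source))
```

Only compilation is checked here; the function is never called.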

Another thing I found out about Stack Overflow: when you paste code that contains tabs, each tab is displayed as 4 spaces, but if you go into edit mode the tabs are still there and you can copy and paste them (try it).

And that is why your code runs without raising an error even though the indentation is inconsistent.
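
For reference, here is a sketch of pos_seq indented with four spaces throughout, using the simplifications suggested in the comments (a plain membership lookup on the dict and d[name] = pos instead of d.update). To keep it runnable without Biopython or your FASTA files, it assumes the input is an iterable of (name, sequence) pairs rather than SeqIO records; the example data is made up.

```python
# Number of bases each IUPAC ambiguity code can stand for.
IUPAC = {
    'R': 2, 'Y': 2, 'S': 2, 'W': 2, 'K': 2, 'M': 2,
    'B': 3, 'D': 3, 'H': 3, 'V': 3,
    'N': 4,
}


def pos_seq(records):
    """Map each record name to its number of possible concrete sequences.

    `records` is assumed to be an iterable of (name, sequence) pairs,
    standing in for the SeqIO records used in the question.
    """
    d = {}
    for name, seq in records:
        pos = 1
        for ltr in seq:
            # Unambiguous letters (A, C, G, T) contribute a factor of 1.
            pos *= IUPAC.get(ltr, 1)
        d[name] = pos
    return d


# Made-up example data, not taken from the question's files.
example = [("seq1", "ACGT"), ("seq2", "ARNT"), ("seq3", "NNB")]
print(pos_seq(example))  # {'seq1': 1, 'seq2': 8, 'seq3': 48}
```

With real data you would loop over the records from SeqIO.parse and use record.id and record.seq in place of the tuple fields.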

Moon Cheesez