
This code runs quickly on the sample data, but when iterating over a large file, it seems to run slowly, perhaps because of the nested for loops. Is there any other reason why iterating over items in a defaultdict is slow?

import itertools

sample_genes1={"0002":["GENE1", "GENE2", "GENE3", "GENE4"],
               "0003":["GENE1", "GENE2", "GENE3", "GENE6"],
               "0202":["GENE4", "GENE2", "GENE1", "GENE7"]}

def get_common_gene_pairs(genelist):
    genedict={}
    for k,v in sample_genes1.items():
        listofpairs=[]
        for i in itertools.combinations(v,2):
            listofpairs.append(i)
            genedict[k]=listofpairs
    return genedict

from collections import namedtuple,defaultdict
def get_gene_pair_pids(genelist):
    i=defaultdict(list)
    d=get_common_gene_pairs(sample_genes1)
    Pub_genes=namedtuple("pair", ["gene1", "gene2"])
    for p_id,genepairs in d.iteritems():
        for p in genepairs:
            thispair=Pub_genes(p[0], p[1])
            if thispair in i.keys():
                i[thispair].append(p_id)
            else:
                i[thispair]=[p_id,]
    return i

if __name__=="__main__":
    get_gene_pair_pids(sample_genes1)
  • One of the first things you should do to solve problems like this is profile your code. See [_How can you profile a Python script?_](http://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script) to find out how easy it is. – martineau Jan 10 '17 at 23:10

2 Answers


One big problem: this line:

if thispair in i.keys():

doesn't take advantage of the dictionary's hash-based lookup: in Python 2, keys() builds a whole list, so in scans it linearly. Drop the keys() call and let the dictionary do its fast lookup:

if thispair in i:
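As a side note, testing membership with in on a defaultdict never inserts a key; only subscripting triggers the default factory. A quick sketch:

```python
from collections import defaultdict

d = defaultdict(list)
print("x" in d)   # False: membership tests never create entries
d["x"].append(1)  # subscripting triggers the list factory
print("x" in d)   # True
print(d["y"])     # []  -- this subscript *does* insert "y"
print(len(d))     # 2
```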

BUT since i is a defaultdict, which creates an empty list whenever a missing key is subscripted, you can just replace:

if thispair in i.keys():
    i[thispair].append(p_id)  # i is defaultdict: even if thispair isn't in the dict, it will create a list and append p_id.
else:
    i[thispair]=[p_id,]

by this simple statement:

i[thispair].append(p_id)

(it's even faster because thispair is hashed only once)

To sum it up:

  • don't do thispair in i.keys(): in Python 2 it builds a whole list just to scan it; in Python 3 it adds needless overhead, defaultdict or not
  • you defined a defaultdict, but your code treats it like a plain dict, which works but is slower.
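Putting both fixes together, the whole function can be sketched like this (same names as in the question; the intermediate get_common_gene_pairs step is folded into one loop):

```python
import itertools
from collections import defaultdict, namedtuple

Pub_genes = namedtuple("pair", ["gene1", "gene2"])

def get_gene_pair_pids(genelist):
    index = defaultdict(list)
    for p_id, genes in genelist.items():
        for g1, g2 in itertools.combinations(genes, 2):
            # no membership test needed: defaultdict creates the list on first access
            index[Pub_genes(g1, g2)].append(p_id)
    return index

sample = {"0002": ["GENE1", "GENE2"],
          "0003": ["GENE1", "GENE2"]}
pairs = get_gene_pair_pids(sample)
print(pairs[Pub_genes("GENE1", "GENE2")])  # ['0002', '0003']
```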

Note: without a defaultdict you could have just removed the .keys(), or done the same thing with a plain dict and setdefault:

i.setdefault(thispair, []).append(p_id)

(setdefault inserts the empty list only when the key is missing, then returns the stored value, so the append always works)
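For instance, with a plain dict (toy data, just to show the pattern):

```python
index = {}
for key, value in [("a", 1), ("a", 2), ("b", 3)]:
    # setdefault returns the existing list, or inserts and returns a new empty one
    index.setdefault(key, []).append(value)
print(index)  # {'a': [1, 2], 'b': [3]}
```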

Aside:

def get_common_gene_pairs(genelist):
    genedict={}
    for k,v in sample_genes1.items():  # should be genelist, not the global variable, you're not using the genelist parameter at all

More generally, both functions ignore their genelist parameter and read the global sample_genes1 instead.

Jean-François Fabre

Adding to Jean's answer, your get_common_gene_pairs function could be optimized with a dict comprehension:

def get_common_gene_pairs(genelist):
    return {k : list(itertools.combinations(v,2)) for k,v in genelist.items()}

Repeated list.append() calls are more time-consuming than the equivalent list comprehension. Also, you don't have to iterate over itertools.combinations(v, 2) yourself to build a list from it: passing it to list() does that for you.
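As a rough illustration, timeit can compare the two styles (a sketch only; absolute numbers are machine-dependent):

```python
import timeit

def with_append(n):
    out = []
    for i in range(n):
        out.append(i * i)  # method lookup + call on every iteration
    return out

def with_comprehension(n):
    return [i * i for i in range(n)]  # built in one pass, no append calls

# Both produce the same list; the comprehension avoids the per-item
# attribute lookup and method-call overhead of out.append.
assert with_append(1000) == with_comprehension(1000)
t_append = timeit.timeit(lambda: with_append(1000), number=2000)
t_comp = timeit.timeit(lambda: with_comprehension(1000), number=2000)
print(f"append: {t_append:.3f}s  comprehension: {t_comp:.3f}s")
```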

Here is the comparison I made between list comprehension and list.append() in answer to Comparing list comprehensions and explicit loops, in case you are interested in taking a look.

Moinuddin Quadri
  • great! I love learning more about the theoretical basis of why I should be doing things a certain way, thank you very much. I am going to work on adding list comprehensions to my repertoire. – user3089348 Jan 11 '17 at 13:04