1

I have generated a list of genes

genes = ['geneName1', 'geneName2', ...] 

and a set of their interactions:

geneInt = {('geneName1', 'geneName2'), ('geneName1', 'geneName3'),...} 

I want to find out how many interactions each gene has and put that in a vector (or dictionary) but I struggle to count them. I tried the usual approach:

interactionList = []
for gene in genes:
    interactions = geneInt.count(gene)
    interactionList.append(interactions)

but of course the code fails because my set contains elements that are made out of two values while I need to iterate over the single values within.

Mad Physicist
  • If I understood your problem right, you might want to look at [this answer](https://stackoverflow.com/a/4746942/3522688). Here is an implementation of that answer: https://replit.com/@Shorotshishir/Just-Test#main.py – Siratim Dec 03 '21 at 16:12

3 Answers

2

I would argue that you are using the wrong data structure to hold interactions. You can represent interactions as a dictionary keyed by gene name, where each value is the set of all genes that gene interacts with.

Let's say you currently have a process that does something like this at some point:

geneInt = set()
...
    geneInt.add((gene1, gene2))

Change it to

import collections

geneInt = collections.defaultdict(set)
...
    geneInt[gene1].add(gene2)

If the interactions are symmetrical, add a line

    geneInt[gene2].add(gene1)

Now, to count the number of interactions, you can do something like

intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
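Put together, a minimal sketch of this approach might look like the following. It assumes symmetric interactions; the `pairs` list is a hypothetical stand-in for wherever your interactions come from, and the gene names are placeholders.

```python
import collections

# Hypothetical input: pairs of interacting genes (placeholder names).
pairs = [("geneName1", "geneName2"), ("geneName1", "geneName3")]

geneInt = collections.defaultdict(set)
for gene1, gene2 in pairs:
    geneInt[gene1].add(gene2)
    geneInt[gene2].add(gene1)  # assumes interactions are symmetric

# Number of distinct partners per gene.
intCounts = {gene: len(ints) for gene, ints in geneInt.items()}
print(intCounts)  # {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}
```

Genes that never appear in any pair get no key here, so use `intCounts.get(gene, 0)` if you need a zero for them.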

Counting from your original set of tuples is also simple if the interactions are one-way:

intCounts = dict.fromkeys(genes, 0)
for gene, _ in geneInt:
    intCounts[gene] += 1

If each interaction is two-way, there are three possibilities:

  1. Both directions of every pair are represented in the set: the loop above works as is.

  2. Only one direction of each pair is represented: change the loop to

    for gene1, gene2 in geneInt:
        intCounts[gene1] += 1
        if gene1 != gene2:
            intCounts[gene2] += 1
    
  3. Some reverse interactions are represented, some are not. In this case, transform geneInt into a dictionary of sets as shown at the beginning of this answer.
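For the third case, a rough sketch of the conversion could look like this. The sample data and the name `partners` are made up for illustration, and `genes` is assumed to list every gene of interest, including ones with no interactions.

```python
import collections

# Hypothetical mixed input: the geneName1/geneName2 pair is stored in both
# directions, the geneName1/geneName3 pair in only one.
geneInt = {
    ("geneName1", "geneName2"),
    ("geneName2", "geneName1"),
    ("geneName1", "geneName3"),
}
genes = ["geneName1", "geneName2", "geneName3", "geneName4"]

partners = collections.defaultdict(set)
for gene1, gene2 in geneInt:
    partners[gene1].add(gene2)
    partners[gene2].add(gene1)  # the set absorbs a duplicate reverse pair

# .get avoids inserting a key when a gene has no interactions at all.
intCounts = {gene: len(partners.get(gene, ())) for gene in genes}
print(intCounts)
# {'geneName1': 2, 'geneName2': 1, 'geneName3': 1, 'geneName4': 0}
```

Because each value is a set, a pair stored in both directions is still counted once per gene.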

Mad Physicist
1

Try something like this:

interactions = {}

for gene in genes:
    interactions_count = 0
    for tup in geneInt:
        interactions_count += tup.count(gene)
    interactions[gene] = interactions_count
Himanshu Kawale
  • This solution is unnecessarily time-complex. You don't need to iterate over `gene in genes` because you _already know_ what gene you're looking at from the `tup in geneInt` loop. Then, `tup.count(gene)` makes it worse because you iterate over the entire tuple just to count how many of `gene` it has. – Pranav Hosangadi Dec 03 '21 at 16:22
0

Use a dictionary, and keep incrementing the value for every gene you see in each tuple in the set geneInt.

interactions_counter = dict()

for interaction in geneInt:
    for gene in interaction:
        interactions_counter[gene] = interactions_counter.get(gene, 0) + 1

The dict.get(key, default) method returns the value at the given key, or the specified default if the key doesn't exist.

For the set geneInt={('geneName1', 'geneName2'), ('geneName1', 'geneName3')}, we get:

interactions_counter = {'geneName1': 2, 'geneName2': 1, 'geneName3': 1}
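As an alternative sketch (not what the loop above does internally), the same tally can be written with `collections.Counter` from the standard library, flattening the tuples with `itertools.chain.from_iterable`. The sample set is the one from the example above.

```python
from collections import Counter
from itertools import chain

geneInt = {("geneName1", "geneName2"), ("geneName1", "geneName3")}

# Flatten the pairs into one stream of gene names and count each name.
interactions_counter = Counter(chain.from_iterable(geneInt))

print(interactions_counter["geneName1"])  # 2
print(interactions_counter["geneName9"])  # 0, missing keys count as zero
```

Like the loop, this counts a gene twice for a self-interaction such as `('geneName1', 'geneName1')`.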
Pranav Hosangadi
  • 1
    You can avoid the `get` by initializing the dictionary ahead of time using `dict.fromkeys` since you know all possible keys up front. – Mad Physicist Dec 03 '21 at 16:45
  • @MadPhysicist I did think of that, but is the performance of `.get()` so much worse to warrant this? Using `.get()` saves one pass over the list of keys, which might be useful for a large list (not a biologist, so not sure what is a reasonable size) – Pranav Hosangadi Dec 03 '21 at 18:49
  • I'm not a biologist either, but if speed were a real concern, they'd be using C++ or similar, not Python. – Mad Physicist Dec 03 '21 at 18:50