0

EDIT: Now looking on how to calculate # of "looping paths" per node

As the title says, I'm trying to make a function that calculates the number of "signal paths" for any node in a network. A signal path for a node is a path from one of multiple inputs to one of multiple outputs that the node is a part of. I'm using an algorithm someone already made called all_simple_paths, which is a generator which returns every path from an input to an output.

However, even though my code looks right, I'm getting incorrect results. Here's the function:

def signal_path_counter(G, inputs, outputs, node):
    c = 0
    paths = []
    for out in outputs:
        for i in inputs:
            for path in all_simple_paths(G, i, out):
                paths.append(path)
    for path in paths:
        for n in path:
            if(node == n):
                c += 1
    return c

Here's the input data:

import networkx as nx
import matplotlib.pyplot as plt
G=nx.DiGraph()
molecules = ["CD40L", "CD40", "NF-kB", "XBP1", "Pax5", "Bach2", "Irf4", "IL-4", "IL-4R", "STAT6", "AID", "Blimp1", "Bcl6", "ERK", "BCR", "STAT3", "Ag", "STAT5", "IL-21R", "IL-21", "IL-2", "IL-2R"]
Bcl6 = [("Bcl6", "Bcl6"), ("Bcl6", "Blimp1"), ("Bcl6", "Irf4")]
STAT5 = [("STAT5", "Bcl6")]
IL_2R = [("IL-2R", "STAT5")]
IL_2 = [("IL-22", "IL-2R")]
BCR = [("BCR", "ERK")]
Ag = [("Ag", "BCR")]
CD40L = [("CD40L", "CD40")]
CD40 = [("CD40", "NF-B")]
NF_B = [("NF-B", "Irf4"), ("NF-B", "AID")]
Irf4 = [("Irf4", "Bcl6"), ("Irf4", "Pax5"), ("Irf4", "Irf4"), ("Irf4", "Blimp1")]
ERK = [("ERK", "Bcl6"), ("ERK", "Blimp1"), ("ERK", "Pax5")]
STAT3 = [("STAT3", "Blimp1")]
IL_21 = [("IL-21", "IL-21R")]
IL_21R = [("IL-21R", "STAT3")]
IL_4R = [("IL-4R", "STAT6")]
STAT6 = [("STAT6", "AID"), ("STAT6", "Bcl6")]
Bach2 = [("Bach2", "Blimp1")]
IL_4 = [("IL-4", "IL-4R")]
Blimp1 = [("Blimp1", "Bcl6"), ("Blimp1", "Bach2"), ("Blimp1", "Pax5"), ("Blimp1", "AID"), ("Blimp1", "Irf4")]
Pax5 = [("Pax5", "Pax5"), ("Pax5", "AID"), ("Pax5", "Bcl6"), ("Pax5", "Bach2"), ("Pax5", "XBP1"), ("Pax5", "ERK"), ("Pax5", "Blimp1")]
edges = Bcl6 + STAT5 + IL_2R + IL_2 + BCR + Ag + CD40L + CD40 + NF_B + Irf4 + 
ERK + STAT3 + IL_21 + IL_21R + IL_4R + STAT6 + Bach2 + IL_4 + Blimp1 + Pax5
G.add_nodes_from(molecules)
G.add_edges_from(edges)
sources = ["Ag", "CD40L", "IL-2", "IL-21", "IL-4"]
targets = ["XBP1", "AID"]

Visual representation of the input network here.

The function call that gives incorrect result of 0:

print(signal_path_counter(G, sources, targets, "IL-2R"))
witcheR
  • 89
  • 4
  • 13
  • So no, if you look at the visual representation of the graph, the sources and targets are just input and output nodes, which we use to calculate an X number of paths to get from any input to any output. IL-2R is a node that's on the way, and I want to get a counter of how many times IL-2R appears in these input to output paths. This count is what I'm returning in my function. – witcheR Aug 03 '17 at 16:39

1 Answers1

1

Your typo is in this line:

IL_2 = [("IL-22", "IL-2R")]

It should be

IL_2 = [("IL-2", "IL-2R")]

There are some things that can be done with your code to make it more "pythonic". Iterating over multiple combinations can be done more cleanly using this approach, which would replace the loop over out and over i with

for input, output in itertools.product(inputs, outputs):
    for path in all_simple_paths(G, input, output):
        paths.append(...)

Also rather than building up paths and then looping through paths to test if the node is in it, do the test directly rather than appending to paths:

for input, output in itertools.product(inputs, outputs):
    for path in all_simple_paths(G, input, output):
        if node in path:
            c += 1

Even for this code, I think it could be made cleaner using a Counter. Basically, if you're ever doing variable += 1, or appending elements to a list while iterating, there's often a "more pythonic" way to do it.

I am concerned about how well this algorithm will scale for larger networks. Finding all paths is expensive. It may be better to start from node and build all paths from node to outputs and all paths from inputs to node. Then convert each path into a set [the conversion into sets makes the next step faster]. Then go through the in and out paths and see if they have any intersection. If not, then you've got a path through node.

This would significantly reduce the number of paths you end up having to consider (and likely the length of the paths as well).

Joel
  • 22,598
  • 6
  • 69
  • 93
  • Thank you for the idea on how to optimize this. Right now, I'm writing a paper that will only be using 2 networks of this size, so luckily for me efficiency isn't a huge issue. – witcheR Aug 03 '17 at 18:42
  • Followup question, another value I need to calculate for every node is "feedback loops", which means how many times the node appears in a looping-path. Is there a built in algorithm for this? – witcheR Aug 03 '17 at 18:42
  • I would need to know a bit more what exactly a "looping path" is. I can think of a couple possible definitions... – Joel Aug 03 '17 at 20:20
  • It's a path that can be repeated, aka a path that results in a loop. Like node A -> node B -> node Z -> node A, if these paths existed, this would be considered a loop. Input/output isn't considers here obviously because there's nothing leading to input and nothing going out from output. – witcheR Aug 03 '17 at 20:41
  • can the looping path have more than one cycle within it, or can it only repeat node A? – Joel Aug 04 '17 at 01:18
  • The relevant paper to the program I'm writing: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-515 – witcheR Aug 04 '17 at 01:31
  • I believe it can have more than one "cycle" in it. I'm not to sure myself. – witcheR Aug 04 '17 at 01:32
  • If a feedback loop does not allow cycles, then just look for a simple path from a node to one of its inputs. If `1->2->3->4->5->3->1` counts as a feedback loop, then it's hard to figure out what to do. Because `1->2->3->1` would also be a loop, as would `1->2->3->4->5->3->4->5->3->4->5->3->4->5->3->1`. So I imagine they probably want to exclude loops within loops. But the paper doesn't seem to make it clear. – Joel Aug 04 '17 at 12:07
  • Going to look into it then post a separate question later if I can't figure it out, thanks for your help. – witcheR Aug 04 '17 at 15:47