
I work in the medical field and have a networkx MultiDiGraph G representing the Human Phenotype Ontology (HPO) nomenclature of symptoms.

For example, 'Macrocephaly at birth' with HPO id HP:0004488 is a clinical symptom and is connected upstream to a less specific term 'Abnormality of head or neck' with HPO id HP:0000152:

nx.shortest_path(G, source='HP:0004488', target='HP:0000152')

['HP:0004488',
 'HP:0000256',
 'HP:0040194',
 'HP:0000240',
 'HP:0000929',
 'HP:0000234',
 'HP:0000152']

I have a list of patients whose symptoms are coded with HPO nomenclature terms. I have added the patients as nodes in the graph and added edges between each patient and all HPO terms found in that patient, for example:

graph.edges('patient1')

OutMultiEdgeDataView([('patient1', 'HP:0000218'), ('patient1', 'HP:0007957'), ('patient1', 'HP:0010444'),
('patient1', 'HP:0000540'), ('patient1', 'HP:0001659'), ('patient1', 'HP:0001382'),
('patient1', 'HP:0001075'), ('patient1', 'HP:0010562')])

I also have a list of diseases represented by OMIM nomenclature and their associated symptoms. I have added the diseases as nodes to the graph and connected them to their symptoms, for example:

graph.edges('MIM:130000')

OutMultiEdgeDataView([('MIM:130000', 'HP:0001373'), ('MIM:130000', 'HP:0001537'), ('MIM:130000', 
'HP:0000993'), ('MIM:130000', 'HP:0000592'), ('MIM:130000', 'HP:0002010'), ('MIM:130000', 'HP:0002783'), 
('MIM:130000', 'HP:0002616'), ('MIM:130000', 'HP:0000394'), ('MIM:130000', 'HP:0008947'), ('MIM:130000', 
'HP:0004322'), ('MIM:130000', 'HP:0000286'), ('MIM:130000', 'HP:0025014'), ('MIM:130000', 'HP:0001763'), 
('MIM:130000', 'HP:0001083'), ('MIM:130000', 'HP:0005222'), ('MIM:130000', 'HP:0006316'), ('MIM:130000', 
'HP:0000978'), ('MIM:130000', 'HP:0011108'), ('MIM:130000', 'HP:0000977'), ('MIM:130000', 'HP:0001187'), 
('MIM:130000', 'HP:0000767'), ('MIM:130000', 'HP:0000006'), ('MIM:130000', 'HP:0001030'), ('MIM:130000', 
'HP:0005100'), ('MIM:130000', 'HP:0002105'), ('MIM:130000', 'HP:0010500'), ('MIM:130000', 'HP:0000545'), 
('MIM:130000', 'HP:0001073'), ('MIM:130000', 'HP:0001634'), ('MIM:130000', 'HP:0000974'), ('MIM:130000', 
'HP:0000023'), ('MIM:130000', 'HP:0002758'), ('MIM:130000', 'HP:0001058'), ('MIM:130000', 'HP:0010485')])

I also know which disease each patient was diagnosed with, and I saved this as an attribute of the patient nodes.

I would like to use link prediction to find the most likely disease in future patients. To do that, I applied the node2vec algorithm to the patient nodes:

from stellargraph import StellarGraph
from stellargraph.data import BiasedRandomWalk

Gs = StellarGraph.from_networkx(G)
rw = BiasedRandomWalk(Gs)

patients = [x for x in G.nodes() if x.startswith('patient')]

walks = rw.run(
    nodes=patients,  # root nodes
    length=20,  # maximum length of a random walk
    n=20,  # number of random walks per root node
    p=0.5,  # Defines (unnormalised) probability, 1/p, of returning to source node
    q=2.0,  # Defines (unnormalised) probability, 1/q, for moving away from source node
)
print("Number of random walks: {}".format(len(walks)))
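To make the `p` and `q` comments above concrete, here is a minimal plain-Python sketch of how the standard node2vec weighting turns into step probabilities. It is not StellarGraph's code; the function name `node2vec_step_probs` and the toy adjacency are made up for illustration.

```python
def node2vec_step_probs(adj, prev, cur, p, q):
    """Normalised second-order step probabilities (standard node2vec weighting).

    adj  : dict mapping node -> set of neighbours (treated as undirected)
    prev : node visited before `cur`
    cur  : node the walker is currently on
    """
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p   # step back to the previous node
        elif nxt in adj[prev]:
            weights[nxt] = 1.0       # stay at distance 1 from the previous node
        else:
            weights[nxt] = 1.0 / q   # move further away from the previous node
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


# Hypothetical toy graph, only to show the arithmetic.
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
probs = node2vec_step_probs(adj, prev="a", cur="b", p=0.5, q=2.0)
for node in sorted(probs):
    print(node, round(probs[node], 3))
```

With `p=0.5` and `q=2.0` the unnormalised weights are 2, 1 and 0.5, so these settings favour returning and staying close to the previous node over exploring outward.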

However, when I look at walks, I see the following:

print(walks[0])

['MIM:610163', 'CD247', 'MIM:610163', 'HP:0003496', 'MIM:308240', 'HP:0001744', 'HP:0010974', 'HP:0010974', 'HP:0011895', 'HP:0008897', 'HP:0030353', 'HP:0030353', 'HP:0033579', 'patient1']
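To quantify how often a walk leaves the phenotype layer, one option is to count node types by ID prefix. This is only a sketch: the helper `node_kind` is hypothetical, and the prefix convention (`HP:`, `MIM:`, `patient`) is an assumption based on the IDs shown in this question.

```python
from collections import Counter


def node_kind(node):
    """Classify a node by its ID prefix (assumed naming convention)."""
    if node.startswith("HP:"):
        return "phenotype"
    if node.startswith("MIM:"):
        return "disease"
    if node.startswith("patient"):
        return "patient"
    return "other"


# The walk printed above, copied verbatim.
walk = ['MIM:610163', 'CD247', 'MIM:610163', 'HP:0003496', 'MIM:308240',
        'HP:0001744', 'HP:0010974', 'HP:0010974', 'HP:0011895', 'HP:0008897',
        'HP:0030353', 'HP:0030353', 'HP:0033579', 'patient1']

kinds = Counter(node_kind(node) for node in walk)
print(kinds)
```

For this walk the count is 9 phenotype nodes, 3 disease nodes, 1 patient node and 1 node (`CD247`) that matches none of the assumed prefixes.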

As I understand it, it would be best if the walker traversed only the phenotype nodes and avoided all others (disease and patient nodes). If the walker jumps to another disease and then follows the phenotype nodes associated with that disease, the walk (and the resulting vector) will contain phenotype terms that are very far away in the graph from the original disease's phenotypes.

Is it possible to change the settings of the Random Walk function or replace it with another to only walk through certain nodes? For example with a specific attribute?
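To illustrate the kind of behaviour I am after, here is a minimal plain-Python sketch of a walk that may start at any root but only ever steps onto nodes accepted by a predicate. It is a uniform random walk, not the biased (p/q) walk of node2vec, and it does not use StellarGraph. The function `restricted_walks`, the `allowed` predicate and the toy adjacency (including the patient and disease links) are hypothetical.

```python
import random


def restricted_walks(adj, roots, allowed, length=20, n=20, seed=None):
    """Uniform random walks that only step onto nodes where allowed(node) is True.

    adj     : dict mapping node -> set of neighbours (treated as undirected)
    roots   : starting nodes (they do not have to satisfy `allowed`)
    length  : maximum number of nodes per walk, including the root
    n       : number of walks per root
    """
    rng = random.Random(seed)
    walks = []
    for root in roots:
        for _ in range(n):
            walk = [root]
            while len(walk) < length:
                # Sort so results are reproducible for a fixed seed.
                candidates = sorted(v for v in adj.get(walk[-1], ()) if allowed(v))
                if not candidates:
                    break  # dead end: no allowed neighbour to move to
                walk.append(rng.choice(candidates))
            walks.append(walk)
    return walks


# Toy adjacency for illustration only; the links are made up.
adj = {
    "patient1": {"HP:0004488"},
    "HP:0004488": {"patient1", "HP:0000256", "MIM:130000"},
    "HP:0000256": {"HP:0004488", "HP:0040194"},
    "HP:0040194": {"HP:0000256"},
    "MIM:130000": {"HP:0004488"},
}


def is_phenotype(node):
    return node.startswith("HP:")


walks = restricted_walks(adj, ["patient1"], is_phenotype, length=5, n=3, seed=0)
for walk in walks:
    print(walk)
```

For a networkx graph, the adjacency could presumably be built as `{n: set(G.successors(n)) | set(G.predecessors(n)) for n in G}`. Another route might be to run the existing walker on `G.subgraph(...)` restricted to phenotype nodes plus the root patient, but I have not verified how that interacts with `BiasedRandomWalk`.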

I have gone through the StellarGraph and networkx package documentation and there does not seem to be a good solution for my problem.

Am I missing something? Are my assumptions faulty? Does the method I used not fit the problem well?

desertnaut
