I am trying to achieve what is shown here: I have 2 CSV files, disease_mstr and Test_mstr. Test_mstr holds many test-to-disease-ID records, which means neither the test names nor the disease IDs in it are unique. The disease ID points to the disease_mstr file, which has only 2 fields: ID and Disease_name (the disease name is unique).

Now I am creating 3 kinds of nodes, with these labels:

1) Tests: unique test names only (345 unique test names in total)

**Properties:**
a) testname

2) Linknode: the entire Test_mstr file, plus the "disease_name" for the corresponding disease ID, pulled from the disease_mstr file

**Properties:**
a) tname
b) dname
c) did

3) Disease: pulled from the disease_mstr file

**Properties:**
a) did
b) diseasename

After that, I create the relationships:

1)MATCH (t:Tests),(n:Linknode) where t.testname = n.tname CREATE (n)-[r:TEST_2]->(t) RETURN n,r,t

2)MATCH (d:Disease), (l:Linknode) where d.did = l.did MERGE (d)-[r:FOR_DISEASE]->(l) RETURN d,r,l

To get the desired result as shown in the image, I run the following Cypher command:

MATCH (d:Disease)-[r2:FOR_DISEASE]->(l:Linknode)-[r:TEST_2]->(t:Tests) RETURN l,r,t,r2 LIMIT 25

Can someone please help me create the 2 additional relationships that are marked and linked in the image with BLUE and GREEN lines?

Sample files and images can be accessed via my Google folder link.

cybersam
DaVinci007
  • First, LIMIT 25 means only 25 results are returned. Depending on your Neo4j browser version/configuration, those edges may exist, but the browser just isn't rendering them. You can double-click a node to auto-expand it in the browser (up to a limit, but you will see a warning if you hit it). Assuming your issue isn't that simple, why can't you run a couple of quick Cypher queries to match those node pairs and create the edges? Do you have a way of identifying the edges you need to create? – Tezra Jun 07 '18 at 18:42

1 Answer

Is your goal to link all diseases to tests so that for any disease you can find out which tests are relevant and for each test, which diseases it tests for?

If so, you are nearly there.

You don't need the link nodes other than to help you during linking the tests to the diseases. In your current scenario you're treating the link nodes as you would if you were creating a relational database. They won't add any value in your graph db. You can create a single relationship between diseases and tests which will do all the work.

Here's a step by step way to load your database. (It probably isn't the most efficient, but it's easy to follow and it works.)

Normalise and load your tests:

load csv with headers from "file:///test_mstr_csv.csv" as line
merge (:Test {testname:line.test_name});

Load your diseases (these looked normalised to me):

load csv with headers from "file:///disease_mstr_csv.csv" as line
create (:Disease {did:line.did, diseasename:line.disease_name});

Load your link nodes:

load csv with headers from "file:///test_mstr_csv.csv" as line
merge (:Link {testname:line.test_name, parentdiseaseid:line.parent_disease_ID});

Now you can create a direct relationship between the diseases and tests with the following query:

match(d:Disease), (l:Link) where d.did = l.parentdiseaseid
with d, l.testname as name
match(t:Test {testname:name}) create (d)<-[:TEST_FOR]-(t);

This last query will find all the link nodes for each disease and extract the test name. It then looks up the test and joins it directly to its corresponding disease.
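Since the link nodes only serve as a lookup step, the same relationships could in principle be created straight from the CSV file. The following is a minimal sketch, not part of the original steps, assuming the Test and Disease nodes have already been loaded as above and that the column names `test_name` and `parent_disease_ID` match those used in the earlier load statements:

```cypher
// Sketch: link each test to its disease directly from the CSV rows,
// skipping the intermediate Link nodes.
load csv with headers from "file:///test_mstr_csv.csv" as line
match (t:Test {testname: line.test_name})
match (d:Disease {did: line.parent_disease_ID})
// merge (rather than create) avoids duplicate relationships
// if the same test/disease row appears more than once.
merge (t)-[:TEST_FOR]->(d);
```

Rows whose test name or disease ID has no matching node are simply skipped, because the `match` clauses return nothing for them.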

The link nodes are redundant now, so you can delete them if you wish.
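A possible clean-up query, assuming the link nodes carry the `Link` label used in the load step above:

```cypher
// Remove the helper Link nodes once the TEST_FOR relationships exist.
// detach delete also drops any relationships still attached to them.
match (l:Link)
detach delete l;
```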

To create the 'blue lines', which I assume are meant to show where tests have diseases in common, run the query below:

match (d:Disease)<-[]-(:Test)-[]->(e:Disease) where id(d) > id(e) 
merge (d)-[:BLUE_LINE]->(e);

The match clause finds all disease pairs with a common test; the where clause ensures a link is created in only one direction; and the merge clause ensures only one link is created per pair.
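To check the result, a query along these lines could list each linked pair together with the tests they share. This is only a sketch; it assumes the `TEST_FOR` and `BLUE_LINE` relationships were created as above, and the returned column names are illustrative:

```cypher
// For each BLUE_LINE pair, collect the tests the two diseases have in common.
match (d:Disease)-[:BLUE_LINE]->(e:Disease)
match (d)<-[:TEST_FOR]-(t:Test)-[:TEST_FOR]->(e)
return d.diseasename as disease,
       e.diseasename as other_disease,
       collect(distinct t.testname) as shared_tests
limit 25;
```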

Marj
  • Thanks a lot, it works like a charm. One more question: what query should I run to show nodes which have connections at 2 or more levels? – DaVinci007 Jun 16 '18 at 10:17
  • Please see the question I posted; it would be great if you could help! https://stackoverflow.com/questions/51001233/neo4j-variable-depth-not-working – DaVinci007 Jun 23 '18 at 12:40
  • Hi, sorry for the delay, I was on holiday. I'm not quite clear on your question. If you want to know which diseases have, say, more than 2 tests, you can use: `match (:Test)-[r:TEST_FOR]-(b:Disease) with b, count(r) as count_r where count_r > 2 match (b)<-[y:TEST_FOR]-(x) return b, y, x` I'm still trying to work out why this query, which is meant to find the same diseases, doesn't work: `match (a:Test)-[r:TEST_FOR]-(b:Disease) with a, b, r, count(r) as count_r where count_r > 2 return a, b, r` – Marj Jun 25 '18 at 09:03
  • Someone has posted a working answer here: https://stackoverflow.com/questions/51001233/neo4j-variable-depth-not-working – DaVinci007 Jun 26 '18 at 06:00
  • But it does not answer my 2nd question in that post. See if you can work it out! – DaVinci007 Jun 26 '18 at 06:01
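A likely explanation for the second query in Marj's comment: in Cypher, every non-aggregated expression in a `with` clause acts as a grouping key. Listing `a`, `b` and `r` alongside `count(r)` therefore makes each group contain exactly one relationship, so the count is always 1 and the `count_r > 2` filter never passes. A sketch of one way around this, grouping by disease only and collecting the tests (the `tests` alias is illustrative):

```cypher
// Group by disease alone so count(r) counts all of its TEST_FOR relationships;
// collect(a) keeps the matching tests available after the aggregation.
match (a:Test)-[r:TEST_FOR]-(b:Disease)
with b, count(r) as count_r, collect(a) as tests
where count_r > 2
return b, count_r, tests;
```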