0

Am new to Neo4j and trying to do a POC by implementing a graph DB for Enterprise Reference / Integration Architecture (Architecture showing all enterprise applications as Nodes, Underlying Tables / APIs - logically grouped as Nodes, integrations between Apps as Relationships.

Objective is to achieve seamlessly 'Impact Analysis' using the strength of Graph DB (Note: I understand this may be an incorrect approach to achieve whatever am trying to achieve, so suggestions are welcome)

Let me come brief my question now,

There are four Apps - A1, A2, A3, A4; A1 has set of Tables (represented by a node A1TS1) that's updated by Integration 1 (relationship in this case) and the same set of tables are read by Integration 2. So the Data model looks like below

(A1TS1)<-[:INT1]-(A1)<-[:INT1]-(A2)
(A1TS1)-[:INT2]->(A1)-[:INT2]->(A4)

I have the underlying application table names captured as a List property in A1TS1 node.

Let's say one of the app table is altered for a new column or Data type and I wanted to understand all impacted Integrations and Applications. Now am trying to write a query as below to retrieve all nodes & relationships that are associated/impacted because of this table alteration but am not able to achieve this

Expected Result is - all impacted nodes (A1TS1, A1, A2, A4) and relationships (INT1, INT2)

Option 1 (Using APOC)

MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b) 
WITH a as STRTND, Collect(type(r)) as allr 
CALL apoc.path.subgraphAll(STRTND, {relationshipFilter:allr}) YIELD nodes, relationships 
RETURN nodes, relationships

This faile with error Failed to invoke procedure 'apoc.path.subgraphAll': Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String

Option 2 (Using with, unwind, collect clause)

MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b) 
WITH a as STRTND, Collect(r) as allr 
UNWIND allr as rels 
MATCH p=()-[rels]-()-[rels]-() 
RETURN p

This fails with error "Cannot use the same relationship variable 'rels' for multiple patterns" but if I use the [rels] once like p=()-[rels]=() it works but not yielding me all nodes

Any help/suggestion/lead is appreciated. Thanks in advance

Update Trying to give more context Showing the Underlying Data

MATCH (TC:TBLCON) RETURN TC

"TC"
{"Tables":["TBL1","TBL2","TBL3"],"TCName":"A1TS1","AppName":"A1"}  
{"Tables":["TBL4","TBL1"],"TCName":"A2TS1","AppName":"A2"} 

MATCH (A:App) RETURN A

"A"                                                                   
{"Sponsor":"XY","Platform":"Oracle","TechOwnr":"VV","Version":"12","Tags":["ERP","OracleEBS","FinanceSystem"],"AppName":"A1"}               
{"Sponsor":"CC","Platform":"Teradata","TechOwnr":"RZ","Tags":["EDW","DataWarehouse"],"AppName":"A2"}   

MATCH ()-[r]-() RETURN distinct r.relname

"r.relname"
"FINREP"   │  (runs between A1 to other apps)
"UPFRNT"   │  (runs between A2 to different Salesforce App)
"INVOICE"  │  (runs between A1 to other apps)

With this, here is what am trying to achieve Assume "TBL3" is getting altered in App A1, I wanted to write a query specifying the table "TBL3" in match pattern, get all associated relationships and connected nodes (upstream)

May be I need to achieve in 3 steps, Step 1 - Write a match pattern to find the start node and associated relationship(s) Step 2 - Store that relationship(s) from step 1 in a Array variable / parameter Step 3 - Pass the start node from step 1 & parameter from step 2 to apoc.path.subgraphAll to see all the impacted nodes

This may conceptually sound valid but how to do that technically in neo4j Cypher query is the question.

Hope this helps

cybersam
  • 63,203
  • 6
  • 53
  • 76
  • Why are younot using a query like that : `MATCH p=(a {TCName:'A1TS1',AppName:'A1'})-[:INT*]-(b) RETURN p` – logisima Jan 31 '18 at 07:29
  • For more info on variable-length pattern matching, see [this section of the developers guide](http://neo4j.com/docs/developer-manual/current/cypher/syntax/patterns/#cypher-pattern-varlength). – InverseFalcon Jan 31 '18 at 07:35
  • You're also not using node labels or relationship types in your queries, which is a recipe for inefficiency. You may want to reread the developers documentation and go through some of the built in cypher tutorials via the browser. – InverseFalcon Jan 31 '18 at 09:05
  • @logisima: I can't (or) dont want to do that, because the Integration name (relationship name) can be different. I have used INT1 & INT2 for example but in reality it might be like [:Invoice] [:FinReport] etc. I would like to retrieve all impacted integrations (relationships) and applications (nodes) because of this one underlying table change. Hope this clarifies. – Mani Eswaran Jan 31 '18 at 16:49
  • OK, and does this one fits your needs : `MATCH p=(a {TCName:'A1TS1',AppName:'A1'})-[:INT|FinReport|Invoice*]-(b) RETURN p` – logisima Jan 31 '18 at 17:02
  • @logisima: Thanks again & appreciate your time, but this may not work because the relationship names are confined to these two now - [:Invoice] [:FinReport] but relationship names is expected to differ. – Mani Eswaran Jan 31 '18 at 17:35
  • @logisima: Thanks again & appreciate your time, but this may not work because the relationship names are confined to these two now - [:Invoice] [:FinReport] but relationship names are expected to differ based on the impacted object. (to make the question more clear, I am planning to add more context to the original question above give me some time, I will try to make it more precise) – Mani Eswaran Jan 31 '18 at 17:41
  • Your question update says you want "upstream" nodes and relationships, but your original expected result indicates you also want "downstream" nodes and relationships as well. Please clarify. – cybersam Jan 31 '18 at 21:57
  • @cybersam: Yes, finding both upstream / downstream would also help – Mani Eswaran Feb 01 '18 at 02:54

2 Answers2

0

This query may do what you want:

MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
MATCH p=(tc)-[:Foo*]-()
WITH tc,
  REDUCE(s = [], x IN COLLECT(NODES(p)) | s + x) AS ns,
  REDUCE(t = [], y IN COLLECT(RELATIONSHIPS(p)) | t + y) AS rs
UNWIND ns AS n
WITH tc, rs, COLLECT(DISTINCT n) AS nodes
UNWIND rs AS rel
RETURN tc, nodes, COLLECT(DISTINCT rel) AS rels;

It assumes that you provide the name of the table of interest (e.g., "TBL3") as the value of a table parameter. It also assumes that the relationships of interest all have the Foo type.

It first finds tc, the TBLCON node(s) containing that table name. It then uses a variable-length non-directional search for all paths (with non-repeating relationships) that include tc. It then uses COLLECT twice: to aggregate the list of nodes in each path, and to aggregate the list of relationships in each path. Each aggregation result would be a list of lists, so it uses REDUCE on each outer list to merge the inner lists. It then uses UNWIND and COLLECT(DISTINCT x) on each list to produce a list with unique elements.

[UPDATE]

If you differentiate between your relationships by type (rather than by property value), your Cypher code can be a lot simpler by taking advantage of APOC functions. The following query assumes that the desired relationship types are passed via a types parameter:

MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
CALL apoc.path.subgraphAll(
  tc, {relationshipFilter: apoc.text.join($types, '|')}) YIELD nodes, relationships
RETURN nodes, relationships;
cybersam
  • 63,203
  • 6
  • 53
  • 76
  • First of all, thanks for taking time. This solution definitely gave some lead for me to approach the solution a little differently. But this will not address all my requirements (if I can say). As said, I have taken some lead from here and also used some APOC procedures to achieve something close enough (will post my query in short). Ultimately all understood is, paramererizing the relationship type and/or passing a list of relationship types to MATCH pattern is not possible at this moment (may change in near future) – Mani Eswaran Feb 01 '18 at 18:30
  • The Update section of your question does not require multiple relationship types. `r.relname` is a relationship property, not a type. There might be an approach, using APOC, to parameterize relationship types, but I did not provide such an approach because your question did not need it. Do you need to add an additional update to your question? – cybersam Feb 01 '18 at 20:45
  • I agree, r.relname is not it's type but a property & I had to take that route as I found no ways to parameterize the relationship types. If you could help me achieve the parameterization for the relationship types, I will be more than happy, because that's the solution I was looking for. May be it wasn't conveyed clearly, sorry about that – Mani Eswaran Feb 01 '18 at 20:52
  • While this achieves what I need, not for all data scenarios (atleast for the two TBLCON nodes that's specified in the question). Because if I give the impacted table as 'TBL1' instead of 'TBL3', both TC nodes are getting selected but results are not yielding all nodes & relationships (at least in the graph display of neo desktop). So I have changed my query a bit and it is working perfectly (you can refer to my update). Thanks much for your help – Mani Eswaran Feb 02 '18 at 03:52
0

WIth some lead from cybersam's response, the below query gets me what I want. Only constraint is, this result is limited to 3 layers (3rd layer through Optional Match)

MATCH (TC:TBLCON) WHERE 'TBL3' IN TC.Tables 
CALL apoc.path.subgraphAll(TC, {maxLevel:1}) YIELD nodes AS invN, relationships AS invR
WITH TC, REDUCE (tmpL=[], tmpr IN invR | tmpL+type(tmpr)) AS impR
MATCH FLP=(TC)-[]-()-[FLR]-(SL) WHERE type(FLR) IN impR 
WITH FLP, TC, SL,impR 
OPTIONAL MATCH SLP=(SL)-[SLR]-() WHERE type(SLR) IN impR RETURN FLP,SLP

This works for my needs, hope this might also help someone.

Thanks everyone for the responses and suggestions

****Update****

Enhanced the query to get rid of Optional Match criteria and other given limitations

MATCH (initTC:TBLCON) WHERE $TL IN initTC.Tables 
WITH Reduce(O="",OO in Reduce (I=[], II in collect(apoc.node.relationship.types(initTC)) | I+II) | O+OO+"|") as RF
MATCH (TC:TBLCON) WHERE $TL IN TC.Tables 
CALL apoc.path.subgraphAll(TC,{relationshipFilter:RF}) YIELD nodes, relationships
RETURN nodes, relationships

Thanks all (especially cybersam)