
I fetch the first n neighbors of a node in Neo4j with the query below (in this example, n = 6).

My graph is weighted, so I also order the results by weight:

START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
RETURN DISTINCT neighbor,
rel.weight AS weight ORDER BY weight DESC LIMIT 6;

I would like to fetch a whole subgraph, including second neighbors (the first neighbors of each of the first six children).

I tried something like:

START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
FOREACH (neighbor | MATCH neighbor-[rel2]-(neighbor2) )
RETURN DISTINCT neighbor1, neighbor2, rel.proximity AS proximity ORDER BY proximity DESC LIMIT 6, rel2.proximity AS proximity ORDER BY proximity DESC LIMIT 6;

The syntax is still wrong, and I am also uncertain about the output: I would like a table of (parent, child, weight) tuples: [node_A - node_B - weight]

I would like to see whether a single query performs better than six separate queries. Can someone help clarify how to iterate a query (FOREACH) and format the output?

Thank you!

user305883
  • Are you familiar with the [variable length path](http://neo4j.com/docs/stable/introduction-pattern.html#_variable_length) Cypher pattern? `(n)-[*1..3]->(m)` – justinpawela Jul 01 '15 at 16:56
  • Yes, it counts the hops from the first node (0 to include the root) to the last. So in my pattern it would be `[*0..1]` for each child connected to the parent. – user305883 Jul 02 '15 at 13:26

2 Answers


First you should avoid using START as it will (hopefully) eventually go away.

To get a neighborhood, you could use variable-length paths to fetch every path leading away from the node:

MATCH path=(start_node)-[*1..3]-(neighbor)
WHERE ID(start_node) = 1859988
RETURN path, nodes(path) AS nodes, EXTRACT(r IN rels(path) | r.weight) AS weights;

Then you can take the path / nodes and combine them in memory with your language of choice.
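As a rough illustration of that in-memory step, here is a minimal Python sketch that flattens path rows into the [node_A - node_B - weight] tuples the question asks for. The row shape (a dict with a `nodes` list of node IDs and a `weights` list) and the name `paths_to_edges` are assumptions made for this example, not something the query or a driver returns as-is.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]


def paths_to_edges(rows: List[Dict]) -> List[Edge]:
    """Flatten path rows into unique (node_a, node_b, weight) tuples.

    Assumes each row looks like {"nodes": [ids...], "weights": [floats...]},
    where weights[i] belongs to the relationship between nodes[i] and nodes[i + 1].
    """
    seen = set()
    edges: List[Edge] = []
    for row in rows:
        nodes, weights = row["nodes"], row["weights"]
        # Pair each consecutive (a, b) step of the path with its weight.
        for (a, b), w in zip(zip(nodes, nodes[1:]), weights):
            # The match is undirected, so a-b and b-a count as the same edge.
            key = (min(a, b), max(a, b), w)
            if key not in seen:
                seen.add(key)
                edges.append((a, b, w))
    return edges


# Hypothetical rows, shaped like the assumed query output.
rows = [
    {"nodes": [1, 2], "weights": [0.9]},
    {"nodes": [1, 2, 3], "weights": [0.9, 0.4]},
    {"nodes": [1, 4], "weights": [0.7]},
]
edges = paths_to_edges(rows)
for a, b, w in sorted(edges, key=lambda e: e[2], reverse=True):
    print(a, b, w)
```

Longer paths repeat the relationships of their shorter prefixes, which is why the sketch deduplicates before printing.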

EDIT:

Also take a look at this SO Question: Fetch a tree with Neo4j

It shows how to get the output as a set of start/end nodes for each of the relationships which can be nicer in many cases.

Brian Underwood
  • Hi @Brian, thank you for the suggestions about the output, but it does a different query: I would like to return the first N (say LIMIT 6) relationship for each first N neighbors connected to a root node. I also would like to output be ordered by weight. In my first example I am able to return first 6 relationships and ordered for the first iteration (parent - first 6 children); I cannot do a second iteration for each child (parent-first 6 children + child_1-first 6 second_children + ... ) – user305883 Jul 02 '15 at 13:11
  • I looked at: http://graphgist.neo4j.com/#!/gists/a5e27b4dca763b6b60a79ac106a52cbb It fetches all trees; I would like to fetch trees between the first N nodes of each node. I tried to adjust: `MATCH p = (o)-[r*0..1]-(x) WHERE ID(o) = 4114904 RETURN collect(DISTINCT id(x)) as nodes, [r in collect(distinct last(r)) | [id(startNode(r)),id(endNode(r)),r.proximity]] as rels` but: QueryExecutionKernelException: Expected ` r@15` to be a Collection but it was a Relationship. Also, I am trying to figure out where to insert `ORDER by r.weight` to limit iteration to the first nodes. – user305883 Jul 02 '15 at 13:32

Ok, I think I understand. Here's another attempt based on your comment:

MATCH (start_node)-[rel]-(neighbor)
WHERE ID(start_node) IN {source_ids}
WITH
  neighbor, rel
ORDER BY rel.proximity DESC
WITH
  collect({neighbor: neighbor, rel: rel})[0..6] AS neighbors_and_rels
UNWIND neighbors_and_rels AS neighbor_and_rel
WITH
  neighbor_and_rel.neighbor AS neighbor,
  neighbor_and_rel.rel AS rel
MATCH (neighbor)-[rel2]-(neighbor2)
WITH
  neighbor,
  rel,
  neighbor2,
  rel2
ORDER BY rel2.proximity DESC
WITH
  neighbor,
  rel,
  collect([neighbor2, rel2])[0..6] AS neighbors_and_rels2
UNWIND neighbors_and_rels2 AS neighbor_and_rel2
RETURN
  neighbor,
  rel,
  neighbor_and_rel2[0] AS neighbor2,
  neighbor_and_rel2[1] AS rel2

It's a bit long, but hopefully it gives you the idea at least.
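To make the intent of the query easier to follow, here is a small in-memory Python sketch of the same idea: take the top n neighbors of the start node by proximity, then the top n neighbors of each of those. The adjacency-dict layout and the names `top_neighbors` and `two_hop_rows` are assumptions for illustration only; this is not how Neo4j evaluates the query.

```python
from typing import Dict, List, Tuple

# Assumed layout: node name -> list of (neighbor name, proximity).
Graph = Dict[str, List[Tuple[str, float]]]


def top_neighbors(graph: Graph, node: str, n: int) -> List[Tuple[str, float]]:
    """Return up to n neighbors of `node`, highest proximity first."""
    return sorted(graph.get(node, []), key=lambda pair: pair[1], reverse=True)[:n]


def two_hop_rows(graph: Graph, start: str, n: int) -> List[Tuple[str, float, str, float]]:
    """Rows of (neighbor, proximity, neighbor2, proximity2), limited to n per level."""
    rows = []
    for neighbor, proximity in top_neighbors(graph, start, n):
        for neighbor2, proximity2 in top_neighbors(graph, neighbor, n):
            rows.append((neighbor, proximity, neighbor2, proximity2))
    return rows


# Hypothetical toy graph; edges are listed in both directions because the
# Cypher pattern above is undirected.
graph: Graph = {
    "root": [("a", 0.9), ("b", 0.5), ("c", 0.1)],
    "a": [("root", 0.9), ("d", 0.8), ("e", 0.2)],
    "b": [("root", 0.5), ("f", 0.6)],
    "c": [("root", 0.1)],
}
rows = two_hop_rows(graph, "root", 2)
for row in rows:
    print(row)
```

Note that the start node can show up again as a second neighbor, since nothing in this sketch excludes the relationship used to reach the first neighbor.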

Brian Underwood
  • The code returns `rel not defined in line 1`. I tried to break it down: it doesn't like `ORDER BY rel.proximity` called after collect[]. Same error "rel not defined" here: `MATCH (start_node)-[rel]-(neighbor) WHERE ID(start_node) = 4114904 WITH collect([neighbor, rel])[0..6] AS neighbors_and_rels ORDER BY rel.proximity RETURN neighbors_and_rels;` The code `MATCH (start_node)-[rel]-(neighbor) WHERE ID(start_node) = 4114904 return collect([neighbor, rel])[0..6] AS neighbors_and_rels;` works, but conceptually ORDER should sort _before_ fetching the six children `[0..6]`. How do I sort a collection? – user305883 Jul 03 '15 at 01:28
  • Ah, right. I've edited and I think that'll work. I just added an extra `WITH` before each which does the sort first and then when they get collected they'll be in order – Brian Underwood Jul 03 '15 at 19:09
  • Hi, almost there, but not yet.. `Type mismatch: neighbor already defined with conflicting type Map (expected Node)` I thought the error was because maybe you can't assign variable names `AS` if names already assigned above; I tried to ensure variable names were distinct, and I have still same error. Maybe `neighbor_and_rel[0] AS neighbor` has a different format from a `(node)` type ? Could you please also clarify if it is possible to reassign variable names if pre-existing in previous `MATCH`? (e.g. `AS neighbor`, `AS rel` ..) – user305883 Jul 04 '15 at 10:03
  • Ok, I ran the query in my console to parse the syntax and got the same error. I refactored it to use a map for `neighbors_and_rels` and that seems to fix it (see edit). I think maybe when you collect an array with a node in it that the node gets converted to a map... – Brian Underwood Jul 06 '15 at 14:48
  • Hi Brian, thank you. I got back to my computer today (a bit of vacation!). It works, but please help me understand your code so I learn: in the first `WITH collect({neighbor: neighbor, rel: rel})[0..6] AS neighbors_and_rels` you use the syntax `({})`. In the second `WITH .. collect([neighbor2, rel2])[0..6] AS neighbors_and_rels2` you use the syntax `([])`, so you return an array as `neighbor_and_rel2[0]` instead of `neighbor_and_rel.neighbor`. Why? (I tried to adapt it, but without success.) Also, how does `WHERE ID(start_node) IN {source_ids}` work? Can `IN {..} ` be used for an array of nodes? – user305883 Jul 24 '15 at 14:59
  • Ah, well that was just because I forgot to update the second `WITH`. The reason I changed the first `WITH` is that (I guess) the node/rel gets changed when it gets collected into a 2D array, but not into an object. It doesn't matter so much for the second one probably because you're just returning JSON data in the end and so it's all going to become serialized data in the end anyhow – Brian Underwood Jul 25 '15 at 20:52
  • I see. And what about `WHERE ID(start_node) IN {source_ids}`? What is the meaning of the `IN` clause? Does `{}` refer to a range of nodes? – user305883 Jul 25 '15 at 21:28
  • Sorry, I was going to make a second comment but my son smacked his head whilst jumping on the bed! Basically with `IN` you can do `value IN array`. In this case `{source_ids}` would be the name of the parameter that you'd be passing in separately in the parameters. – Brian Underwood Jul 25 '15 at 21:41
  • Oops!! Hope he's fine!! It happens to all kids sooner or later.. happened to me too :D I've read your comment, but it's not working for me: e.g. `START start_node=node(1859988) MATCH (start_node)-[rel]-(neighbor) ...` is ok but ` MATCH (start_node)-[rel]-(neighbor) WHERE ID(start_node) IN {1859988} ... ` is not: `QueryExecutionKernelException: Expected a parameter named 1859988` What do you mean by 'the node/rel gets changed when it gets collected into a 2D array, but not into an object'? I tried to apply the same WITH syntax in the second one, but without success. It's a bit cumbersome. – user305883 Jul 26 '15 at 13:31
  • P.s. - this is completely off topic but I saw where you are travelling .. uhuhuh - enjoy and UNWIND gorgeousness in dominican republic :D Please let me say it, it is summertime... ! and good mood is *essential* part in problem solving and learning to code :) – user305883 Jul 26 '15 at 13:40
  • The `WHERE` could be one of the following: `WHERE ID(start_node) = 1859988` or `WHERE ID(start_node) IN [1859988]` or `WHERE ID(start_node) = {start_node_id}` or `WHERE ID(start_node) IN {start_node_ids}`. In the last two cases you would pass in a separate parameter. `start_node_id` would be a single integer (1859988) and `start_node_ids` would be an array. I definitely recommend parameters as they are generally more secure and performant. – Brian Underwood Jul 27 '15 at 14:24
  • Thanks ;) We're enjoying it, though the internet is spotty. Looking forward to our next stop in Argentina – Brian Underwood Jul 27 '15 at 14:26
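
Since the parameter syntax caused some confusion in the comments above, here is a minimal Python sketch of the idea: the query text keeps the placeholder `{source_ids}`, and the actual IDs travel separately in a parameters map. The helper name `build_request` is hypothetical, and the sketch only builds the two pieces; how they are sent depends on your driver or REST client. As far as I know, newer Neo4j versions write the placeholder as `$source_ids` instead of `{source_ids}`.

```python
from typing import Dict, List, Tuple

# The placeholder {source_ids} is a parameter name, not a literal value,
# which is why `IN {1859988}` fails with "Expected a parameter named 1859988".
QUERY = (
    "MATCH (start_node)-[rel]-(neighbor) "
    "WHERE ID(start_node) IN {source_ids} "
    "RETURN neighbor, rel"
)


def build_request(source_ids: List[int]) -> Tuple[str, Dict[str, List[int]]]:
    """Return the query text and the parameters map to send alongside it."""
    return QUERY, {"source_ids": list(source_ids)}


query, params = build_request([1859988])
print(query)
print(params)
```

With the official Python driver, the pair would typically be passed as `session.run(query, params)`; that call is not executed here because it needs a running database.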