3

I have the following query:

START e1=node:event(prop="0")
MATCH e1-[r:rbel]->e2
WITH e1, e2, count(e1) as ecount
MATCH e1-[:redge]->p<-[:redge]-e2
WITH p.element_type as Type, p.label as Label, (count(p)*100./ecount) as percentage
WHERE percentage > 20
RETURN Type, Label, ROUND(percentage) as Percentage

I am trying to calculate the percentage of times the specified pattern occurs in events with prop="0", over all patterns occurring in those events.

I receive the following error: Unknown identifier 'ecount'

So I replaced ecount in the calculation with count(ecount), and that consistently yielded percentages of 100%, which I know not to be true.

Am I going about this wrong? How can I carry the value of ecount to the WITH clause and use it in calculation?

Any help is appreciated!

Michael Hunger
zanbri

2 Answers

4

Does this query work for you? Whenever I combine e1 and count(e1) in a WITH clause, count(e1) is always 1. I think that is because e1 then acts as a grouping key, so the aggregation is computed per group instead of over all matches. Either leave out the e1 or the count(e1).

START e1=node:event(prop="0")
MATCH e1-[r:rbel]->e2
WITH e1, e2
MATCH e1-[:redge]->p<-[:redge]-e2
WITH p.element_type as Type, p.label as Label, (count(p)*100./count(e1)) as percentage
WHERE percentage > 20
RETURN Type, Label, ROUND(percentage) as Percentage
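
To illustrate the grouping behaviour described above, here is a rough Python sketch (not Cypher). The rows and names are made up for illustration; each tuple stands for one matched (e1, e2, p) path. It only assumes that non-aggregated columns in a WITH act as grouping keys and that count() ignores nothing but nulls.

```python
from collections import Counter

# Hypothetical rows, one per matched (e1, e2, p) path; not the asker's data.
rows = [
    ("a", "b", "p1"),
    ("a", "c", "p2"),
    ("d", "e", "p1"),
]

# Like "WITH e1, e2, count(e1)": the pair (e1, e2) is the grouping key,
# so each distinct pair forms its own group and is counted on its own.
per_pair = Counter((e1, e2) for e1, e2, _ in rows)

# Like "WITH p, count(p)*100./count(e1)": both counts are taken over the
# same rows of the same group, so the ratio cannot be anything but 100.
groups = {}
for e1, _, p in rows:
    groups.setdefault(p, []).append(e1)
same_group_pct = {p: len(members) * 100.0 / len(members)
                  for p, members in groups.items()}

print(per_pair)
print(same_group_pct)
```

If this model is right, it would explain both the count of 1 and the constant 100%: the denominator has to be computed over all matches before the grouping by p happens.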

UPDATE: After playing around with your provided console setup, I got the following query working:

START e1=node:node_auto_index(prop="0") 
MATCH e1-[r:rbel]->e2 
WITH COLLECT(e2) AS e2collection, count(e1) AS cnt 
MATCH e1-[:redge]->p<-[:redge]-(e2) 
WITH p, COLLECT(e1) AS e1collection, cnt, e2collection 
WITH p.name AS Name, cnt, count(p)*100/cnt AS Percentage 
WHERE Percentage > 20 
RETURN Name, Percentage
Henrik Sachse
  • Your pattern matcher is always selecting triangle relationships. I created a test data set on the console here: http://console.neo4j.org/?id=wxx099 Aggregating the other nodes with a counter will always result in 2, as shown here: http://console.neo4j.org/?id=qsvane So this calculation is basically correct even if it is not what you want to achieve: http://console.neo4j.org/?id=ekxwmu Can you please describe your goal a bit more clearly, including a console sample of the data structure if possible? – Henrik Sachse Aug 27 '13 at 18:17
  • I initially want to calculate how many nodes fit my first match: http://console.neo4j.org/r/yt15uw (I created a slightly different graph which is more similar to my use case). Here it returns 3. Then I want to have this: http://console.neo4j.org/r/5yfaov . This returns the result I expect: 33%. However, I had to manually insert the '3' in the calculation. What I think you're proposing is this: http://console.neo4j.org/r/6bz813 , which consistently gives me results of 100%, which are wrong. Do you know how to carry that '3' as a variable? Many thanks for your help! – zanbri Aug 27 '13 at 20:42
  • Yes! Thank you!! Do you know why collections allow you to pass the variables? Many thanks! – zanbri Aug 28 '13 at 15:38
  • Because if you don't use collections, you calculate it for each possible path separately. That would always result in 100%. – Henrik Sachse Aug 28 '13 at 16:00
  • Although your code works perfectly in the console setup, it fails to work in my localhost data browser. I get: "Invalid query: All parts of the pattern must either directly or indirectly be connected to at least one bound entity. These identifiers were found to be disconnected: UNNAMED2, UNNAMED3, e1, e2, p" – zanbri Aug 28 '13 at 16:02
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/36404/discussion-between-zanbri-and-h3nrik) – zanbri Aug 28 '13 at 16:03
1

h3nrik's solution worked perfectly in the console setup example, but for some reason it failed when applied to my actual data in my localhost data browser. I found the following workaround, although it has a slower query time:

START e1=node:event(prop="0") 
MATCH e1-[:rbel]->e2 
WITH count(e1) as ecount
START e1=node:event(prop="0") 
MATCH e1-[:rbel]->e2, e1-[:redge]->p<-[:redge]-(e2)
WITH p.label AS Label, p.element_type as Type, ecount, count(p)*100./ecount AS percentage 
WHERE percentage > 20 
RETURN Label, Type, ROUND(percentage) as Percentage
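
The idea behind this workaround, as far as I understand it, is a two-pass calculation: get the total once, then carry it along as a plain value while grouping. Below is a small Python sketch of that idea (not Cypher). The function name `percentages` and the sample rows are hypothetical; the numbers mirror the console example from the comments (3 matches, 1 pattern hit, 33%).

```python
from collections import Counter

def percentages(first_match, second_match, threshold=20.0):
    """Share of each label among all first-pass matches, above a threshold."""
    # Pass 1, like "WITH count(e1) as ecount": one total over all matches.
    ecount = len(first_match)
    # Pass 2, like grouping by Label with ecount carried along unchanged.
    counts = Counter(label for _, _, label in second_match)
    result = {}
    for label, n in counts.items():
        pct = n * 100.0 / ecount
        if pct > threshold:  # like "WHERE percentage > 20"
            result[label] = round(pct)
    return result

# Hypothetical data: three e1-[:rbel]->e2 matches, one of which shares a p.
first = [("e1", "e2"), ("e1", "e3"), ("e4", "e5")]
second = [("e1", "e2", "x")]
print(percentages(first, second))
```

Listing ecount in the second WITH seems to be what makes it usable inside the percentage expression: it has the same value on every row, so adding it as a grouping key does not split the groups.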
zanbri