
I'm currently working on my first application that uses a graph database (Neo4j). I'm in the process of modelling my graph on a whiteboard. My colleague and I are in a pickle over whether or not we should introduce a 'collection node'.

We have something like this (Cypher syntax, fictitious example):

(parking:Parking) - Parking node
(car:Car) - Car node

Obviously, a Parking can have multiple Cars; let's say it can have up to 1 million cars.

Is it, in this case, better to introduce a new node: (carCollection:CarCollection) - Car collection node?

A Parking could have a rel to the 'Car collection node', which in turn can have rels to a lot of cars. The idea is to keep a simple query on the Parking node itself (let's say you want to query the number of available seats) from losing performance. Is this a good idea? Or is this bogus, and should you model it as it is because this does not influence performance?
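To make the two alternatives concrete, here is a minimal sketch of both models in Cypher. The relationship type `PARKED_AT` comes from the comments further down in this thread; `HAS_CARS`, `PARKED_IN` and the `name` and `plate` properties are made-up names used only for illustration.

```cypher
// Option A: every car points directly at the parking.
CREATE (parking:Parking {name: 'P1'})
CREATE (car:Car {plate: 'ABC-123'})
CREATE (car)-[:PARKED_AT]->(parking);

// Option B: an intermediate collection node sits between the parking
// and its cars (hypothetical HAS_CARS / PARKED_IN relationship types).
CREATE (parking:Parking {name: 'P2'})
CREATE (cars:CarCollection)
CREATE (car:Car {plate: 'XYZ-789'})
CREATE (parking)-[:HAS_CARS]->(cars)
CREATE (car)-[:PARKED_IN]->(cars);
```

In option B every query that goes from a parking to its cars needs one extra hop through the collection node.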

If anyone can provide a link or book with some graph modelling best practices, that would be awesome as well :).

Thx in advance.

Gr Kwinten

KwintenP
  • What do you mean by a `car collection` node? Another node that holds all the relationships to the car nodes? – ulkas Jul 21 '14 at 12:40
  • Yes ulkas, that's exactly what I mean. This is to avoid too many relationships on the Parking node, which could hurt performance when you want to query something other than the 'hasCar' relationship. – KwintenP Jul 21 '14 at 12:43
  • It will not hurt performance. The graph db concept is made exactly for this: it easily holds millions of records and lets you query just a small subset without an impact on performance. – ulkas Jul 21 '14 at 12:49
  • How did you solve this? Curious to know. – All Іѕ Vаиітy Apr 18 '17 at 13:03
  • It's been a really long time and tbh I don't remember anymore :( – KwintenP Apr 18 '17 at 17:04

1 Answer


Either way, there is no performance trick once you need 1 million nodes, one for each car.

If you simply query your parking node together with just one car, it will be as fast as if there were just 1 car in the car collection.

If you need to return all 1 million cars, then nothing will speed that up. (The main problem, however, would simply be the network connection needed to stream all that data.)

You can play with labels, but I suggest keeping the millions of relationships directly on the parking node. If you could provide us with an example scenario and a query, then maybe we can figure something out.

ulkas
  • Thx for your answer. Let's say a Parking can have, besides a (parking:Parking)<-[:PARKED_AT]-(car:Car) rel, a (parking:Parking)<-[:OWNER_OF]-(person:Person) rel. If you wish to query for the owner of the parking, doesn't Neo4j have to loop over every rel (so also the 1 million car rels) to find the OWNER_OF rel? This must hurt performance, doesn't it? – KwintenP Jul 22 '14 at 11:39
  • No. Imagine that every group of relationships (OWNER_OF, PARKED_AT) is saved to disk in a separate file. When you make a match like `(parking:Parking)<-[:OWNER_OF]-(person:Person)`, the core simply loads the relationships from the file for OWNER_OF, thus completely ignoring all other types of relationships. The same principle applies to `labels`, where only nodes with the specified label are picked up. – ulkas Jul 22 '14 at 12:16
  • Even better, when you specify a starting node (for example by an index or a direct id), like `START parking=node(123456) MATCH (parking)<-[:OWNER_OF]-(car)`, it will never loop through all the `OWNER_OF` relationships in the database. It will just jump to that node and continue traversing from there, thus actually working with just the few relationships of the node `parking`. – ulkas Jul 22 '14 at 12:16
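
For reference, the two lookups discussed in these comments could be written roughly as below. This is only a sketch: it assumes the parking is found through a hypothetical `name` property instead of the legacy `START ... node(id)` form, and it says nothing about how a particular Neo4j version stores relationships internally.

```cypher
// Find the owner: the pattern only matches OWNER_OF relationships
// on the chosen parking node.
MATCH (parking:Parking {name: 'P1'})<-[:OWNER_OF]-(person:Person)
RETURN person;

// Count the parked cars: the pattern only matches PARKED_AT relationships.
MATCH (parking:Parking {name: 'P1'})<-[:PARKED_AT]-(car:Car)
RETURN count(car) AS parkedCars;
```

Prefixing either query with `PROFILE` shows the executed plan and the number of database hits, which is probably the simplest way to check on your own data whether the million `PARKED_AT` relationships affect the owner lookup.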