6

I want to store some data in my Neo4j database, and I use spring-data-neo4j for that.

My code looks like this:

    for (int i = 0; i < newRisks.size(); i++) {
        myRepository.save(newRisks.get(i));
        System.out.println("saved " + newRisks.get(i).name);
    }

My newRisks list contains about 60,000 objects and 60,000 edges. Every node and edge has one property. The loop takes about 15 to 20 minutes; is this normal? I used Java VisualVM to look for bottlenecks, but my average CPU usage was 10 - 25% (of 4 cores) and my heap was less than half full.

Are there any options to speed up this operation?


EDIT: In addition, on the first call of myRepository.save(newRisks.get(i)); the JVM seems to fall asleep for some minutes before the first output appears.

Second EDIT:

Class Risk:

    @NodeEntity
    public class Risk {
        //...
        @Indexed
        public String name;

        @RelatedTo(type = "CHILD", direction = Direction.OUTGOING)
        Set<Risk> risk = new HashSet<Risk>();

        public void addChild(Risk child) {
            risk.add(child);
        }

        //...
    }

Creating Risks:

    @Autowired
    private Repository myRepository;

    @Transactional
    public Collection<Risk> makeSomeRisks() {

        ArrayList<Risk> newRisks = new ArrayList<Risk>();

        newRisks.add(new Risk("Root"));

        for (int i = 0; i < 60000; i++) {
            Risk risk = new Risk("risk " + (i + 1));
            newRisks.get(0).addChild(risk);
            newRisks.add(risk);
        }

        for (int i = 0; i < newRisks.size(); i++) {
            myRepository.save(newRisks.get(i));
        }

        return newRisks;
    }

hilbert
  • According to the [documentation](http://static.springsource.org/spring-data/data-neo4j/docs/2.0.0.RC1/api/org/springframework/data/neo4j/repository/CRUDRepository.html), CRUDRepository.save can take an Iterable as an argument. Why not just myRepository.save(newRisks)? – Thomas Mar 05 '12 at 14:51
  • I tried this and it works too, but it's not faster. With the loop I can at least see that it is not dead ^^ – hilbert Mar 05 '12 at 14:54
  • Could you show the structure of your class and any node entities and relationship entities that it refers to, and how you construct your Risk instances? – Michael Hunger Mar 06 '12 at 07:48
  • Also, what is your transactional boundary? It should be an @Transactional around your method, or probably a TransactionTemplate that commits around every 10k objects. Otherwise this will create one tx per object, which is LOTS of overhead. – Michael Hunger Mar 06 '12 at 08:01
  • @Michael Hunger thanks for the additional questions, I added the information under "Second EDIT:" in my post – hilbert Mar 06 '12 at 13:02

4 Answers

5

I think I've found a solution:

I tried the same insert using the native Neo4j Java API:

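    GraphDatabaseService graphDb;
    Node firstNode;
    Node secondNode;
    Relationship relationship;

    // Open the embedded database and do the whole insert in one transaction.
    graphDb = new EmbeddedGraphDatabase(DB_PATH);
    Transaction tx = graphDb.beginTx();

    try {
        // The root node that every risk hangs off.
        firstNode = graphDb.createNode();
        firstNode.setProperty("name", "Root");

        for (int i = 0; i < 60000; i++) {
            secondNode = graphDb.createNode();
            secondNode.setProperty("name", "risk " + (i + 1));

            // RelTypes is assumed to be an enum implementing RelationshipType, defined elsewhere.
            relationship = firstNode.createRelationshipTo(secondNode, RelTypes.CHILD);
        }
        // Mark the transaction as successful; it is committed in finish().
        tx.success();
    }
    finally {
        tx.finish();
        graphDb.shutdown();
    }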

The result: after a few seconds, the database is filled with risks.

Maybe reflection slows down this routine in spring-data-neo4j. @Michael Hunger says something like that in his book Good Relationships; thanks for that tip.

hilbert
5

The problem here is that you are doing mass-inserts with an API that is not intended for that.

You create a root Risk with 60k children. You first save the root, which also persists the 60k children at the same time (and creates the relationships); that's why the first save takes so long. Then you save the children again.
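To make the duplicated work concrete, here is a toy model of that pattern in plain Java. It is not SDN: `ToyRisk`, `ToyRepository` and the cascade rule in `save` are hypothetical stand-ins that only count how often a node would be written, assuming that saving a parent also saves its children.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CascadeDemo {

    /** Minimal stand-in for the Risk entity: a name plus outgoing CHILD links. */
    static final class ToyRisk {
        final String name;
        final Set<ToyRisk> children = new HashSet<>();

        ToyRisk(String name) {
            this.name = name;
        }
    }

    /** Hypothetical repository: counts node writes, and save() cascades to children. */
    static final class ToyRepository {
        long nodeWrites = 0;

        void save(ToyRisk risk) {
            nodeWrites++;
            for (ToyRisk child : risk.children) {
                save(child);
            }
        }
    }

    /** Builds a root with childCount children; element 0 of the list is the root. */
    static List<ToyRisk> buildRisks(int childCount) {
        List<ToyRisk> risks = new ArrayList<>();
        ToyRisk root = new ToyRisk("Root");
        risks.add(root);
        for (int i = 0; i < childCount; i++) {
            ToyRisk child = new ToyRisk("risk " + (i + 1));
            root.children.add(child);
            risks.add(child);
        }
        return risks;
    }

    /** Mirrors the loop in the question: save the root, then every child again. */
    static long writesWhenSavingEveryElement(int childCount) {
        ToyRepository repository = new ToyRepository();
        for (ToyRisk risk : buildRisks(childCount)) {
            repository.save(risk);
        }
        return repository.nodeWrites;
    }

    /** Saves only the root and relies on the cascade for the children. */
    static long writesWhenSavingOnlyRoot(int childCount) {
        ToyRepository repository = new ToyRepository();
        repository.save(buildRisks(childCount).get(0));
        return repository.nodeWrites;
    }

    public static void main(String[] args) {
        System.out.println("save every element: " + writesWhenSavingEveryElement(60000));
        System.out.println("save only the root: " + writesWhenSavingOnlyRoot(60000));
    }
}
```

Under this model the loop from the question touches every child twice, while saving only the root touches each node once. The real cost per save in SDN will differ; the sketch only shows where the redundant saves come from.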

There are some ways to speed it up with SDN:

  1. Don't use the collection approach for mass inserts; persist both participants and use template.createRelationshipBetween(root, child, "CHILD", false);

  2. Persist the children first, then add all the persisted children to the root object and persist that.

  3. As you did, use the Neo4j core API, but call template.postEntityCreation(node, Risk.class) so that you can access the entities via SDN. Then you also have to index the entities on your own (db.index().forNodes("Risk").add(node, "name", name);), or use the Neo4j core API auto-index, but that's not compatible with SDN.

  4. Regardless of whether you use the core API or SDN, you should use transaction sizes of around 10-20k nodes/rels for best performance.
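The batching advice in point 4 can be sketched without any Neo4j dependency. The helper below is an assumption-laden illustration, not part of SDN or the core API: `BatchedSaver`, `partition` and `saveInBatches` are made-up names, and the `saveInOneTransaction` callback stands in for whatever opens a transaction, saves the batch and commits (for example a TransactionTemplate, or beginTx/success/finish with the core API).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchedSaver {

    /** Splits items into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hands each batch to saveInOneTransaction. In real code that callback
     * would open a transaction, save every item in the batch, and commit.
     * Returns the number of batches, i.e. the number of transactions used.
     */
    static <T> int saveInBatches(List<T> items, int batchSize,
                                 Consumer<List<T>> saveInOneTransaction) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            saveInOneTransaction.accept(batch);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 60000; i++) {
            names.add("risk " + (i + 1));
        }
        // The lambda only prints; replace it with the real transactional save.
        int transactions = saveInBatches(names, 10000,
                batch -> System.out.println("committing " + batch.size() + " items"));
        System.out.println(transactions + " transactions");
    }
}
```

With 60,000 items and a batch size of 10,000 this yields six commits instead of one per object or one huge transaction. The batch size is a parameter so it can be tuned within the 10-20k range suggested above.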

Michael Hunger
  • Thanks, you're right. For now I just save the root, and it takes only 6 minutes. Maybe later I will try the other solutions that you propose. – hilbert Mar 12 '12 at 07:55
1

I faced the same problem as the OP. What really helped in my case was switching Neo4j from remote server mode to embedded mode. A good example of embedded SDN usage can be found here.

ytterrr
1

Do inserts into your database (outside of Java) have the same delay or is this a problem only through spring data?

abehrens
  • Good idea, but how can I insert this number of nodes etc. outside of Java? I don't want to use another programming language. Another idea: I can try to use the standard Neo4j Java API instead of spring-data-neo4j. – hilbert Mar 06 '12 at 07:05
  • Just fire an insert statement in your SQL editor of choice. This would test the speed of the database itself, outside of any programming language. – abehrens Mar 07 '12 at 17:28
  • It's a NoSQL database, but I've tried it with the native Java API for Neo4j, and that's faster. – hilbert Mar 08 '12 at 08:44