I have a question for you:

I would like to load a file into my Jena TDB triple store. The file is quite big: about 80 MB, roughly 700,000 RDF triples. When I try to load it, the execution either hangs or takes a very long time.

I'm using this code, which runs in a web service:

        String file = "C:\\file.nt";
        String directory = "C:\\tdb";
        Dataset dataset = TDBFactory.createDataset(directory);

        Model model = ModelFactory.createDefaultModel();

        TDBLoader.loadModel(model, file);
        dataset.addNamedModel("http://nameFile", model);

        return model;

Sometimes I get a Java heap space error:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:170)
    at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:86)
    at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
    at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:99)
    at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:67)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:54)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTFactoryImpl$1.read(RDFParserRegistry.java:142)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:255)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:241)
    at org.apache.jena.riot.adapters.RDFReaderRIOT_Web.read(RDFReaderRIOT_Web.java:96)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:241)
    at com.hp.hpl.jena.tdb.TDBLoader.loadAnything(TDBLoader.java:294)
    at com.hp.hpl.jena.tdb.TDBLoader.loadModel(TDBLoader.java:125)
    at com.hp.hpl.jena.tdb.TDBLoader.loadModel(TDBLoader.java:119)

How can I load this file into a Jena model and save it in TDB? Thanks in advance.

Musich87

1 Answer

You need to allocate more memory to your JVM at startup. With too little, the process spends too much time performing garbage collection and ultimately fails.

For example, start your JVM with 4 GB of memory:

java -Xms4G -Xmx4G

If you are in an IDE such as Eclipse, you can change your run configuration so that the application has additional memory as well.
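In Eclipse, for instance, this typically goes in the VM arguments box under Run Configurations → Arguments (e.g. -Xmx4G).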

Aside from that, the only change that jumps out at me is that you are using an in-memory model for the actual loading operation, when you could use a model backed by TDB instead. This can alleviate your memory problems because TDB dynamically moves its indexes to disk.

Change:

Dataset dataset = TDBFactory.createDataset(directory);
Model model = ModelFactory.createDefaultModel();
TDBLoader.loadModel(model, file);
dataset.addNamedModel("http://nameFile", model);

to this:

Dataset dataset = TDBFactory.createDataset(directory);
Model model = dataset.getNamedModel("http://nameFile");
TDBLoader.loadModel(model, file);

Now your system depends on TDB's ability to make good decisions about when to leave data in memory and when to flush it to disk.
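Putting it together, here is a minimal sketch of the whole loading method. The Loader class name is hypothetical; the file path, database directory, and graph URI are the ones from your question, and the explicit TDB.sync call to flush pending changes before returning is an optional extra:

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDB;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.TDBLoader;

public class Loader {
    public static Model load() {
        String file = "C:\\file.nt";
        String directory = "C:\\tdb";

        // Open (or create) the TDB dataset on disk.
        Dataset dataset = TDBFactory.createDataset(directory);

        // A model backed by TDB: triples go to disk, not the Java heap.
        Model model = dataset.getNamedModel("http://nameFile");

        // Stream the N-Triples file directly into the TDB-backed model.
        TDBLoader.loadModel(model, file);

        // Optional: flush any buffered changes to disk before returning.
        TDB.sync(dataset);

        return model;
    }
}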

Rob Hall
  • I think the answer is simply the second part -- skip the in-memory model and you won't need to bump the heap allocation. – user205512 Sep 15 '14 at 15:11
  • Thanks Rob, I have made this change and now it works! – Musich87 Sep 15 '14 at 15:41
  • Rob, this solution is good, but I have noticed an error that I have reported in this post: http://stackoverflow.com/questions/25865560/error-when-i-load-rdf-triples-in-tdb-triple-store?noredirect=1#comment40476748_25865560 – Musich87 Sep 16 '14 at 11:55