I have a question for you:

I would like to load a file into my Jena TDB triple store. The file is quite big: about 80 MB, roughly 700,000 RDF triples. When I try to load it, the execution either hangs or takes a very long time.

I'm using this code, which runs in a web service:

        String file = "C:\\file.nt";
        String directory = "C:\\tdb";
        Dataset dataset = TDBFactory.createDataset(directory);

        Model model = ModelFactory.createDefaultModel();

        TDBLoader.loadModel(model, file);
        dataset.addNamedModel("http://nameFile", model);

        return model;

Sometimes I get a Java heap space error:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:170)
    at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:86)
    at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
    at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:99)
    at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:67)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:54)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTFactoryImpl$1.read(RDFParserRegistry.java:142)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:255)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:241)
    at org.apache.jena.riot.adapters.RDFReaderRIOT_Web.read(RDFReaderRIOT_Web.java:96)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:241)
    at com.hp.hpl.jena.tdb.TDBLoader.loadAnything(TDBLoader.java:294)
    at com.hp.hpl.jena.tdb.TDBLoader.loadModel(TDBLoader.java:125)
    at com.hp.hpl.jena.tdb.TDBLoader.loadModel(TDBLoader.java:119)

How can I load this file into a Jena model and save it in TDB? Thanks in advance.

Musich87

1 Answer

You need to allocate more memory to your JVM at startup. With too little, the process spends too much time performing garbage collection and ultimately fails.

For example, start your JVM with 4 GB of memory:

java -Xms4G -Xmx4G

If you are in an IDE such as Eclipse, you can change your run configuration so that the application has additional memory as well.
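In Eclipse, for instance, this typically goes in the VM arguments box under Run Configurations → Arguments (e.g. -Xmx4G).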

Aside from that, the only change that jumps out at me is that you are using an in-memory model for the actual loading operation, when you could use a model backed by TDB instead. This can alleviate your memory problems because TDB dynamically moves its indexes to disk.

Change:

Dataset dataset = TDBFactory.createDataset(directory);
Model model = ModelFactory.createDefaultModel();
TDBLoader.loadModel(model, file);
dataset.addNamedModel("http://nameFile", model);

to this:

Dataset dataset = TDBFactory.createDataset(directory);
Model model = dataset.getNamedModel("http://nameFile");
TDBLoader.loadModel(model, file);

Now your system depends on TDB's ability to make good decisions about when to leave data in memory and when to flush it to disk.
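Putting it together, here is a minimal sketch of the whole loading method. The Loader class name is hypothetical; the file path, database directory, and graph URI are the ones from your question, and the explicit TDB.sync call to flush pending changes before returning is an optional extra:

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDB;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.TDBLoader;

public class Loader {
    public static Model load() {
        String file = "C:\\file.nt";
        String directory = "C:\\tdb";

        // Open (or create) the TDB dataset on disk.
        Dataset dataset = TDBFactory.createDataset(directory);

        // A model backed by TDB: triples go to disk, not the Java heap.
        Model model = dataset.getNamedModel("http://nameFile");

        // Stream the N-Triples file directly into the TDB-backed model.
        TDBLoader.loadModel(model, file);

        // Optional: flush any buffered changes to disk before returning.
        TDB.sync(dataset);

        return model;
    }
}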

Rob Hall
  • I think the answer is simply the second part -- skip the in-memory model and you won't need to bump the heap allocation. – user205512 Sep 15 '14 at 15:11
  • Thanks Rob, I have made this change and now it works! – Musich87 Sep 15 '14 at 15:41
  • Rob, this solution is good, but I have noticed an error that I have reported in this post: http://stackoverflow.com/questions/25865560/error-when-i-load-rdf-triples-in-tdb-triple-store?noredirect=1#comment40476748_25865560 – Musich87 Sep 16 '14 at 11:55