I am using Apache Jena's tdbloader in a query-based web application. The app selects a particular database (a Turtle file) and loads it with the standard tdbloader. However, when the dataset is huge, loading takes about fifteen minutes.
Is there a way to do this more efficiently, or to load the data ahead of time?

(TDB is unrelated to JDBC) The loader does not do anything special in the case of a dataset already having data in it. It optimizes the case of loading an empty database. How big is the TTL file (in triples)? – AndyS Feb 07 '14 at 16:28
2 Answers
Your question really doesn't make much sense.
TDB is a persistent database, so if you have a set of known data files you would simply create and load a database from each data file once, most likely offline. Then in your application you just open a TDBDataset for the existing database and query it as you would any other dataset with Jena's ARQ API.
It sounds like your application may not be appropriately designed, because you imply you are loading the data into a database every time you want to query it, which is extremely wasteful.
You may want to read up on the TDB documentation.
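A minimal sketch of that pattern, assuming a recent Apache Jena release (older 2.x releases use the com.hp.hpl.jena packages instead) and a placeholder database directory, might look like this:

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.tdb.TDBFactory;

public class QueryExistingTdb {
    public static void main(String[] args) {
        // Connect to a TDB database that was already built (e.g. with tdbloader).
        // This only opens the on-disk indexes; it does not re-parse any Turtle file.
        Dataset dataset = TDBFactory.createDataset("/path/to/tdb-directory");

        String sparql = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }";

        // Query inside a read transaction, as with any other Jena dataset.
        dataset.begin(ReadWrite.READ);
        try (QueryExecution qexec = QueryExecutionFactory.create(sparql, dataset)) {
            ResultSet results = qexec.execSelect();
            ResultSetFormatter.out(System.out, results);
        } finally {
            dataset.end();
        }
    }
}
```

Because the database already exists on disk, opening it this way is essentially instant, regardless of how long the original load took.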

I think you may be interested in these questions and their answers:
- Querying large RDF Datasets out of memory
- Querying Open Data Communities Data with SPARQL (see the second half of my answer)
TDB stores data on disk in a much more efficient format than plain RDF files. You should load the data with tdbloader once, and then run your queries against the on-disk representation that tdbloader produced. You can do that with tdbquery (as my answer to the second of those questions describes).
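As a rough illustration of that load-once-then-query workflow (the directory and file names here are placeholders, not taken from the original post), the command-line usage looks roughly like this:

```bash
# One-time, offline step: parse the Turtle file and build the TDB indexes on disk.
tdbloader --loc=/data/mydb huge-dataset.ttl

# Later, queries run directly against the on-disk database;
# the Turtle file is never re-parsed.
tdbquery --loc=/data/mydb --query=my-query.rq
```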
