I am using Apache Jena's tdbloader in a query-based web application. The app selects a particular database (a Turtle file) and loads it with the standard tdbloader. However, when the dataset is huge, loading takes about fifteen minutes.
Is there a way to do this more efficiently, or to load the data ahead of time?

(TDB is unrelated to JDBC) The loader does not do anything special in the case of a dataset already having data in it. It optimizes the case of loading an empty database. How big is the TTL file (in triples)? – AndyS Feb 07 '14 at 16:28
2 Answers
Your question really doesn't make much sense.
TDB is a persistent database, so if you have a set of known data files you would simply create and load a database from each data file once, most likely offline. Then in your application you just open a TDBDataset for the existing database and query it as you would any other dataset with Jena's ARQ API.
It sounds like your application may not be appropriately designed, because you imply you are loading the data into a database every time you want to query it, which is extremely wasteful.
You may want to read up on the TDB documentation.
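A minimal sketch of that pattern, assuming a recent Apache Jena release (older 2.x releases use the com.hp.hpl.jena packages instead) and a placeholder database directory, might look like this:

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.tdb.TDBFactory;

public class QueryExistingTdb {
    public static void main(String[] args) {
        // Connect to a TDB database that was already built (e.g. with tdbloader).
        // This only opens the on-disk indexes; it does not re-parse any Turtle file.
        Dataset dataset = TDBFactory.createDataset("/path/to/tdb-directory");

        String sparql = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }";

        // Query inside a read transaction, as with any other Jena dataset.
        dataset.begin(ReadWrite.READ);
        try (QueryExecution qexec = QueryExecutionFactory.create(sparql, dataset)) {
            ResultSet results = qexec.execSelect();
            ResultSetFormatter.out(System.out, results);
        } finally {
            dataset.end();
        }
    }
}
```

Because the database already exists on disk, opening it this way is essentially instant, regardless of how long the original load took.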

I think you may be interested in these questions and their answers:
- Querying large RDF Datasets out of memory
- Querying Open Data Communities Data with SPARQL (see the second half of my answer)
TDB stores data on disk in a much more efficient format than plain RDF files. You should load the data with tdbloader once, and then run your queries against the on-disk representation that tdbloader produced. You can do that with tdbquery (as my answer to the second of those questions describes).
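As a rough illustration of that load-once-then-query workflow (the directory and file names here are placeholders, not taken from the original post), the command-line usage looks roughly like this:

```bash
# One-time, offline step: parse the Turtle file and build the TDB indexes on disk.
tdbloader --loc=/data/mydb huge-dataset.ttl

# Later, queries run directly against the on-disk database;
# the Turtle file is never re-parsed.
tdbquery --loc=/data/mydb --query=my-query.rq
```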
