1

I have a local copy of LinkedMDB in N-Triples format and want to query it. I want to use Jena TDB, which can store the data so that it can be queried later. I checked the documentation for the TDB Java API, but was unable to load the N-Triples file and then query it with SPARQL. I used the following code:

String directory = "E:\\Applications\\tdb-0.8.9\\TDB-0.8.9\\bin\\tdb";
Dataset dataset = TDBFactory.createDataset(directory);

// assume we want the default model, or we could get a named model here
Model tdb = dataset.getDefaultModel();

// read the input file - only needs to be done once
String source = "E:\\Applications\\linkedmdb-18-05-2009-dump.nt";
FileManager.get().readModel( tdb, source, "N-TRIPLES" );

and got the following exception:

Exception in thread "main" com.hp.hpl.jena.tdb.base.file.FileException: Not a directory: E:\Applications\tdb-0.8.9\TDB-0.8.9\bin\tdb
    at com.hp.hpl.jena.tdb.base.file.Location.<init>(Location.java:83)
    at com.hp.hpl.jena.tdb.TDBFactory.createDataset(TDBFactory.java:79)
    at tutorial.Temp.main(Temp.java:14)
Joshua Taylor
ProgramME
  • If the directory `E:\Applications\tdb-0.8.9\TDB-0.8.9\bin\tdb` does not yet exist you will have to create it first. But, you probably should be storing data in a directory other than the place you installed TDB to (i.e. something specific to your application). Consider what will happen when you download a future release of TDB, for example. – Ian Dickinson Apr 13 '11 at 09:40
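The "create it first" step from the comment above can also be done from Java before the dataset is opened. This is only a minimal sketch using the standard library: the class name `TdbStoreDirectory`, the helper `ensureDirectory`, and the path `data/linkedmdb-tdb` are made up for illustration and are not part of Jena or TDB.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TdbStoreDirectory {

    /** Creates the directory (and any missing parents) if absent; returns its absolute path. */
    static Path ensureDirectory(String location) throws IOException {
        Path dir = Paths.get(location).toAbsolutePath();
        // createDirectories does nothing if the directory already exists and
        // throws FileAlreadyExistsException if the path is a regular file.
        Files.createDirectories(dir);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical application-specific location, not the TDB install directory.
        Path store = ensureDirectory("data/linkedmdb-tdb");
        System.out.println("TDB store directory: " + store);
        // store.toString() is what would then be passed to TDBFactory.createDataset(...).
    }
}
```

Keeping the store under an application-specific path like this also means a later TDB upgrade does not touch the data.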

3 Answers

3

You don't need any java code to do this (tdbloader2 is faster):

bin/tdbloader2 --loc /path/to/tdb/store imdb.nt

This loads the N-Triples file. You can then query it using:

bin/tdbquery --loc /path/to/tdb/store "select ...."

More information on the tdb command line tools here.

double-beep
user205512
  • I am building an application that queries the NT file. Is there any Java code to implement this? I cannot do this from the command line all the time. – ProgramME Apr 11 '11 at 17:20
  • When I tried tdbloader I got: 'tdbloader' is not recognized as an internal or external command, operable program or batch file – ProgramME Apr 11 '11 at 17:43
3

Reading into a TDB-backed Model from Java is straightforward; see the TDB wiki for details. For example, you could:

// imports (Jena 2.x package names, as seen in the stack trace in the question)
import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.util.FileManager;

// open TDB dataset (the directory must already exist)
String directory = "./tdb";
Dataset dataset = TDBFactory.createDataset(directory);

// assume we want the default model, or we could get a named model here
Model tdb = dataset.getDefaultModel();

// read the input file - only needs to be done once
String source = "path/to/input.nt";
FileManager.get().readModel( tdb, source, "N-TRIPLES" );

// run a query
String q = "select * where {?s ?p ?o} limit 10";
Query query = QueryFactory.create(q);
QueryExecution qexec = QueryExecutionFactory.create(query, tdb);
ResultSet results = qexec.execSelect();
... etc ...

As user205512 mentioned, you can use tdbloader2 from the command line on Linux or Mac, which will be faster on large RDF files. Once the TDB indexes have been created, you can copy the files to other machines. So you can load the data on a Linux server, then ship all the files inside the tdb directory to your Windows machine to continue development.

To run tdbloader from the command line on your Windows machine, you'll need something like Cygwin to allow you to run Unix-style scripts. You'll also need to set the environment variable TDBROOT.
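If you are unsure whether TDBROOT is actually visible to your process, a quick check from Java might look like the sketch below. It only uses the standard library; the class `TdbRootCheck` and the helper `isUsableRoot` are hypothetical names, and the check only confirms that the variable points at an existing directory, not that the directory is a valid TDB installation.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

class TdbRootCheck {

    /** Returns true if the value is non-empty and names an existing directory. */
    static boolean isUsableRoot(String value) {
        if (value == null || value.trim().isEmpty()) {
            return false;
        }
        try {
            return Files.isDirectory(Paths.get(value));
        } catch (InvalidPathException e) {
            // The value contains characters that are not legal in a path on this platform.
            return false;
        }
    }

    public static void main(String[] args) {
        String root = System.getenv("TDBROOT");
        if (isUsableRoot(root)) {
            System.out.println("TDBROOT points at a directory: " + root);
        } else {
            System.out.println("TDBROOT is unset or does not name a directory: " + root);
        }
    }
}
```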

TylerH
Ian Dickinson
  • So, first I need to use tdbloader2 to load the file, and then use the code you provided to query it? – ProgramME Apr 13 '11 at 07:51
  • Using tdbloader/tdbloader2 is an alternative to the "read the input file" step in the code sample above. You can do it either way; you don't have to do both. – Ian Dickinson Apr 13 '11 at 09:38
  • I got the following exception: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(Unknown Source) at java.nio.ByteBuffer.allocate(Unknown Source) – ProgramME Apr 13 '11 at 14:27
  • You either need to start a new question for this new topic, or, and I suggest this would be better, send an email to the Jena users list (see http://incubator.apache.org/jena/contributing.html for info on how to subscribe). Either way, you'll have to show your code: there's no way to diagnose that exception otherwise (the proximal cause is clear: you are running out of heap space, but the root cause - *why* you are running out of heap space needs more information to diagnose). – Ian Dickinson Apr 13 '11 at 15:07
  • I think that's because the linkedmdb-18-05-2009-dump.nt file is 450 MB in size – ProgramME Apr 13 '11 at 16:21
  • If you're reading directly into TDB, it should not matter how big the source file is. But without seeing the code, I can't say for sure. The recommendation stands though: please start a new question or post your code to the Jena users list. – Ian Dickinson Apr 13 '11 at 22:54
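When posting an OutOfMemoryError like the one in this thread, it may help to include the heap limit your JVM is actually running with. The sketch below is one way to print it using only the standard library; the class name `HeapReport` and the helper `maxHeapMiB` are made up for this example.

```java
class HeapReport {

    /** Maximum amount of memory the JVM will attempt to use for the heap, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The limit can be raised with the JVM's `-Xmx` option (for example `java -Xmx1024m ...`), though that only treats the symptom if the root cause is in how the data is being loaded.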
0

Assuming that "nt format" is really "N-Triples", the Jena Model.read(is, base, lang) method will load N-Triples data if lang is "N-TRIPLE".

For more details, refer to the Jena tutorial document.

Stephen C
  • I know this, but the problem is that the file is 850 MB in size, which causes a heap overflow exception. So I wanted to store the file data in TDB. – ProgramME Apr 11 '11 at 17:25