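```java
import java.io.InputStream;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.util.FileManager;
Model model = ModelFactory.createDefaultModel();
InputStream in = FileManager.get().open( "W:\\structure.rdf.u8" );
model.read(in, null);
model.write(System.out);
```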

I used the code above, which is provided in the Jena documentation, to parse the ODP (Open Directory Project) dump. At first it threw a different exception, so I added all the JAR files from the Jena package to the classpath, and now I get the following long exception:

```
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.system.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 5, col: 5 ] {E201} The attributes on this property element, are not permitted with any content; expecting end element tag.
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:128)
    at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.error(LangRDFXML.java:246)
    at org.apache.jena.rdfxml.xmlinput.impl.ARPSaxErrorHandler.error(ARPSaxErrorHandler.java:37)
    at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:196)
    at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:173)
    at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:168)
    at org.apache.jena.rdfxml.xmlinput.impl.ParserSupport.warning(ParserSupport.java:194)
    at org.apache.jena.rdfxml.xmlinput.states.Frame.warning(Frame.java:55)
    at org.apache.jena.rdfxml.xmlinput.states.WantEmpty.characters(WantEmpty.java:33)
    at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.characters(XMLHandler.java:137)
    at org.apache.xerces.parsers.AbstractSAXParser.characters(Unknown Source)
    at org.apache.xerces.impl.XMLNamespaceBinder.characters(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
```


Do I need to remove some of the JAR files to fix this, or is the code provided on the Apache site wrong?

lonesome

1 Answer


It's not legal RDF/XML: it's close, but it has errors (at least the copy at http://rdf.dmoz.org/rdf/structure.rdf.u8.gz does):

  1. The top-level `RDF` element is not the RDF/XML `rdf:RDF` marker: with the file's default namespace (`xmlns="http://dmoz.org/rdf/"`) it is `http://dmoz.org/rdf/RDF`. It should be `r:RDF`, but then ...
  2. The `r` namespace is wrong: it should be `http://www.w3.org/1999/02/22-rdf-syntax-ns#`, not `http://www.w3.org/TR/RDF/`.
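Both fixes can be applied by streaming the file line by line instead of opening it in an editor. Below is a minimal Java sketch (JDK only). It rests on a few assumptions that I haven't checked against every dump: the bad declaration must appear literally as `xmlns:r="http://www.w3.org/TR/RDF/"`, and the root tags `<RDF ...>` / `</RDF>` must each start a line. The class and method names (`OdpHeaderFix`, `fixLine`, `open`, `create`) are hypothetical. The sketch only addresses the two problems listed above, so Jena may still report other errors in the file.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: copy an ODP dump line by line, fixing the two RDF/XML problems.
// ASSUMPTIONS: the bad declaration is written exactly as
// xmlns:r="http://www.w3.org/TR/RDF/", and the root tags start a line.
public class OdpHeaderFix {
    static final String WRONG_NS = "xmlns:r=\"http://www.w3.org/TR/RDF/\"";
    static final String RDF_NS = "xmlns:r=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"";
    // "<RDF" followed by whitespace, '>' or end of line (so e.g. <RDFS> is left alone).
    static final Pattern ROOT_OPEN = Pattern.compile("^(\\s*)<RDF(?=[\\s>]|$)");
    static final Pattern ROOT_CLOSE = Pattern.compile("^(\\s*)</RDF>");

    /** Fixes the r: namespace URI and renames the root element RDF to r:RDF. */
    static String fixLine(String line) {
        String s = line.replace(WRONG_NS, RDF_NS);
        s = ROOT_OPEN.matcher(s).replaceFirst("$1<r:RDF");
        return ROOT_CLOSE.matcher(s).replaceFirst("$1</r:RDF>");
    }

    // Transparently decompress when the file name ends in .gz.
    static InputStream open(String path) throws IOException {
        InputStream raw = new BufferedInputStream(new FileInputStream(path));
        return path.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    // Transparently compress when the file name ends in .gz.
    static OutputStream create(String path) throws IOException {
        OutputStream raw = new BufferedOutputStream(new FileOutputStream(path));
        return path.endsWith(".gz") ? new GZIPOutputStream(raw) : raw;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java OdpHeaderFix <input> <output>");
            System.exit(1);
        }
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(open(args[0]), StandardCharsets.UTF_8));
             Writer out = new BufferedWriter(
                 new OutputStreamWriter(create(args[1]), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(fixLine(line));
                out.write('\n');
            }
        }
    }
}
```

For example, `java OdpHeaderFix structure.rdf.u8.gz structure-fixed.rdf.u8` writes an uncompressed copy, which can then be read with the `model.read(in, null)` code from the question.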
AndyS
  • So how come all the search engines use this ODP dump without any problem? – lonesome Aug 06 '15 at 14:42
  • To be honest, I did not understand your answer. – lonesome Aug 06 '15 at 14:46
  • Do you mean that Jena does not follow standard RDF? – lonesome Aug 06 '15 at 14:54
  • Maybe search engines use the file as XML, not RDF. – AndyS Aug 07 '15 at 13:43
  • I described the problems relative to the specification. – AndyS Aug 07 '15 at 13:47
  • If so, how can I use the same file as XML? – lonesome Aug 08 '15 at 08:09
  • About the answer: some of the things you mention are unclear to me. For example, you talk about the `r` namespace, which I cannot find. – lonesome Aug 08 '15 at 08:11
  • The file structure.rdf.u8.gz has `xmlns:r=` defined at the top. – AndyS Aug 09 '15 at 09:16
  • How could you even open such a heavy file in Notepad? Every time I tried to open it, it hung (using more than 1.7 GB of system memory). But for my own information: if I replace the faulty parts of the file with your corrections, would that fix the problem? – lonesome Aug 09 '15 at 15:51
  • I opened a lighter version of a similar file and changed the `r` namespace to `http://www.w3.org/1999/02/22-rdf-syntax-ns#` (instead of `http://www.w3.org/TR/RDF/`). But I don't understand #1: what should `xmlns="http://dmoz.org/rdf/"` be changed to? – lonesome Aug 09 '15 at 16:37
  • I still haven't got an answer on how to use the file as XML rather than RDF, if that is how search engines use it without problems. – lonesome Aug 09 '15 at 16:39
  • I looked in the file with "gzip -d < structure.rdf.u8.gz | head -100". – AndyS Aug 10 '15 at 12:03
  • See above: it should be `r:RDF` (and the same change at the end of the file). – AndyS Aug 10 '15 at 12:05
  • You have to use the RDF namespace `xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"` then `r:RDF`. – AndyS Aug 12 '15 at 14:05
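If you only need the dump as XML, which AndyS suggests may be what search engines do, you can use the JDK's streaming StAX parser (`javax.xml.stream`). It reads the file as-is, because the wrong `r` namespace only matters at the RDF level: the stack trace shows the error is raised inside Jena's RDF/XML layer (`org.apache.jena.rdfxml`), which suggests the XML itself parses. The minimal sketch below tallies element names without loading the document into memory. It assumes the whole file is well-formed XML, and `OdpAsXml` and `countElements` are hypothetical names.

```java
import java.io.*;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Sketch: treat the ODP dump as plain XML and stream through it with StAX,
// counting start elements by local name (namespaces are not interpreted as RDF).
public class OdpAsXml {
    static Map<String, Long> countElements(InputStream in) {
        Map<String, Long> counts = new TreeMap<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in, "UTF-8");
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT) {
                        counts.merge(r.getLocalName(), 1L, Long::sum);
                    }
                }
            } finally {
                r.close(); // frees the reader; does not close the underlying stream
            }
        } catch (XMLStreamException e) {
            // Unchecked wrapper; e.getLocation() still says where parsing failed.
            throw new IllegalStateException("not well-formed XML", e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OdpAsXml <structure.rdf.u8 or .gz>");
            System.exit(1);
        }
        InputStream raw = new BufferedInputStream(new FileInputStream(args[0]));
        try (InputStream in = args[0].endsWith(".gz") ? new GZIPInputStream(raw) : raw) {
            countElements(in).forEach((name, n) -> System.out.println(name + "\t" + n));
        }
    }
}
```

To do real work, you would replace the counting in the `START_ELEMENT` branch with handling for the elements you care about. Memory use then depends on what you keep, not on the size of the dump.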