I am trying to load the latest-truthy.nt dump from Wikidata into Apache Jena Fuseki (TDB2), but the import fails with the error shown below. I was inspired by the write-up from Bitplan, where the same import succeeded.
Error log:
14:36:16 INFO loader :: Add: 198.500.000 latest-truthy.nt (Batch: 453.309 / Avg: 213.382)
14:36:17 ERROR riot :: [line: 198884173, col: 87] Bad IRI: <https://abertillerymuseum@btconnect.com> Code: 58/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.
org.apache.jena.riot.RiotException: [line: 198884173, col: 87] Bad IRI: <https://abertillerymuseum@btconnect.com> Code: 58/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:146)
at org.apache.jena.riot.system.ParserProfileStd.internalMakeIRI(ParserProfileStd.java:112)
at org.apache.jena.riot.system.ParserProfileStd.resolveIRI(ParserProfileStd.java:85)
at org.apache.jena.riot.system.ParserProfileStd.createURI(ParserProfileStd.java:187)
at org.apache.jena.riot.system.ParserProfileStd.create(ParserProfileStd.java:259)
at org.apache.jena.riot.lang.LangNTriples.tokenAsNode(LangNTriples.java:70)
at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:109)
at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:184)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:357)
at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:323)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:298)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:550)
at org.apache.jena.tdb2.loader.base.LoaderOps.inputFile(LoaderOps.java:107)
at org.apache.jena.tdb2.loader.base.LoaderBase.loadOne(LoaderBase.java:125)
at org.apache.jena.tdb2.loader.base.LoaderBase.lambda$load$0(LoaderBase.java:102)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at org.apache.jena.tdb2.loader.base.LoaderBase.load(LoaderBase.java:99)
at tdb2.tdbloader.lambda$execBulkLoad$4(tdbloader.java:196)
at org.apache.jena.atlas.lib.Timer.time(Timer.java:85)
at tdb2.tdbloader.execBulkLoad(tdbloader.java:194)
at tdb2.tdbloader.loadQuads(tdbloader.java:175)
at tdb2.tdbloader.exec(tdbloader.java:136)
at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
at tdb2.tdbloader.main(tdbloader.java:64)
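To make the error message easier to read: the rejected value looks like an e-mail address that was entered as a URL, so everything before the `@` is parsed as a user-info component. The sketch below uses Python's standard `urllib.parse` purely as an illustration of how a generic URI splitter sees that string. It is not Jena's IRI checker, and I am only assuming that Jena's "USER" component corresponds to what Python calls `username`.

```python
from urllib.parse import urlsplit

# The IRI copied verbatim from the error log above.
bad_iri = "https://abertillerymuseum@btconnect.com"

parts = urlsplit(bad_iri)

# Everything between "//" and "@" is treated as user info, not as part of the host.
print("scheme:  ", parts.scheme)
print("username:", parts.username)
print("hostname:", parts.hostname)
```

So the parser sees a user name plus the host btconnect.com, which matches the "PROHIBITED_COMPONENT_PRESENT in USER" wording in the log.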
Script to import:
@ECHO off
cd apache-jena-4.0.0
echo start import on %DATE% %TIME%
tdb2_tdbloader --loader=parallel --loc "C:\fuseki\data" "F:\latest-truthy.nt" > tdb2-out.log 2> tdb2-err.log
echo finish import on %DATE% %TIME%
pause
File structure:
- C:/fuseki/
-- apache-jena-4.0.0/
-- apache-jena-fuseki-4.0.0/
-- data/
-- startfusekidb.bat
-- wikidata2fuseki.bat
- F:/
-- latest-truthy.nt
Is this an issue with Fuseki? I can't open the .nt file myself to remove the offending line. Are there any flags I can use so that tdbloader skips validation for this import?
I am also asking in the Wikidata IRC channel to see whether they can help.
UPDATE: Someone on IRC answered and told me that the dataset contains a whole lot of errors (see Errors in Wikidata). So I now need to find a way to skip the lines that cause errors and continue loading. But the Fuseki TDB2 Commands documentation doesn't show anything that helps.
Running the command with -h prints the following, which suggests that an option for skipping bad lines doesn't exist?
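One workaround I am considering, since N-Triples has one triple per line, is to pre-filter the dump before handing it to tdbloader. Below is a minimal sketch of that idea, not a tested solution for the full dump. The function names, the regular expression and the two sample lines are my own (the Q1/Q2 subjects are made up for illustration), and it only catches this one error class: http(s) IRIs with an `@` in the authority part. It would also wrongly drop a line whose string literal happens to contain such an IRI in angle brackets.

```python
import io
import re

# Matches an http(s) IRI whose authority part (before the first "/", "?" or "#")
# contains "@", i.e. a user-info component. Assumption: only this error class is
# filtered; any other kind of bad IRI passes through unchanged.
USERINFO_IRI = re.compile(r"<https?://[^/>?#\s]*@[^>]*>")


def has_userinfo_iri(line: str) -> bool:
    """Return True if the line contains an http(s) IRI with user info."""
    return USERINFO_IRI.search(line) is not None


def filter_ntriples(src, dst, rejected):
    """Copy lines from src to dst, diverting suspicious lines to rejected.

    Works line by line, so the whole file never has to fit in memory.
    Returns a (kept, dropped) tuple of line counts.
    """
    kept = dropped = 0
    for line in src:
        if has_userinfo_iri(line):
            rejected.write(line)
            dropped += 1
        else:
            dst.write(line)
            kept += 1
    return kept, dropped


# Tiny in-memory demo with two made-up triples; the first reuses the bad IRI
# from the error log, the second is a well-formed control line.
sample = (
    "<http://www.wikidata.org/entity/Q1> <http://www.wikidata.org/prop/direct/P856> "
    "<https://abertillerymuseum@btconnect.com> .\n"
    "<http://www.wikidata.org/entity/Q2> <http://www.wikidata.org/prop/direct/P856> "
    "<https://example.org/> .\n"
)
good, bad = io.StringIO(), io.StringIO()
kept, dropped = filter_ntriples(io.StringIO(sample), good, bad)
print(f"kept={kept} dropped={dropped}")
```

For the real dump the same function could be called with file handles opened with encoding="utf-8" instead of the StringIO objects, and tdb2_tdbloader would then be pointed at the filtered output file. Keeping the rejected lines in a separate file makes it possible to check afterwards what was thrown away.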
c:\fuseki\apache-jena-4.0.0\bin>tdb2_tdbloader -h
tdbloader --loader= [--desc DATASET | --loc DIR] FILE ...
  Location
      --loc=DIR        Location (a directory)
      --tdb=           Assembler description file
      --graph=IRI      Act on a named graph
      --loader=        Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or 'light'
      --syntax=LANG    Syntax of data from stdin
  Symbol definition
      --set            Set a configuration symbol to a value
      --mem=FILE       Execute on an in-memory TDB database (for testing)
      --desc=          Assembler description file
  General
      -v --verbose     Verbose
      -q --quiet       Run with minimal output
      --debug          Output information for debugging
      --help
      --version        Version information
      --strict         Operate in strict SPARQL mode (no extensions of any kind)