Why am I getting "GC overhead limit exceeded" when I use "arq" to query local RDF files?

Question

I am using ARQ in order to query local RDF files. The command that I am using is the following:

./arq --data /home/datasets/a-m-00027.nt --results CSV --query myQuery.sparql

myQuery.sparql contains the query:

PREFIX basekb:<http://rdf.basekb.com/ns/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?x
FROM  </home/data/a-m-00027.nt>
WHERE {?x rdf:type basekb:music.release} 
LIMIT 10

Exception

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
        at com.hp.hpl.jena.graph.impl.SimpleEventManager.notifyAddTriple(SimpleEventManager.java:97)
        at com.hp.hpl.jena.graph.impl.GraphBase.notifyAdd(GraphBase.java:124)
        at com.hp.hpl.jena.graph.impl.GraphBase.add(GraphBase.java:203)
        at com.hp.hpl.jena.sparql.core.DatasetGraphCollection.add(DatasetGraphCollection.java:43)
        at com.hp.hpl.jena.sparql.core.DatasetGraphBase.add(DatasetGraphBase.java:82)
        at org.apache.jena.riot.system.StreamRDFLib$ParserOutputDataset.triple(StreamRDFLib.java:206)
        at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:61)
        at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
        at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:185)
        at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:906)
        at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:687)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:534)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:501)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:454)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:432)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:422)
        at arq.cmdline.ModDatasetGeneral.addGraphs(ModDatasetGeneral.java:98)
        at arq.cmdline.ModDatasetGeneral.createDataset(ModDatasetGeneral.java:87)
        at arq.cmdline.ModDatasetGeneralAssembler.createDataset(ModDatasetGeneralAssembler.java:35)
        at arq.cmdline.ModDataset.getDataset(ModDataset.java:34)
        at arq.query.getDataset(query.java:176)
        at arq.query.queryExec(query.java:198)
        at arq.query.exec(query.java:159)
        at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
        at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
        at arq.arq.main(arq.java:28)

Fact example

<http://rdf.basekb.com/ns/architecture.building_complex>        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://rdf.basekb.com/ns/type.type>

Is the whole file being loaded into memory?

user205512 · Answer 1 · 2015-05-18T14:26:54.353

Is the whole file being loaded into memory?

Exactly, and that is your issue. As the other answer says, you may be able to increase the Java heap so that the file fits.

As an alternative, or for cases where you simply don't have enough memory, try using TDB to store and index the file, then query it:

$ tdbloader --loc my_tdb_store /home/datasets/a-m-00027.nt
$ tdbquery --loc my_tdb_store --results CSV --query myQuery.sparql

(You can delete the store once you've finished; it is just a directory named my_tdb_store.)

As a third alternative, you can skip SPARQL completely. Your query just finds the first ten things with type basekb:music.release, which you can find like this:

$ riot /home/datasets/a-m-00027.nt | \
  grep '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.basekb.com/ns/music.release> .' | \
  cut -d ' ' -f 1 | \
  head -10

which uses minimal memory.
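
If riot is not at hand, the same streaming idea can be sketched with awk alone. This is only a sketch under assumptions: the function name first_subjects_of_type is hypothetical, the input is taken to be N-Triples with one triple per line, and terms are assumed to be separated by spaces or tabs (as in the fact example in the question). Unlike the riot pipeline, it does not validate or normalise the input.

```shell
# Hypothetical helper (not part of Jena): print the first N subjects whose
# rdf:type is the given class IRI, reading the file one line at a time.
# Assumes N-Triples input with whitespace-separated terms.
first_subjects_of_type() {
  file=$1        # path to the .nt file
  class_iri=$2   # class IRI without angle brackets
  limit=${3:-10} # how many subjects to print (default 10)
  awk -v p='<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' \
      -v o="<$class_iri>" -v n="$limit" \
      '$2 == p && $3 == o { print $1; if (++c >= n) exit }' "$file"
}

# Example call (path and class taken from the question):
#   first_subjects_of_type /home/datasets/a-m-00027.nt http://rdf.basekb.com/ns/music.release 10
```

Because awk stops reading as soon as the limit is reached, only the part of the file up to the tenth match is scanned, and memory use stays constant regardless of file size.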

score 3 · Accepted Answer · edited May 23 '17 at 11:58


Because you're out of memory, as the exception tells you:

java.lang.OutOfMemoryError: GC overhead limit exceeded

It is quite possible that the machine is not actually out of memory, and that the JVM's default maximum heap size is simply too small for this file. As described in https://stackoverflow.com/a/21197787/1423333, try running:

JVM_ARGS="-Xmx4096M" ./arq --data /home/datasets/a-m-00027.nt --results CSV --query myQuery.sparql

edited May 23 '17 at 11:58


answered May 18 '15 at 13:23

Jörn Hees


I got the following error: "Unrecognized option: --Xmx4G. Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit." – Hani Goc May 18 '15 at 13:24
sorry, try with --Xmx4096M instead – Jörn Hees May 18 '15 at 13:25
i updated the answer, as it seems the suffix G isn't supported. – Jörn Hees May 18 '15 at 13:30
I think that it should be JVM_ARGS='-Xmx4096M'. I ran $ export JVM_ARGS='-Xmx4096M' and then $ ./arq --data /home/datasets/a-m-00027.nt --results CSV --query myQuery.sparql. It's running and I'm waiting for results, let me see. – Hani Goc May 18 '15 at 13:33
