
I am trying to use Jena's read method to read large datasets (more than 1 GB), but I am getting an out-of-memory error. I tried increasing the Tomcat heap size (the -Xmx parameter) up to 2048 MB, and set the same parameter in eclipse.ini, but I could not get a working solution. I am open to any suggestions on how to handle large datasets, since I will be parsing them into hashmaps and displaying the contents on a web page.
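For reference, a minimal sketch of the kind of whole-file, in-memory load that produces a trace like the one below (the file name and syntax string are illustrative placeholders, not the actual code in FileAnalyse.java):

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class LoadAll {
        public static void main(String[] args) {
            // model.read() materializes every triple of the file in an
            // in-memory graph, so a 1 GB N-Triples file can need several
            // times that much heap.
            Model model = ModelFactory.createDefaultModel();
            model.read("file:data.nt", "N-TRIPLE"); // placeholder file name
            System.out.println("Loaded " + model.size() + " statements");
        }
    }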

The console error is below:

Exception in thread "http-bio-8080-AsyncTimeout" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentLinkedQueue.iterator(ConcurrentLinkedQueue.java:667)
    at org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:157)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "http-bio-8080-exec-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
    at com.hp.hpl.jena.graph.impl.SimpleEventManager.notifyAddTriple(SimpleEventManager.java:91)
    at com.hp.hpl.jena.graph.impl.GraphBase.notifyAdd(GraphBase.java:124)
    at com.hp.hpl.jena.graph.impl.GraphBase.add(GraphBase.java:203)
    at org.apache.jena.riot.system.StreamRDFLib$ParserOutputGraph.triple(StreamRDFLib.java:165)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:56)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:182)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:906)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:257)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:243)
    at org.apache.jena.riot.adapters.RDFReaderRIOT_Web.read(RDFReaderRIOT_Web.java:96)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:235)
    at com.packages.rdf.FileAnalyse.GetFileComponents(FileAnalyse.java:77)
    at com.packages.servlets.CreatePatternServlet.GetStatements(CreatePatternServlet.java:96)
    at com.packages.servlets.CreatePatternServlet.doPost(CreatePatternServlet.java:68)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)

Exception in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.naming.resources.FileDirContext.file(FileDirContext.java:765)
    at org.apache.naming.resources.FileDirContext.doGetAttributes(FileDirContext.java:398)
    at org.apache.naming.resources.BaseDirContext.getAttributes(BaseDirContext.java:1137)
    at org.apache.naming.resources.BaseDirContext.getAttributes(BaseDirContext.java:1090)
    at org.apache.naming.resources.ProxyDirContext.getAttributes(ProxyDirContext.java:882)
    at org.apache.catalina.loader.WebappClassLoader.modified(WebappClassLoader.java:1026)
    at org.apache.catalina.loader.WebappLoader.modified(WebappLoader.java:500)
    at org.apache.catalina.loader.WebappLoader.backgroundProcess(WebappLoader.java:420)
    at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1345)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1546)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1556)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1556)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1524)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "http-bio-8080-exec-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:170)
    at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:86)
    at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
    at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:99)
    at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:71)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:54)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:182)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:906)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:257)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:243)
    at org.apache.jena.riot.adapters.RDFReaderRIOT_Web.read(RDFReaderRIOT_Web.java:96)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:235)
    at com.packages.rdf.FileAnalyse.GetFileComponents(FileAnalyse.java:77)
    at com.packages.servlets.CreatePatternServlet.GetStatements(CreatePatternServlet.java:96)
    at com.packages.servlets.CreatePatternServlet.doPost(CreatePatternServlet.java:68)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
– emrahozkan
  • Just because your file is 1 GB doesn't mean you'll only use up 1GB in memory - it's often much more, for various reasons. Can you try upping your heap size to something a lot higher, like 8 or 16 GB? (By the way, you don't have to write '2048M' - you can just write '2G' and it does the same amount) – childofsoong Mar 18 '15 at 22:11
  • I tried up to 4g, but for this application I might have to analyse files of 10 GB, so I don't know if it will be enough – emrahozkan Mar 18 '15 at 22:13
  • Then you might have to find a library that doesn't try to load the entire file into memory at once. I'm afraid I don't know of anything that does that off the top of my head. However, could your analysis perhaps be run on multiple smaller files, and then put together? If so, you could break the giant file up into smaller ones. – childofsoong Mar 18 '15 at 22:19
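A minimal sketch of a streaming alternative that stays within Jena: RIOT can parse to a callback sink instead of building a Model, so only one triple is held at a time (the file name is a placeholder; per-triple aggregation, e.g. into hashmaps, would go in triple()):

    import com.hp.hpl.jena.graph.Triple;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.riot.system.StreamRDFBase;

    public class StreamCount {
        public static void main(String[] args) {
            // StreamRDFBase is a no-op sink; override triple() to process
            // each statement as it is parsed, without building a graph.
            StreamRDFBase sink = new StreamRDFBase() {
                long count = 0;
                @Override
                public void triple(Triple triple) {
                    count++; // e.g. update a HashMap keyed by subject here
                }
                @Override
                public void finish() {
                    System.out.println("Parsed " + count + " triples");
                }
            };
            RDFDataMgr.parse(sink, "data.nt"); // placeholder file name
        }
    }

If the data must stay queryable rather than being reduced on the fly, a disk-backed store such as Jena TDB may also be worth a look, since it avoids holding the whole graph in heap.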

1 Answer


See this related question: GC overhead limit exceeded


I think you should definitely tune the GC. Go through Oracle's article on GC implementations, and perhaps you'll make some progress there.
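For example, a sketch of what that could look like for Tomcat on Linux (heap sizes are guesses and should be tuned to your data), placed in $CATALINA_HOME/bin/setenv.sh:

    # setenv.sh is picked up by catalina.sh at startup.
    # Bigger heap plus the G1 collector. The related flag
    # -XX:-UseGCOverheadLimit would only silence the "GC overhead limit
    # exceeded" check; it does not reduce memory pressure.
    export CATALINA_OPTS="-Xms2g -Xmx8g -XX:+UseG1GC"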

– Radek