I'm trying to index the English Wikipedia, around 40Gb, but it's not working. I've followed the tutorial at http://wiki.apache.org/solr/DataImportHandler#Configuring_DataSources and other related Stackoverflow questions like Indexing wikipedia with solr and Indexing wikipedia dump with solr.
I was able to import the wikipedia (simple english), about 150k documents, and Portuguese wikipedia (more than 1 million documents) using the configuration explained in the tutorial. The problem is happening when I try to index the English Wikipedia (more than 8 million documents). It gives the follow error:
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:410)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:539)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:34)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:254)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:279)
at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:307)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:324)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:165)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:569)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:705)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:504)
... 6 more
I'm using a MacBook pro with 4Gb RAM and more than 120Gb of free space in the HD. I've already tried to change the the 256 in the solrconfig.xml, but no success up to now.
Does anyone could help me, please?
Edited
Just in case, if someone has the same problem, I've used the command java Xmx1g -jar star.jar
suggested by Cheffe to solve my problem.