Apache Nutch 1.5 and Solr 4.7 indexing

Question

I have crawled websites using Apache Nutch and want to index the data in Solr. I have been following the tutorial mentioned here. However, the tutorial covers indexing as part of the crawl, whereas in my case I need to index data that has already been crawled.

I am running the command below:

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

[abc@xyz nutch-crawler]$ bin/nutch index http://abc.xyz:8983/solr/ pryder/crawldb/ -linkdb pryder/linkdb/ pryder/segments/20140330021243/
Indexer: starting at 2014-04-02 20:34:09
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/solr/client/solrj/impl/CommonsHttpSolrServer
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2398)
    at java.lang.Class.getConstructor0(Class.java:2708)
    at java.lang.Class.newInstance0(Class.java:328)
    at java.lang.Class.newInstance(Class.java:310)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:157)
    at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 11 more

What could be going wrong here?

Just a thought: it seems like you have a version mismatch. I believe CommonsHttpSolrServer is in Solr 3.x, not in 4.x. — Itay Moav -Malimovka, Apr 03 '14 at 01:48
Check here and see if it helps or points you in the right direction: http://stackoverflow.com/questions/13987920/solr-change-commonshttpsolrserver-to-httpsolrserver — Itay Moav -Malimovka, Apr 03 '14 at 01:49
Did you follow this tutorial completely? It looks like a classpath issue. Did you write anything, I mean source code? — Mysterion, Apr 03 '14 at 07:45
No, nothing at all. In fact, I think this is an issue with the nutch command — Ajay Nair, Apr 03 '14 at 17:55
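
One way to test the version-mismatch idea from the comments is to ask a JVM which of the two SolrJ client classes it can actually load. The following is a minimal sketch, not a fix. The class name `SolrjClassCheck` and the helper `isOnClasspath` are hypothetical names made up for illustration. The first Solr class name is taken from the stack trace; the second is the replacement named in the linked question's title, and its package is assumed to match the first. The sketch only inspects the classpath passed to `java`, not whatever classloader Nutch builds for its plugins, so treat the result as a hint rather than proof.

```java
final class SolrjClassCheck {
    // Class the stack trace says is missing.
    static final String OLD_CLIENT =
            "org.apache.solr.client.solrj.impl.CommonsHttpSolrServer";
    // Replacement named in the linked question's title (package is an assumption).
    static final String NEW_CLIENT =
            "org.apache.solr.client.solrj.impl.HttpSolrServer";

    /** Returns true if the named class can be loaded from this JVM's classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running static initializers.
            Class.forName(className, false, SolrjClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not present, or present but one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {OLD_CLIENT, NEW_CLIENT}) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "missing"));
        }
    }
}
```

To use it, compile with `javac` and run it with the SolrJ jar that the Nutch install ships added to `-cp` (where that jar lives depends on the install). If the old class shows as missing and the new one as found, that would be consistent with a Solr 4.x SolrJ jar sitting next to Nutch code written against the 3.x client.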

0 Answers