I'm stuck trying to set up Nutch 2.3 with Elasticsearch 5.4. The problem is in Nutch: I cannot get it to inject my URLs. The inject command hangs, and the Hadoop log shows only the following warning:

Console:

aurora apache-nutch-2.3.1 # runtime/local/bin/nutch inject urls/seed.txt
InjectorJob: starting at 2017-06-14 17:08:28
InjectorJob: Injecting urlDir: urls/seed.txt

**It hangs here.**

Hadoop log:

aurora apache-nutch-2.3.1 # cat runtime/local/logs/hadoop.log 
2017-06-14 17:08:28,339 INFO  crawl.InjectorJob - InjectorJob: starting at 2017-06-14 17:08:28
2017-06-14 17:08:28,340 INFO  crawl.InjectorJob - InjectorJob: Injecting urlDir: urls/seed.txt
2017-06-14 17:08:28,992 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I've tried setting my Hadoop environment variables following this thread (Hadoop "Unable to load native-hadoop library for your platform" warning), but I still get the same warning and the inject still hangs.

Any ideas?

1 Answer

  1. Don't worry about the warning; Hadoop falls back to its built-in Java classes, as the message says. I assume you are running on a Linux distribution.
  2. Nutch 2.3 is not compatible with Elasticsearch 5.x. I wrote a custom IndexWriter that sends documents to Logstash on a given port, and Logstash in turn forwards them to Elasticsearch. You may try this approach or something similar.
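
A rough sketch of the Logstash hand-off from point 2, not the actual custom code: it only shows the part that serializes a document's fields as one JSON line and writes it to a TCP port. The class name `LogstashLineSender`, the field names, and the host and port are all hypothetical, and it assumes Logstash is configured with a `tcp` input using the `json_lines` codec. Wiring this into Nutch's `IndexWriter` interface is left out on purpose, since that interface differs between Nutch versions.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of Nutch or Logstash. */
public class LogstashLineSender {

    /** Escapes a string so it can sit inside a JSON string literal. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become \u00XX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds one newline-terminated JSON object from string fields. */
    public static String toJsonLine(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append("}\n").toString();
    }

    /** Opens a TCP connection, writes one JSON line, and closes it. */
    public static void send(String host, int port, Map<String, String> fields)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(toJsonLine(fields));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");   // illustrative field names
        doc.put("title", "Example");
        System.out.print(toJsonLine(doc));
        // Only connect when a host and port are given on the command line.
        if (args.length == 2) {
            send(args[0], Integer.parseInt(args[1]), doc);
        }
    }
}
```

Opening one connection per document keeps the sketch short; a real indexer would more likely keep the socket open for the whole indexing run and close it when the job finishes.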
Ram Dwivedi
  • OK, thanks. Do you have a working setup of these two? – Emily T Jun 26 '17 at 17:16
  • I used Oracle VirtualBox and ran Ubuntu on top of it, and made all the changes there. For the custom Logstash code, you can take any of the indexers that ship with the 2.3 distribution and alter it to fit your needs. I'll try to provide a sample; give me some time. – Ram Dwivedi Jun 28 '17 at 09:45