
The wiki page http://wiki.apache.org/solr/DataImportHandler explains how to index data using the DataImportHandler, but its example uses a command to start the import operation. How can I schedule a job to run this on a regular basis?

Eldo

6 Answers


On UNIX/Linux, cron jobs are your friends! On Windows, there is Task Scheduler.

UPDATE
To do it from Java code: since this is a simple GET request, you can use the Apache Commons HttpClient library. See this tutorial on using the GetMethod.
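For the plain-GET route, here is a minimal sketch that uses only the JDK's java.net.HttpURLConnection instead of Commons HttpClient (Java 10+ for the URLEncoder overload used). The class and method names are hypothetical, and the URL shape http://localhost:8983/solr/dataimport is assumed from the URLs used elsewhere on this page; adjust it for a multi-core setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DataImportTrigger {

    // Builds e.g. http://localhost:8983/solr/dataimport?command=full-import
    // (hypothetical helper; the /dataimport path is assumed, not discovered).
    public static String buildImportUrl(String solrBase, String command) {
        String base = solrBase.endsWith("/")
                ? solrBase.substring(0, solrBase.length() - 1)
                : solrBase;
        return base + "/dataimport?command="
                + URLEncoder.encode(command, StandardCharsets.UTF_8);
    }

    // Sends a plain GET to the given URL and returns the HTTP status code.
    public static int trigger(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(30_000);
        try {
            int status = conn.getResponseCode();
            // Read and discard the response body; its content is not needed here.
            try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
                if (in != null) {
                    in.readAllBytes();
                }
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage, assuming Solr is running locally: DataImportTrigger.trigger(DataImportTrigger.buildImportUrl("http://localhost:8983/solr", "full-import")) returns the HTTP status code. To check on the import afterwards, you can send the status command the same way.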

If you need to send other requests to Solr programmatically, you should probably use the SolrJ library. It lets you send all the basic commands to Solr, and it can be configured to access any Solr handler:

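import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// Base URL of the Solr webapp (the constructor throws MalformedURLException)
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import"); // same as ?command=full-import
QueryRequest request = new QueryRequest(params);
request.setPath("/dataimport"); // route to the DataImportHandler, not /select
server.request(request); // throws SolrServerException, IOException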
Pascal Dimassimo
  • Thanks Pascal. My question was different. I need to fire the command http://:/solr/dataimport?command=full-import for the indexing operation. How can I do that from a Java class, rather than typing the command into a browser window? – Eldo Jul 08 '10 at 17:49
  • Also, if you are doing it from cron, a wget http://127.0.0.1:8983/solr/dataimport?command=full-import works great! – Eric Pugh Jul 09 '10 at 14:37

Simply add this line to your crontab (edit it with the crontab -e command):

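0,30 * * * * /usr/bin/wget -q -O /dev/null "http://<solr_host>:8983/solr/<core_name>/dataimport?command=full-import"

This runs a full import every 30 minutes (at minutes 0 and 30 of every hour). Replace <solr_host> and <core_name> with your own values. Quoting the URL stops the shell from treating ? as a wildcard, and -q -O /dev/null keeps wget from saving a response file on every run.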

Daniel Cukier

I was able to make it work by following these steps:

  1. Create the classes ApplicationListener, HTTPPostScheduler and SolrDataImportProperties (the source code is listed at http://wiki.apache.org/solr/DataImportHandler#Scheduling). I believe these classes haven't been committed to Solr yet.

  2. Add the following listener to Solr's web.xml file:

    <listener>
       <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
    </listener>
    
  3. Configure dataimport.properties as described on the wiki page.
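To make step 3 more concrete, here is a rough sketch (not the wiki's code) of how a scheduler can turn a properties file into the URL it calls on each run. The keys server, port, webapp, params and interval are assumptions modeled loosely on the wiki's dataimport.properties, and the defaults are this sketch's own; check the real file before relying on any of them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SchedulerConfig {
    // Hypothetical keys, loosely modeled on the wiki's dataimport.properties.
    public final String server;
    public final int port;
    public final String webapp;
    public final String params;
    public final int intervalMinutes;

    public SchedulerConfig(Properties p) {
        server = get(p, "server", "localhost");
        port = Integer.parseInt(get(p, "port", "8983"));
        webapp = get(p, "webapp", "solr");
        params = get(p, "params", "/dataimport?command=full-import");
        intervalMinutes = Integer.parseInt(get(p, "interval", "30"));
    }

    // Treats a missing or blank value as "use this sketch's default".
    private static String get(Properties p, String key, String def) {
        String v = p.getProperty(key);
        return (v == null || v.trim().isEmpty()) ? def : v.trim();
    }

    // Parses properties text, e.g. the contents of dataimport.properties.
    public static SchedulerConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SchedulerConfig(p);
    }

    // Full URL the scheduler would request on each run.
    public String importUrl() {
        return "http://" + server + ":" + port + "/" + webapp + params;
    }
}
```

A listener like the one in step 2 would then hand importUrl() and intervalMinutes to a timer; how the wiki classes actually wire this up may differ.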

LeoO

This question is a bit old, but I created a Windows WPF application and service to handle this, since cron jobs and Task Scheduler entries are hard to maintain when you have many cores or environments.

https://github.com/systemidx/SolrScheduler

You basically just drop a JSON file into a specified folder, and it uses a REST client to issue the commands to Solr.

Bryon Weber

You can use Quartz for this; it is a job scheduler that works much like crontab on Linux. But the TimerTask built into the JDK is probably enough for your needs.
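A minimal sketch of the TimerTask route, using only java.util.Timer from the JDK (Quartz is not shown). The job is a plain Runnable, so you can plug in whatever actually fires command=full-import; the class name ImportScheduler and its method are hypothetical.

```java
import java.util.Timer;
import java.util.TimerTask;

public class ImportScheduler {

    // Wraps any job (e.g. a call that sends command=full-import) in a TimerTask
    // and runs it every periodMillis, starting after initialDelayMillis.
    public static TimerTask scheduleEvery(Timer timer, Runnable job,
                                          long initialDelayMillis, long periodMillis) {
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would kill the Timer thread and cancel
                    // all future runs, so report it and keep the schedule alive.
                    System.err.println("Scheduled import failed: " + e);
                }
            }
        };
        // Fixed-delay scheduling: the next run is timed from the previous one.
        timer.schedule(task, initialDelayMillis, periodMillis);
        return task;
    }
}
```

This uses fixed-delay scheduling (Timer.schedule) rather than scheduleAtFixedRate, so a slow or stalled run does not trigger a burst of back-to-back imports afterwards. For example, creating new Timer("solr-dataimport") and calling scheduleEvery(timer, job, 0, 30L * 60 * 1000) runs the job every 30 minutes until timer.cancel() is called; ScheduledExecutorService is a common alternative.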

Stony

There's a fresh patch by Esteve Fernandez that makes the whole thing work on Unix/Linux: https://issues.apache.org/jira/browse/SOLR-2305

@Eldo If you need more help building your own JAR, just drop a question here.

Marko Bonaci