
There is a case where an external application needs to send an unknown number of different indexing requests to SOLR. Those requests should be processed by SOLR Data Import Handlers (DIH) according to the config submitted inside each request.

There is a SOLR constraint: a particular DIH can process only one indexing request at a time. Because the number of requests can be quite large and they arrive in parallel, it is impractical to define multiple DIH specifications in solrconfig.xml.

How can that problem be overcome?

Maybe SOLR provides some admin API to create DIH specifications dynamically from a client?

Eduard BABKIN

1 Answer


The best way to do this is to create a layer outside of Solr that handles your import tasks. Using DIH will limit what you can do (as you've discovered), and will be hard to make work properly in parallel across multiple nodes and indexing services (it's designed for a far simpler scenario).

Using a simple queue (Redis, Celery, ActiveMQ, whatever fits your choice of languages and technology) that the external application can put requests into, and that your indexing workers pick tasks up from, will be scalable and customizable. It'll allow you to build out onto multiple index nodes as the number of tasks grows, and it'll allow you to pull data from multiple sources as necessary (and apply caching if required).
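A minimal in-process sketch of that layout, assuming Python and using the standard library's `queue.Queue` as a stand-in for a real broker such as Redis or Celery: the external application enqueues tasks and a pool of workers drains them in parallel. `ImportTask`, `run_import` and `process_all` are hypothetical names; `run_import` is only a placeholder for the part that would read from the source and post documents to Solr.

```python
import queue
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImportTask:
    """One indexing request from the external application (hypothetical shape)."""
    task_id: str
    source_query: str  # e.g. the SQL a worker would run against the source DB


def run_import(task: ImportTask) -> str:
    """Hypothetical placeholder: a real worker would fetch rows for
    task.source_query, map them to Solr documents and post them to Solr."""
    return f"indexed {task.task_id}"


def worker(tasks: "queue.Queue[Optional[ImportTask]]", results: List[str],
           lock: threading.Lock) -> None:
    """Pull tasks until a None sentinel arrives; each task is handled exactly once."""
    while True:
        task = tasks.get()
        try:
            if task is None:  # sentinel: no more work for this worker
                return
            outcome = run_import(task)
            with lock:
                results.append(outcome)
        finally:
            tasks.task_done()


def process_all(requests: List[ImportTask], num_workers: int = 4) -> List[str]:
    """Enqueue every request and let num_workers threads process them in parallel."""
    tasks: "queue.Queue[Optional[ImportTask]]" = queue.Queue()
    results: List[str] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for req in requests:
        tasks.put(req)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after all real tasks
    for t in threads:
        t.join()
    return results
```

For example, `process_all([ImportTask("job-1", "SELECT * FROM items")], num_workers=2)` runs that one task on whichever worker picks it up first. Scaling out then means starting more workers (or worker processes on other hosts reading from a shared broker) rather than adding more DIH entries to solrconfig.xml.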

MatsLindh
  • Thank you for a good solution. I see only one shortcoming in it: I would have to reimplement a nice feature of the DIH config, namely the XML-based declarative definition of the SQL data mapping to entities in data-config. Is there any way to reuse that feature? – Eduard BABKIN Mar 14 '20 at 17:33
  • You could write a small XML-based parser that reads the `` entries. As long as you're not using (many) transformers, it should be pretty straightforward. Otherwise I'd spend the time rewriting it into something suitable for your application (possibly with re-use across import definitions). You can include DIH configuration options (such as connection details) through the URL when making the request, but it'll still be single-threaded. – MatsLindh Mar 14 '20 at 20:33
  • >> You can include DIH configuration options (such as connection details) etc. through
    >> the URL when making the request
    That is what I intended to do.
    Your suggestion to write my own XML parser matches my original idea, although I had hoped there was an easier solution.
    – Eduard BABKIN Mar 15 '20 at 10:27
  • See https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#dih-request-parameters for how you can do that. You'll still have to work with the other limitations of DIH, but it might be enough. – MatsLindh Mar 15 '20 at 21:23
  • Are there any objections to using the experimental SOLR Config API to add and delete request handlers at runtime? That API allows specifying all the needed parameters for a newly created request handler, or deleting a previously created one. So I expect the external program could send a request to create a new request handler with a unique name, start a data import on that handler, and later delete that handler. – Eduard BABKIN Mar 18 '20 at 08:28
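The plan in the last comment can be sketched as three requests: an `add-requesthandler` command posted to the Config API (`/solr/<collection>/config`), a `command=full-import` request against the new handler, and a `delete-requesthandler` command afterwards. This is a sketch under assumptions: the base URL, the collection name `mycollection`, the `/dataimport-<id>` naming scheme and the `source_table` parameter are hypothetical, and the DIH contrib jar must already be on Solr's classpath. Extra request parameters such as `source_table` can be read in data-config.xml as `${dataimporter.request.source_table}`, as described in the DIH request-parameter documentation linked above.

```python
import json
import urllib.parse
import urllib.request
import uuid

# Assumptions: a local Solr and a hypothetical collection name.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"
DIH_CLASS = "org.apache.solr.handler.dataimport.DataImportHandler"


def new_handler_name() -> str:
    """Hypothetical naming scheme: one unique handler path per import job."""
    return f"/dataimport-{uuid.uuid4().hex[:12]}"


def add_handler_command(handler_name: str, data_config: str) -> dict:
    """Config API command that registers a DIH under handler_name."""
    return {
        "add-requesthandler": {
            "name": handler_name,
            "class": DIH_CLASS,
            "defaults": {"config": data_config},
        }
    }


def delete_handler_command(handler_name: str) -> dict:
    """Config API command that removes the handler once its import is done."""
    return {"delete-requesthandler": handler_name}


def import_url(handler_name: str, **request_params: str) -> str:
    """URL that starts a full import on the given handler.
    clean=false so one job does not wipe documents indexed by another;
    extra keyword arguments are passed through as DIH request parameters."""
    params = {"command": "full-import", "clean": "false", "commit": "true"}
    params.update(request_params)
    query = urllib.parse.urlencode(params)
    return f"{SOLR_BASE}/{COLLECTION}{handler_name}?{query}"


def post_config(command: dict) -> bytes:
    """Send one Config API command to Solr and return the raw response body."""
    req = urllib.request.Request(
        f"{SOLR_BASE}/{COLLECTION}/config",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A job would then call `post_config(add_handler_command(name, "data-config.xml"))`, open `import_url(name, source_table="orders")`, poll the same handler with `command=status` until it reports that it is idle, and only then call `post_config(delete_handler_command(name))`. Waiting for the idle status before deleting is a cautious choice, since a full-import normally keeps running after the triggering request returns.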