28

Some colleagues of mine have a large Java web app that uses a search system built with Lucene Java. What I'd like to do is have a nice HTTP-based API to access those existing search indexes. I've used Nutch before and really liked how simple the OpenSearch implementation made it to grab results as RSS.

I've tried setting Solr's dataDir in solrconfig.xml, hoping it would happily pick up the existing index files, but it seems to just ignore them.

My main question is:

Can Solr be used to access Lucene indexes created elsewhere? Or might there be a better solution?

  • possible duplicate: http://stackoverflow.com/questions/2195404/very-basic-dude-with-solr-lucene – Mauricio Scheffer Apr 26 '10 at 19:12
  • Thanks for the heads up. Unfortunately nobody has given this approach a thumbs-up or a thumbs-down yet... –  Apr 27 '10 at 02:35
  • a follow up question, is it possible to load Lucene index that using non-default codec into Solr, like SimpleTextCodec? – B.Mr.W. Mar 10 '16 at 04:39

4 Answers

29

Success! With Pascal's suggestion of changes to schema.xml I got it working in no time. Thanks!

Here are my complete steps for anyone interested:

  1. Downloaded Solr and copied dist/apache-solr-1.4.0.war to tomcat/webapps
  2. Copied example/solr/conf to /usr/local/solr/
  3. Copied pre-existing Lucene index files to /usr/local/solr/data/index
  4. Set solr.home to /usr/local/solr
  5. In solrconfig.xml, changed dataDir to /usr/local/solr/data (Solr looks for the index directory inside)
  6. Loaded my Lucene indexes into Luke for browsing (awesome tool)
  7. In the example schema.xml, removed all fields and field types except for "string"
  8. In the example schema.xml, added 14 field definitions corresponding to the 14 fields shown in Luke. Example: <field name="docId" type="string" indexed="true" stored="true"/>
  9. In the example schema.xml, changed uniqueKey to the field in my index that seemed to be a document id
  10. In the example schema.xml, changed defaultSearchField to the field in my index that seemed to contain terms
  11. Started tomcat, saw no exceptions finally, and successfully ran some queries in localhost:8080/solr/admin

This is just proof for me that it can work. Obviously there's a lot more configuration to be done.
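For reference, the schema.xml edits in steps 7–10 might look roughly like this. The field names below are hypothetical examples; use the actual names Luke reports for your own index:

```xml
<schema name="legacy-lucene" version="1.1">
  <types>
    <!-- keep only the string type, matching the untokenized fields in the index -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  </types>
  <fields>
    <!-- one entry per field reported by Luke; "docId" and "contents" are examples -->
    <field name="docId" type="string" indexed="true" stored="true"/>
    <field name="contents" type="string" indexed="true" stored="true"/>
    <!-- ... repeat for the remaining fields ... -->
  </fields>
  <uniqueKey>docId</uniqueKey>
  <defaultSearchField>contents</defaultSearchField>
</schema>
```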

  • This worked perfectly for me using Solr 5.2.0, with the addition that I had to specify in solrconfig.xml that I was not using a managed schema: `` Not bad for a 5-year-old answer! – tgood Jul 14 '15 at 19:35
11

I have never tried this, but you would have to adjust the schema.xml to include all the fields of the documents that are in your Lucene index, because Solr won't allow you to search for a field if it is not defined in schema.xml.

The adjustments to schema.xml should also include defining the query-time analyzers needed to search your fields properly, especially if the fields were indexed using custom analyzers.

In solrconfig.xml you may have to change settings in the indexDefaults and the mainIndex sections.
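As a sketch, a field that was indexed with a custom analyzer would need a matching query-time analyzer declared on its field type. The tokenizer and filter classes below are only illustrative; they must mirror whatever analyzer actually produced the index-time tokens:

```xml
<fieldType name="text_custom" class="solr.TextField">
  <analyzer type="query">
    <!-- must match the analysis chain used when the index was built -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```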

But I'd be happy to read answers from people who actually did it.

Pascal Dimassimo
  • 6,908
  • 1
  • 37
  • 34
  • I'm looking at the index using Luke and it's not terribly complex. There are 14 fields, all typed as strings. I'll give the configuration you suggested a try and report back. Thanks! –  Apr 27 '10 at 02:29
  • luke is your friend here :) – Aman Tandon Apr 08 '17 at 07:29
1

Three steps in the end:

  1. Change schema.xml (or managed-schema)
  2. Change <dataDir> in solrconfig.xml
  3. Restart Solr
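For step 2, the change amounts to a single element in solrconfig.xml (the path below is an example; Solr expects to find an index directory inside it):

```xml
<dataDir>/usr/local/solr/data</dataDir>
```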

I have my study notes here for those who are new to Solr, like me :)
To generate some Lucene indexes yourself, you can use my code here.

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

public class LuceneIndex {
    private static Directory directory;

    public static void main(String[] args) throws IOException {
        long startTime = System.currentTimeMillis();

        // open the index directory on disk
        Path path = Paths.get("/tmp/myindex/index");
        directory = new SimpleFSDirectory(path);
        IndexWriter writer = getWriter();

        // index: add documentCount documents, each with the two fields below
        int documentCount = 10000000;
        List<String> fieldNames = Arrays.asList("id", "manu");

        // a stored, tokenized field type that keeps docs and term frequencies
        // but omits norms
        FieldType myFieldType = new FieldType();
        myFieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
        myFieldType.setOmitNorms(true);
        myFieldType.setStored(true);
        myFieldType.setTokenized(true);
        myFieldType.freeze();

        for (int i = 0; i < documentCount; i++) {
            Document doc = new Document();
            for (int j = 0; j < fieldNames.size(); j++) {
                // field value is the field name plus the document number, e.g. "id42"
                doc.add(new Field(fieldNames.get(j), fieldNames.get(j) + Integer.toString(i), myFieldType));
            }
            writer.addDocument(doc);
        }

        // close the writer to commit the segments
        writer.close();
        System.out.println("Finished Indexing");
        long estimatedTime = System.currentTimeMillis() - startTime;
        System.out.println(estimatedTime);
    }

    private static IndexWriter getWriter() throws IOException {
        return new IndexWriter(directory, new IndexWriterConfig(new WhitespaceAnalyzer()));
    }
}
B.Mr.W.
  • 18,910
  • 35
  • 114
  • 178
0
I am trying the same steps with HDFS as the home directory and lockType hdfs, but no luck. I see the below error:

labs_shard1_replica1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Index dir 'hdfs://127.0.0.1/user/solr/labs/core_node1/data/index/' of core 'labs_shard1_replica1' is already locked. The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs

It works with the default Solr directory config:

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">

but not with HDFS as below:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
                <str name="solr.hdfs.home">hdfs://127.0.0.1/user/solr</str>
                <bool name="solr.hdfs.blockcache.enabled">true</bool>
                <int name="solr.hdfs.blockcache.slab.count">1</int>
                <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
                <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
                <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
                <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
                <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
                <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
                <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
            </directoryFactory>

Lock type: hdfs
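For what it's worth, the lock type is configured per index in solrconfig.xml; a hedged sketch of the usual pairing with HdfsDirectoryFactory looks like this:

```xml
<indexConfig>
  <lockType>${solr.lock.type:hdfs}</lockType>
</indexConfig>
```

The "already locked" error itself suggests either a stale write.lock left in the HDFS index directory (e.g. after an unclean shutdown) or another core genuinely pointing at the same path, as the message says.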

  • 1
    if you have a problem that is not solved by these answers please [ask a new question](https://stackoverflow.com/questions/ask) – Tim Penner Mar 24 '16 at 21:18