Mocking and Unit Testing Solr and Lucene Index

Question

We need control of the data in the production Solr index, and we need it to be compatible with new development. Ideally, we'd like to mock the index on local machines, query it with Solr, and write unit tests against it for quicker iterations.

RAMDirectory is used in another question to do something similar, but that question is from 2 years back. This example appears to do just that (using FSDirectory instead of RAMDirectory). Are these the right approaches to this problem? Are there better ways to do this?

We'd like to write tests like:

setup mock index;
query mock index;
assert(stuff that should be true);
teardown mock index;
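
For illustration, the setup/query/assert/teardown shape above might look roughly like this against a plain in-memory Lucene index. This is only a sketch under assumptions: it presumes Lucene 8 or later on the classpath (where ByteBuffersDirectory stands in for the older RAMDirectory), and the class name, field names, and sample documents are all made up for the example.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;

// Hypothetical class: not taken from the linked example or from Solr itself.
public class InMemoryIndexSketch {

    // Setup: build a throwaway in-memory index from hand-written {id, title} rows.
    public static Directory buildIndex(String[][] idAndTitle) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (String[] row : idAndTitle) {
                Document doc = new Document();
                doc.add(new StringField("id", row[0], Field.Store.YES));   // stored as one exact token
                doc.add(new TextField("title", row[1], Field.Store.YES)); // analyzed (tokenized, lowercased)
                writer.addDocument(doc);
            }
        } // closing the writer commits the documents
        return dir;
    }

    // Query: count the documents whose analyzed title contains the given term.
    public static int countTitleMatches(Directory dir, String term) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).count(new TermQuery(new Term("title", term)));
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = buildIndex(new String[][] {
            {"1", "Lucene in Action"},
            {"2", "Solr cookbook"},
        });
        try {
            // Assert: only the first title contains the term "lucene".
            if (countTitleMatches(dir, "lucene") != 1) {
                throw new AssertionError("expected exactly one match");
            }
        } finally {
            dir.close(); // Teardown: release the in-memory index.
        }
    }
}
```

Note that this exercises Lucene directly, so it bypasses the Solr schema and request handlers; it would only cover query logic that does not depend on Solr configuration.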

EDIT: Additional details:

Our thought was we would build an index, have a simple way of adding documents without needing the indexer and the rest of the system, except perhaps a local database that we could keep in version control. In the past we generated an index and when incompatibilities arose, we regenerated it.

If we re-index, we're adding a lot of overhead, and mocking the indexer doesn't seem like a good option given that our indexer contains a lot of data processing logic (like adding data to searchable fields from a db). Our indexer connects to an external db, so we'd need to support that too. We could have a local test database, as stated above, which has little to no overhead.

Once we have a test db, we need to build an index, and then we could go off the second link above. The question becomes: how do we build an index really quickly for testing, say one of around 1,000 documents?

The problem with this is that we then need to keep our local db schema in sync with the production schema. The production schema changes often enough that this is a problem. We'd like to have a test infrastructure that's flexible enough to handle this. The approach as of now is to just rebuild the database each time, which is slow and pisses off other people!

What database are you using... my guess is it's MySQL, which is notorious for slow backups and restores. We switched to PostgreSQL because of that. SQL Server also has fast backup/restore. — Adam Gent, Jul 28 '11 at 01:53
We were talking about this a bit today and one possibility seems to just do SELECT * on the db and load it into a hash so that there's never a schema problem locally. Columns almost never get removed, and the unit tests should work fine if columns are missing/underspecified (for creating docs). — nflacco, Jul 28 '11 at 07:01
haha, fortunately I'm not involved in any way shape or form with the db end, and for better or for worse it's staying oracle (and for reasons beyond my control) — nflacco, Jul 29 '11 at 15:40
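
The SELECT * idea from the comments above could be sketched roughly as follows: read every row into a map keyed by column name, then copy only the columns a test cares about and skip any that are absent. This is an illustration under assumptions, not the poster's code. The class, method, and column names are invented, and lowercasing the field names is an arbitrary choice for the example.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: names are illustrative only.
public class RowToDocSketch {

    // Load every row of a "SELECT *" result into a map keyed by column label,
    // so the test code never hard-codes the full column list.
    public static List<Map<String, Object>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
                row.put(meta.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Copy only the wanted columns into document fields; columns that are
    // missing or null in the row are skipped instead of causing a failure.
    public static Map<String, String> toDocFields(Map<String, Object> row, String... wantedColumns) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String column : wantedColumns) {
            Object value = row.get(column);
            if (value != null) {
                fields.put(column.toLowerCase(Locale.ROOT), value.toString());
            }
        }
        return fields;
    }
}
```

With this shape, a column added to the production schema is simply ignored, and a column missing from the local copy produces a document without that field.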

Adam Gent · Accepted Answer · 2011-07-28T01:52:06.270

If you are using Solr, I wouldn't even bother with mocking or emulating (i.e., don't change its config).

Instead, write an integration test that sets up your Solr index. The setup would just be to index the data like you normally would. You will probably want your developers to run their own Solr.

I wouldn't worry that much about speed because Solr indexes incredibly fast (100,000 documents in less than 30 seconds in our environment... in fact, the bottleneck is pulling the data from the database).

So really, your mock index should just be a small subset of production data that you index into Solr (you can do this once for each TestCase class with @BeforeClass).

EDIT (based on your Edits):

I'll tell you how we do it (and how I have seen others do it):

We have a development schema/db and a production schema/db. When developers are working on stuff, they just make a copy of the build machine's development database and restore it locally. This database is much smaller than the production db and is ideal for testing. Your production db should not be that much different from your development db schema-wise (make smaller changes and release more often if that is the case).

Indexing the data as we normally do takes 2+ hours because we have millions of records! Our indexer has lots of processing logic, so we'd prefer not to run it. We don't need production data; just data to test various functionality and performance. Furthermore we want to control this dataset, similar to the 'example' link in the original question. This example used the 'LiaTestCase' which loads a local index that's already pre-populated. Is it workable to build the index off a local db? — nflacco, Jul 27 '11 at 20:58
