We need control over the data in our production Solr index, and we need it to stay compatible with new development. Ideally, we'd like to mock the index on local machines, query it with Solr, and write unit tests against it for quicker iterations.
RAMDirectory is used in another question to do something similar, but that question is two years old. This example appears to do just that (using FSDirectory instead of RAMDirectory). Are these the right approaches to this problem? Are there better ways to do this?
We'd like to write tests like:
setup mock index;
query mock index;
assert(stuff that should be true);
teardown mock index;
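To make that shape concrete, here is a rough sketch of the lifecycle we have in mind. It is only an illustration: `InMemoryIndex` and `buildFixture` are hypothetical names, and the hand-rolled in-memory map is a stand-in for whatever Solr/Lucene-backed index would really be used. It does not call any Solr or Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the search layer; NOT a Solr or Lucene API.
public class MockIndexExample {

    /** Tiny in-memory inverted index: token -> ids of documents containing it. */
    static final class InMemoryIndex implements AutoCloseable {
        private final Map<String, Set<String>> postings = new HashMap<>();

        /** Lowercases the text, splits on non-word characters, indexes each token. */
        void add(String id, String text) {
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
                }
            }
        }

        /** Returns the ids of all documents containing the single term (case-insensitive). */
        List<String> query(String term) {
            Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
            return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
        }

        /** Teardown: drop everything that was indexed. */
        @Override
        public void close() {
            postings.clear();
        }
    }

    /** Setup: build a fixture index of n synthetic documents (the content is made up). */
    static InMemoryIndex buildFixture(int n) {
        InMemoryIndex index = new InMemoryIndex();
        for (int i = 0; i < n; i++) {
            String color = (i % 2 == 0) ? "red" : "blue";
            index.add("doc-" + i, "Widget number " + i + " in " + color);
        }
        return index;
    }

    public static void main(String[] args) {
        InMemoryIndex index = buildFixture(1000);        // setup mock index
        try {
            List<String> hits = index.query("red");      // query mock index
            if (hits.size() != 500) {                    // assert(stuff that should be true)
                throw new AssertionError("expected 500 hits, got " + hits.size());
            }
        } finally {
            index.close();                               // teardown mock index
        }
    }
}
```

In a real test the fixture would be built from our own documents and schema rather than synthetic strings, and the setup and teardown steps would live in the test framework's before/after hooks.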
EDIT: Additional details:
Our thought was to build an index and have a simple way of adding documents without needing the indexer and the rest of the system, except perhaps a local database that we could keep in version control. In the past we generated an index and, when incompatibilities arose, regenerated it.
If we re-index, we add a lot of overhead, and mocking the indexer doesn't seem like a good option given that our indexer contains a lot of data processing logic (like adding data to searchable fields from a db). Our indexer connects to an external db, so we'd need to support that too. We could have a local test database, as stated above, which has little to no overhead.
Once we have a test db, we need to build an index, and then we could go off the second link above. The question then becomes: how do we build an index really quickly for testing, say one of about 1000 documents?
The problem with this is that we then need to keep our local db schema in sync with the production schema. The production schema changes often enough that this is a problem. We'd like a test infrastructure that's flexible enough to handle this. Our approach right now is just to rebuild the database each time, which is slow and pisses off other people!