Using Solr/Lucene as persistence technology

Question

Solr/Lucene's inverted index and query support a subset of RDBMS functionality, i.e. filtering, sorting, group-by and paging. In this sense it is very close to a NoSQL database, as it also does not support transactions or joins.
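To make the comparison concrete, here is a minimal sketch of how an inverted index can serve filtering, sorting and paging over stored documents. It is a toy written for illustration only, not how Lucene is implemented; the `TinyIndex` class, its methods and the sample documents are all hypothetical.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.stored = {}                  # doc_id -> stored fields

    def add(self, doc_id, fields):
        # Re-adding an id replaces the old version (delete, then add).
        self.delete(doc_id)
        self.stored[doc_id] = fields
        for term in fields["text"].lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        old = self.stored.pop(doc_id, None)
        if old is not None:
            for term in old["text"].lower().split():
                self.postings[term].discard(doc_id)

    def search(self, term, sort_by, offset=0, limit=10):
        # Filter via the postings set, sort on a stored field, then page.
        hits = [{**self.stored[d], "id": d} for d in self.postings.get(term.lower(), ())]
        hits.sort(key=lambda doc: doc[sort_by])
        return hits[offset:offset + limit]


# Hypothetical sample data.
index = TinyIndex()
index.add(1, {"text": "Solr as a data store", "year": 2012})
index.add(2, {"text": "Lucene data indexing", "year": 2007})
index.add(3, {"text": "MongoDB sharding", "year": 2011})
page = index.search("data", sort_by="year", limit=1)  # first page, oldest match first
```

Note what is missing from the sketch: there is no transaction spanning several `add` calls and no way to join two document types, which is the same gap described above.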

With a framework like Hibernate Search, it is possible to map even complex objects to the index and perform basic CRUD operations while supporting full-text search.

Considerations:

1) Write throughput: In my past experience, a Lucene index's write throughput is much lower than an RDBMS's.

2) Query speed: Query speed for a Lucene index should be comparable, if not faster, thanks to the inverted index.

3) Scalability: Could be addressed using replication or SolrCloud.

4) Ability to handle large data sets: I have used a Lucene index with 15M+ documents on a single JVM without any performance issues.

Background:

I am currently using MongoDB with Solr and it is working well enough. However, it is not as "simple" as I would like it to be, due to:

Keeping the Mongo and Solr indexes in sync (not a trivial task)
Transformation between Java object <-> Mongo <-> Solr (Spring Data and SolrJ help, but it is still not great)
Why use two "persistence" technologies if one will do?
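To illustrate why keeping two stores in sync is not trivial, here is a rough sketch of a reconciliation pass. It assumes each document carries a version number in both the primary store and the search index; that assumption, the `plan_resync` function and the sample ids are hypothetical and not part of my actual setup.

```python
def plan_resync(primary_versions, index_versions):
    """Compare id -> version maps from the primary store and the search index.

    Returns (ids to index or reindex, ids to delete from the index).
    """
    to_index = sorted(
        doc_id
        for doc_id, version in primary_versions.items()
        if index_versions.get(doc_id) != version
    )
    to_delete = sorted(set(index_versions) - set(primary_versions))
    return to_index, to_delete


# Hypothetical snapshot: "b" was updated and "c" created in the primary store,
# while "d" was deleted there but still lingers in the index.
primary = {"a": 1, "b": 3, "c": 1}
indexed = {"a": 1, "b": 2, "d": 5}
to_index, to_delete = plan_resync(primary, indexed)
```

With a single store, this whole class of drift (stale, missing and orphaned index entries) would not exist, which is the appeal of dropping one of the two technologies.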

From the small-scale tests I have done so far, I haven't found any technical roadblock that would prevent me from using Solr/Lucene as persistence. However, I also don't want to commit to such a drastic refactoring without more information. I am also aware of projects like Solandra that attempt to bring NoSQL and Solr together, but they don't seem to be mature enough.

Question

So for applications where full-text search is a major (but not the only) requirement, is it feasible to forgo both traditional (RDBMS) and contemporary (NoSQL) data stores?

Great reference, thanks to raticulin:

Atlassian (Jira) - Lucene Generic Data Indexing

You might want to clarify what your actual question is... Also, in their latest SOLR 3.5 announcement, they describe SOLR as "morphing into a NoSQL data store". So you can use SOLR as a persistence layer as you would use Cassandra or MongoDB, under the appropriate conditions. There are examples of companies doing this in production environments if you search the SOLR forums. — nickdos, Jan 11 '12 at 03:53
Thanks, I am actually using Mongodb as the persistence right now. One issue With that said, I am interested in the possibility of reducing the number of moving pieces. — ltfishie, Jan 11 '12 at 14:46
@nickdos: Can you point me to some of these discussions? Thanks! — ltfishie, Jan 11 '12 at 14:57
Can't find them ATM, sorry. I looked into this exact issue about 2 years ago (read about it then) and we ended up going for a pure SOLR solution for one of our apps that was a production system. But over time the feature set grew and we ended up using Cassandra in the backend and SOLR for searching. But I still think for a modest system that is search-orientated that a pure-SOLR system is the way to start off (agile approach). Less components and less code == less bugs & easier to maintain. — nickdos, Jan 11 '12 at 23:11

score 2 · Answer 1 · edited May 23 '17 at 10:34

Lucene - Full Text Search/Information Retrieval Library. Solr - Enterprise Search Server built on top of Lucene.

Lucene/Solr should not be used in place of a persistence layer. It cannot replace an RDBMS, and it is not useful to compare the two: you are comparing apples and oranges.

1) Comparing Lucene's index throughput directly with an RDBMS's is not meaningful; a number of factors affect Lucene throughput, depending on your search schema configuration.
2) Lucene has one of the best-known and best data structures for information retrieval. The query speed you get depends on a number of factors, such as configuration and hardware.
3) Obviously, that's the way to go.
4) Handling 15M+ documents on a single JVM is great, but the figure says little without knowing the document size, feature set used, JVM memory, CPU cores, etc.

Now, if your problem is that the RDBMS is a real scalability bottleneck, you could pick a NoSQL data store based on your persistence needs and then integrate it with Solr/Lucene to provide full-text search capability. Since NoSQL is fairly new and rapidly evolving, you might not find stable adapters for integrating Solr/Lucene with NoSQL.

Edit:

Now that the question is updated: this is already well debated in the question "NoSQL (MongoDB) vs Lucene (or Solr) as your database". It can be a pain to have too many moving parts, and Lucene/Solr could very well replace MongoDB, depending on the app. But you have to consider that NoSQL data stores are built from the ground up to be fully distributed, so you don't lose functionality or have it limited when scaling, whereas Solr was not built with distributed computing in mind, so Distributed Search has limitations when it comes to horizontal scaling. SolrCloud may be the answer to that.
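One way to see where such limitations come from is the general scatter-gather pattern for searching across shards, sketched below. This is an illustration of the pattern under simplified assumptions, not Solr's actual implementation; `merge_shard_hits` and the sample hits are hypothetical.

```python
import heapq
from itertools import islice


def merge_shard_hits(shard_results, offset, limit):
    """Merge per-shard hit lists, each already sorted by descending score.

    Every shard must supply its own top (offset + limit) hits, because the
    coordinator cannot know in advance which shard holds the global winners.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, offset, offset + limit))


# Hypothetical (score, doc_id) hits returned by two shards.
shard_a = [(0.9, "a1"), (0.4, "a2"), (0.1, "a3")]
shard_b = [(0.8, "b1"), (0.7, "b2"), (0.2, "b3")]
second_page = merge_shard_hits([shard_a, shard_b], offset=2, limit=2)
```

The cost of a page grows with its depth: to serve results 1000 to 1010, each shard has to send its top 1010 hits to the coordinator, which then discards most of them.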

Thanks for your answer. Currently, I am doing exactly that: using MongoDB for persistence with Solr handling the full-text search. However, after examining the capability and performance of Solr/Lucene, I am interested to see if it is doable to reduce the number of moving pieces. For your points 1 and 4: I am comparing the performance of Lucene with MySQL on the same hardware, with a simple schema and a basic analyzer to support full-text search. — ltfishie, Jan 11 '12 at 14:43
I updated my question to clarify. I am not looking to replace one DB solution with another, but would like to eliminate it from the equation if Solr/Lucene alone would suffice. — ltfishie, Jan 11 '12 at 15:05

score 2 · Accepted Answer · edited Jan 11 '12 at 15:57

I think I remember watching a presentation from Atlassian where they explained that for Jira they were using just Lucene nowadays: they had dropped their previous DB (whatever it was) and were using Lucene as storage too. They were happy.

It would be cool if someone could confirm it was them.

Edit:

http://blogs.atlassian.com/rebelutionary/downloads/tssjs2007-lucene-generic-data-indexing.pdf

edited Jan 11 '12 at 15:57 by ltfishie · answered Jan 11 '12 at 15:11 by Persimmonium

Thanks, do you mean this? http://blogs.atlassian.com/rebelutionary/downloads/tssjs2007-lucene-generic-data-indexing.pdf. I am going over it now to see what it has to say. – ltfishie Jan 11 '12 at 15:27
Thanks so much! This presentation eliminated many of my concerns and proves that it is feasible. – ltfishie Jan 11 '12 at 15:39
that's definitely not true. JIRA still uses a database, Lucene is not suited for this. – Sanne May 16 '13 at 22:27
