
I'm developing a website that will use Cassandra for database storage and Solr to index and search some of the data contained in that database (I only want some of the data to be searchable). I had intended to use PHP for server-side scripting: interfacing with the Cassandra database and serving dynamic HTML content based on its contents.

When a user commits something to the database, I envisioned PHP issuing the write to Cassandra and, if the data needed to be searchable, writing that same data to the Solr index. The thing is, I don't need the searchable data to be immediately available in the Solr index, nor do I want the process of adding it to the index through PHP to consume valuable resources, especially during peak traffic hours. Is there a way to have asynchronous updates to the Solr index happen in the background by transferring the data directly from Cassandra? Perhaps a queue of searchable data could be created and then used by some background process to update the Solr index during idle time?
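The queue idea above can be sketched as a producer/consumer pair. This is a minimal Python sketch (not PHP) under stated assumptions: the in-memory `queue.Queue` stands in for a durable queue, and `send_batch_to_index` is a hypothetical placeholder for whatever bulk update call reaches Solr. The request path only enqueues; a background thread batches documents and flushes either when a batch fills or when nothing new has arrived for a while.

```python
import queue
import threading

# Hypothetical stand-in for the search index: batches are collected in memory.
# In a real deployment this is where a bulk add to Solr would happen.
indexed_batches = []

def send_batch_to_index(batch):
    """Placeholder for one bulk update to the search index (assumption)."""
    indexed_batches.append(list(batch))

def indexer_worker(q, batch_size, idle_timeout):
    """Drain q in batches of up to batch_size; flush early when q goes quiet."""
    batch = []
    while True:
        try:
            item = q.get(timeout=idle_timeout)
        except queue.Empty:
            # No new work for idle_timeout seconds: index whatever is buffered.
            if batch:
                send_batch_to_index(batch)
                batch = []
            continue
        if item is None:
            # Sentinel: flush the remainder and stop.
            if batch:
                send_batch_to_index(batch)
            return
        batch.append(item)
        if len(batch) >= batch_size:
            send_batch_to_index(batch)
            batch = []

# The web tier only enqueues and never waits on the index.
work = queue.Queue()
worker = threading.Thread(target=indexer_worker, args=(work, 3, 0.5), daemon=True)
worker.start()

for doc_id in range(7):
    # In the question's design, this would follow the Cassandra write.
    work.put({"id": str(doc_id), "text": f"searchable text {doc_id}"})

work.put(None)  # ask the worker to finish
worker.join()
print([len(b) for b in indexed_batches])
```

An in-process queue like this loses pending items if the process dies, so a real setup would more likely keep the queue in a durable store (for example a Cassandra column family that the worker polls) and run the worker as its own process, which also keeps the Cassandra-to-Solr link out of the PHP scripts.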

I'm new to all of this, but I'd like the link between Cassandra and Solr to be insulated from the main PHP scripts. I'm not sure whether Cassandra and Solr can be linked efficiently in Java, with only the higher-level access to Cassandra (for reading and writing the database) and Solr (for querying the searchable data) maintained in PHP for web content creation. I'd appreciate any suggestions.

onlinespending

2 Answers

Rather than operating Solr and Cassandra separately, you should consider Solandra, a Cassandra backend for Solr.

Read more about it here: http://github.com/tjake/Lucandra

tjake
  • I had looked briefly at Solandra, but I have one main concern about using it. I don't really need real-time search (although I'd certainly take it for free), and I'm worried that Solandra is essentially doing immediate commits to the Solr index. I imagine this could significantly delay search queries when there are a lot of commits happening simultaneously. – onlinespending Feb 01 '11 at 15:27
    This is no longer true, there is a setting for keeping cached readers around for a minimum amount of time. – tjake Feb 02 '11 at 01:12
  • OK, thanks. I'll definitely take a look at this then. I imagine there is a performance gain from having Cassandra and Solr being so tightly coupled and running under a single JVM. Is this true? Are there any cases where performance suffers with Solandra versus running Cassandra and Solr individually? – onlinespending Feb 02 '11 at 01:59
  • The Solandra repository https://github.com/tjake/Solandra had its last commit in 2012; is it still relevant? – Michal Jan 03 '16 at 10:53

You have lots of options.

One simple option is to have a scheduled job that grabs all updates made since the last run and does a batch insertion into Solr.
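A minimal sketch of such a scheduled job, in Python, assuming each searchable record carries an `updated_at` timestamp you can query by (how you find "updates since the last run" in Cassandra depends on your data model; a separate change-log column family is one option). `Row`, `collect_since`, `run_job`, and `batch_insert` are hypothetical names; `batch_insert` stands in for one bulk add to Solr.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Row:
    # Hypothetical shape of a searchable record; updated_at is assumed to be
    # written alongside the data so the job can find changes.
    id: str
    text: str
    updated_at: datetime

def collect_since(rows, last_run):
    """Return rows changed after last_run, plus the new high-water mark."""
    changed = [r for r in rows if r.updated_at > last_run]
    new_mark = max((r.updated_at for r in changed), default=last_run)
    return changed, new_mark

def run_job(rows, last_run, batch_insert):
    """One scheduled run: gather changes, send them as one batch, advance the mark."""
    changed, new_mark = collect_since(rows, last_run)
    if changed:
        batch_insert([{"id": r.id, "text": r.text} for r in changed])
    return new_mark

t0 = datetime(2011, 2, 1, 12, 0)
rows = [
    Row("a", "first", t0 + timedelta(minutes=1)),
    Row("b", "second", t0 + timedelta(minutes=5)),
    Row("c", "third", t0 - timedelta(minutes=10)),  # already indexed last run
]
sent = []
mark = run_job(rows, t0, sent.append)
print(len(sent[0]), mark)  # 2 2011-02-01 12:05:00
```

Because Solr replaces a document that has the same unique key, re-sending a document is harmless; if late writes or clock skew are a concern, overlapping the window slightly (for example subtracting a small margin from the mark) is a cheap safeguard.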

Or you could do your Cassandra write and then issue an asynchronous post to Solr, as described here: "How do I make an asynchronous GET request in PHP?"
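The shape of "write first, index without waiting" is sketched here in Python with a thread pool. In PHP the mechanics differ (the linked question covers fire-and-forget requests from PHP), so treat this only as an illustration. `primary_store`, `search_index`, `post_to_index`, and `handle_write` are hypothetical stand-ins for Cassandra, Solr, and your request handler.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical in-memory stand-ins for Cassandra and Solr.
primary_store = {}
search_index = {}

# A small pool that performs index updates off the request path.
index_pool = ThreadPoolExecutor(max_workers=2)

def post_to_index(doc):
    """Placeholder for an HTTP post to Solr, with simulated latency (assumption)."""
    time.sleep(0.05)
    search_index[doc["id"]] = doc

def handle_write(doc, searchable):
    """Write to the primary store, then hand indexing off without waiting."""
    primary_store[doc["id"]] = doc  # synchronous: the user's data is saved
    if searchable:
        # Asynchronous: submit() returns a Future immediately.
        return index_pool.submit(post_to_index, doc)
    return None

fut = handle_write({"id": "42", "text": "hello"}, searchable=True)
handle_write({"id": "43", "text": "private"}, searchable=False)
# A long-running server would keep the pool open; shut down here only to finish the demo.
index_pool.shutdown(wait=True)
print(sorted(primary_store), sorted(search_index))
```

With fire-and-forget, an index update is silently lost if Solr is unavailable at that moment, which is one reason to pair it with the periodic batch job as a backstop.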

Since you don't need real-time search, you could also set the default commit size fairly large.
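As an illustration of batching commits, here is a small Python buffer that sends documents and commits once per `commit_size` documents rather than once per document. `BufferedIndexer`, `add_docs`, and `commit` are hypothetical; they stand in for your Solr client's calls, not for a real library API.

```python
class BufferedIndexer:
    """Collects documents and commits them in groups of commit_size.

    Hypothetical helper: add_docs and commit are callables supplied by the
    caller and stand in for the real Solr client calls.
    """

    def __init__(self, add_docs, commit, commit_size):
        self.add_docs = add_docs
        self.commit = commit
        self.commit_size = commit_size
        self.pending = []

    def add(self, doc):
        # Buffer the document; only send and commit once the buffer is full.
        self.pending.append(doc)
        if len(self.pending) >= self.commit_size:
            self.flush()

    def flush(self):
        # Send everything buffered so far as one add, followed by one commit.
        if not self.pending:
            return
        self.add_docs(self.pending)
        self.commit()
        self.pending = []

log = []
idx = BufferedIndexer(
    add_docs=lambda docs: log.append(("add", len(docs))),
    commit=lambda: log.append(("commit",)),
    commit_size=100,
)
for i in range(250):
    idx.add({"id": str(i)})
idx.flush()  # push the tail end, e.g. from a periodic timer or on shutdown
print(log)
# [('add', 100), ('commit',), ('add', 100), ('commit',), ('add', 50), ('commit',)]
```

That is three commits for 250 documents instead of 250. Solr can also batch commits on the server side through the `autoCommit` settings (`maxDocs`, `maxTime`) in `solrconfig.xml`, which avoids coordinating commits from the client at all.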

bdargan