Apache Solr Failover Support in Master-Slave Setup

Question

Our development team is currently looking into migrating our search system to Apache Solr, and we would greatly appreciate some advice on setup. We are indexing approximately two hundred million database rows. We add about a hundred thousand new rows throughout the day. These new database rows must be searchable within two minutes of their receipt.
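
For a sense of scale, the volume figures above can be turned into a per-window load. This is only a back-of-the-envelope sketch. It assumes new rows arrive at a uniform rate over 24 hours, which real traffic probably won't do, and the function names are illustrative.

```python
# Back-of-the-envelope sizing from the figures in the question.
# Assumption: new rows arrive uniformly over 24 hours (real traffic is likely burstier).

SECONDS_PER_DAY = 86_400


def rows_per_window(rows_per_day: int, window_seconds: int) -> float:
    """Average number of new rows that arrive within one freshness window."""
    return rows_per_day * window_seconds / SECONDS_PER_DAY


def windows_per_day(window_seconds: int) -> int:
    """How many back-to-back freshness windows fit into one day."""
    return SECONDS_PER_DAY // window_seconds


if __name__ == "__main__":
    new_rows_per_day = 100_000   # "about a hundred thousand new rows"
    freshness_s = 120            # "searchable within two minutes"
    print(f"{rows_per_window(new_rows_per_day, freshness_s):.1f} rows per window")  # 138.9 rows per window
    print(f"{windows_per_day(freshness_s)} windows per day")                         # 720 windows per day
```

On average, then, each incremental import would only move on the order of a hundred-odd rows. The harder constraint is probably how often the import, commit and replication steps run, not how much data each run carries.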

We don't want the indexing to bog down the searcher, so our thought is to have two Solr servers running on different machines in a replication setup. The first Solr instance will be the indexer. It will use the DataImportHandler to index the delta and have autocommit enabled to prevent overzealous commit rates. Index optimization will take place during scheduled periods. The second Solr instance (the slave) will be the primary searcher and will have its indexes stored on RAIDed solid state drives.
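
Because the two-minute requirement has to cover the whole path from the database to the slave, it can help to write the budget down stage by stage. The sketch below uses a deliberately conservative model that simply adds up the worst case of each stage. The stage names are illustrative, and the example timings are made-up placeholders, not measurements. In Solr terms, `commit_delay_s` would roughly correspond to the autocommit `maxTime` and `poll_interval_s` to the slave's replication `pollInterval`.

```python
from dataclasses import dataclass


@dataclass
class PipelineTimings:
    """Hypothetical per-stage timings, in seconds, from database row to searchable on the slave."""
    import_interval_s: float   # how often the delta import is triggered on the master
    import_duration_s: float   # how long one delta import takes
    commit_delay_s: float      # worst-case wait for autocommit to make changes visible
    poll_interval_s: float     # how often the slave polls the master for a new index version
    fetch_and_warm_s: float    # index download plus searcher warm-up on the slave

    def worst_case_latency(self) -> float:
        # Conservative model: a row that just misses every stage waits the full
        # length of each one, so the stage times simply add up.
        return (self.import_interval_s + self.import_duration_s
                + self.commit_delay_s + self.poll_interval_s
                + self.fetch_and_warm_s)

    def meets(self, budget_s: float) -> bool:
        """True if the worst-case latency fits within the freshness budget."""
        return self.worst_case_latency() <= budget_s


if __name__ == "__main__":
    t = PipelineTimings(import_interval_s=30, import_duration_s=15,
                        commit_delay_s=15, poll_interval_s=30,
                        fetch_and_warm_s=10)
    print(t.worst_case_latency(), t.meets(120))   # 100 True
```

The additive model overestimates, since stages can overlap, but if it already fits the budget there is some headroom for bursts.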

What we are concerned about is failover. Our searches are mission-critical. If the primary searcher goes down for whatever reason, our search service will automatically shunt queries over to the indexer node instead. Indexing is equally critical, though. If the indexer dies, we need to have a warm failover standing by. Is there a recommended way to automate master node failover in Solr replication? I've begun looking into ZooKeeper, but I wasn't sure if this was the best approach.
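
The "shunt queries over to the indexer node" behaviour could look roughly like the sketch below if it lives in the application rather than in a load balancer. It assumes plain HTTP queries against each node's `/select` handler with `wt=json`. The host names, the timeout and the function names are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Sequence

# Hypothetical base URLs: the primary searcher (slave) first, the indexer (master) as fallback.
SEARCH_NODES = [
    "http://searcher.example:8983/solr",
    "http://indexer.example:8983/solr",
]


def http_get_json(url: str, timeout: float = 2.0) -> dict:
    """Fetch a URL and decode its JSON body (raises on network/HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def search_with_failover(query: str,
                         nodes: Sequence[str] = SEARCH_NODES,
                         fetch: Callable[[str], dict] = http_get_json) -> dict:
    """Send the query to each node in order and return the first successful response."""
    params = urllib.parse.urlencode({"q": query, "wt": "json"})
    last_error = None
    for base in nodes:
        try:
            return fetch(f"{base}/select?{params}")
        # URLError, HTTPError and timeouts are OSError subclasses; bad JSON raises ValueError.
        except (OSError, ValueError) as exc:
            last_error = exc  # this node is down or unreadable: try the next one
    raise RuntimeError(f"all search nodes failed: {last_error}")
```

The `fetch` parameter is only there so the failover order can be exercised without a running Solr instance.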

I tried to use a repeater as a backup master, but the repeater fails to replicate to its slaves when the primary master is down. Can anyone help me out? My post is here (https://stackoverflow.com/questions/49079050/solr-repeater-stops-letting-its-slave-polling-from-it-when-its-master-is-down) – wwood Mar 06 '18 at 15:50

Johan Sjöberg · Accepted Answer · 2011-06-16T11:08:42.950


As you've identified, search failover can be handled using replication.

Master failover is a little trickier. One idea is something like the following logical setup:

+--------+       +--------+
|  Slave |  ...  |  Slave |
+--------+       +--------+
     |               |
     v (replicate)   v
+---------------------------+
|     Load balancer         |
+---------------------------+
         /         \
        v           v
+--------+       +--------+
| Master | --->  | Master |
+--------+       +--------+

To keep the master indices up to date, repeater mode can be used, so that a hot-backup master replicates from the primary master. Then either:
- Use something like the ping handler on the primary master as a keep-alive check. If the primary cannot be reached, have a small program trigger the DataImportHandler of the secondary master so that it takes over.
- Keep the data import handlers active on all master servers, allowing any of them to take over operation without additional configuration.
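
The first option (ping as a keep-alive, then trigger the secondary's import) could be sketched as follows. Assumptions: both masters expose the ping handler at `/admin/ping`, the DataImportHandler at `/dataimport` and the ReplicationHandler at `/replication`, although the actual paths depend on each `solrconfig.xml`. The host names, the failure threshold, the check interval and the choice to send `disablepoll` before starting the import are all illustrative, not part of the answer.

```python
import time
import urllib.request
from typing import Callable

# Hypothetical core URLs; handler paths depend on each server's solrconfig.xml.
PRIMARY = "http://master1.example:8983/solr/core1"
SECONDARY = "http://master2.example:8983/solr/core1"


def http_get(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request (network and HTTP errors raise OSError)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


class FailoverWatchdog:
    """Promote the secondary master after `threshold` consecutive failed pings."""

    def __init__(self, primary: str, secondary: str, threshold: int = 3,
                 get: Callable[[str], int] = http_get):
        self.primary, self.secondary = primary, secondary
        self.threshold = threshold
        self.get = get
        self.failures = 0
        self.promoted = False

    def primary_alive(self) -> bool:
        try:
            return self.get(f"{self.primary}/admin/ping") == 200
        except OSError:
            return False

    def check_once(self) -> bool:
        """Run one health check; return True only on the call that promotes the secondary."""
        if self.promoted:
            return False
        # A successful ping resets the counter, so a single blip does not trigger failover.
        self.failures = 0 if self.primary_alive() else self.failures + 1
        if self.failures < self.threshold:
            return False
        # Stop the secondary (repeater) from polling the dead primary, then let its
        # own DataImportHandler continue with delta imports. Errors here propagate.
        self.get(f"{self.secondary}/replication?command=disablepoll")
        self.get(f"{self.secondary}/dataimport?command=delta-import")
        self.promoted = True
        return True


if __name__ == "__main__":
    dog = FailoverWatchdog(PRIMARY, SECONDARY)
    while not dog.check_once():
        time.sleep(10)  # hypothetical check interval
```

One thing worth checking in such a setup is that the secondary's delta-import bookkeeping (DIH keeps its last import time in `dataimport.properties`) reflects what the primary had already imported, otherwise the first delta import after failover may skip or repeat rows.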

Note that you might need to configure the load balancer such that a slave can only replicate from one master at any point in time.
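
In load-balancer terms, "only one master at a time" amounts to an active/passive rule rather than round robin. A minimal sketch of that rule, assuming some external health check (such as the ping handler) supplies the set of currently healthy masters. The class and method names are hypothetical.

```python
from typing import Iterable, Optional, Sequence


class ActiveMasterSelector:
    """Active/passive choice of replication source: keep the current master while it
    is healthy, switch only when it is not, and never hand out two masters at once."""

    def __init__(self, masters: Sequence[str]):
        self.masters = list(masters)        # in order of preference
        self.active: Optional[str] = None

    def choose(self, healthy: Iterable[str]) -> Optional[str]:
        """Return the single master slaves should replicate from, or None if none is up."""
        up = set(healthy)
        if self.active in up:
            return self.active              # sticky: do not flap back to the old primary
        self.active = next((m for m in self.masters if m in up), None)
        return self.active
```

Keeping the choice sticky means that when the old primary comes back, slaves keep replicating from the master that has been indexing in the meantime instead of switching back to an index that may be behind.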

On a side note, it would be interesting to hear some of your experiences indexing such a huge data set.

edited Jun 16 '11 at 11:08

answered Jun 16 '11 at 10:41

Johan Sjöberg


Thanks for your feedback, Johan. The folks over on the Solr mailing list recommended a similar setup. – ikarous Jun 20 '11 at 21:23

Indexing such a large number of rows has indeed posed some unique challenges. A full indexing run takes at least eight hours, so any schema change is highly time-consuming. Single-query performance is surprisingly good despite the index size, with a few exceptions. Fuzzy searches can sometimes take several seconds to complete, and we initially had problems with date range queries. We have managed to bring query times down on date range queries by 1) reducing the granularity of the indexed field to day-level, and 2) switching the date field's type to TrieDate with a very low precisionStep value. – ikarous Jun 20 '11 at 21:37
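
Reducing the indexed granularity to day-level can be done before documents are sent to Solr. A sketch, assuming timestamps come from the database as Python `datetime` values and a hypothetical date field named `created`. Solr can also round at query time with date math such as `NOW/DAY`.

```python
from datetime import datetime, timezone

SOLR_DATE_FMT = "%Y-%m-%dT%H:%M:%SZ"   # Solr's UTC date format (e.g. 2011-06-20T00:00:00Z)


def to_day_granularity(ts: datetime) -> str:
    """Round a timestamp down to midnight UTC and format it for a Solr date field."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)   # assumption: naive values are already UTC
    day = ts.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    return day.strftime(SOLR_DATE_FMT)


def day_range_query(field: str, start: datetime, end: datetime) -> str:
    """Inclusive range query over day-rounded bounds, e.g. for an fq parameter."""
    return f"{field}:[{to_day_granularity(start)} TO {to_day_granularity(end)}]"


if __name__ == "__main__":
    print(to_day_granularity(datetime(2011, 6, 20, 21, 37, 5)))
    # 2011-06-20T00:00:00Z
    print(day_range_query("created", datetime(2011, 6, 1), datetime(2011, 6, 20, 12)))
    # created:[2011-06-01T00:00:00Z TO 2011-06-20T00:00:00Z]
```

With every indexed value sitting on a midnight boundary, the field has far fewer distinct terms, which is what makes range queries over it cheaper.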
It's really interesting to see Solr being pushed this way. Was memory consumption ever a problem for you? – Johan Sjöberg Jun 21 '11 at 19:07
Memory consumption is pretty low during indexing, but it spikes during index optimization when the disk cache fills rapidly. That's just one of the reasons we want indexing to happen away from our searcher instances. I can't yet provide much detail on memory usage for the searchers, but extensive caching can eat up a lot of RAM very quickly. We'll know more once we get additional hardware in place. I'm planning on setting up a master node and a searcher and then replaying a day's worth of searches from our old search system's activity logs. – ikarous Jun 21 '11 at 20:05
Seems like a sound strategy. As you probably know, caching can be tuned quite extensively. A good idea is to turn off caching completely on your master servers, and to assign it sparingly even on your searchers. – Johan Sjöberg Jun 21 '11 at 20:30
This is a very interesting approach. I have a question: in my application I have to designate one server as the IndexWriter, so in this case I would pick one of the masters. If that master fails, I would have to switch the application to the other one manually, correct? Any ideas on how to make this switching automatic? – nonouco Feb 02 '12 at 16:52
@nonouco, a load balancer or DNS round robin, for instance, should do the trick. – Johan Sjöberg Feb 02 '12 at 17:44
@Johan, thank you for the response. Please correct me if I'm wrong: basically, I could use a load balancer like the one in your diagram, the one the slaves are using. – nonouco Feb 03 '12 at 13:17
@nonouco, yes, since you need some way to abstract multiple Solr servers away behind one hostname. – Johan Sjöberg Feb 03 '12 at 14:20
@JohanSjöberg I tried to set up this architecture, but the repeater fails when the primary master is down. Can anyone help me out? My post is here (https://stackoverflow.com/questions/49079050/solr-repeater-stops-letting-its-slave-polling-from-it-when-its-master-is-down) – wwood Mar 14 '18 at 16:38
