122

I've been looking at Zookeeper recently and wondered whether anybody was using it currently and what they were specifically using it for storing.

The most common use case is for configuration information, but what kind of data and how much data are you storing?

Lorenzo Belli
Jonathan Holloway
  • It comes under the Hadoop group of technologies; there's a use case from Yahoo here that's quite good: http://developer.yahoo.net/blogs/hadoop/2009/05/using_zookeeper_to_tame_system.html – Jonathan Holloway Sep 25 '09 at 20:57
  • This question has more upvotes than all answers combined. Zookeeper needs a better use-case wiki. – mixdev Apr 28 '13 at 05:23
  • Check out how Netflix uses it via Curator, Netflix's wrapper library for ZK: https://github.com/Netflix/curator/wiki/Recipes – eSniff Oct 28 '13 at 18:04
  • Check this article: https://www.stackextend.com/zookeeper/centralized-configuration-with-apache-zookeeper/ – Mouad EL Fakir Jan 16 '18 at 14:39
  • I don't know the specifics of how it's used, but I know that the latest version of [HBase](http://hadoop.apache.org/hbase/) (an open source BigTable implementation) uses ZooKeeper. – Leo P Sep 25 '09 at 22:54

13 Answers

19

Free Software Projects Powered by ZooKeeper:

Apache Projects Powered by ZooKeeper:

Source: https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy

dln385
  • Do you think that Apache Zookeeper can be used for executing the consensus as an external system as it is explained in the following question? https://stackoverflow.com/q/70088996/5029509 – Questioner Nov 30 '21 at 15:31
17

HBase uses Zookeeper to coordinate the activities that its "head node" was responsible for prior to the current version. Moving to Zookeeper means the central control is no longer a single point of failure.

Zookeeper is very versatile; here is an example of using it to build a distributed concurrent queue:

http://blog.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/

You can of course also use it to create resource locks, etc, in a distributed system.
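As a rough illustration of the lock idea: the lock recipe in the ZooKeeper documentation has each client create an ephemeral sequential node under a lock path; the client with the lowest sequence number owns the lock, and every other client watches the node just ahead of its own. The sketch below models only that decision step in plain Python, with no ZooKeeper connection. The function names and the `lock-` node prefix are made up for the example.

```python
from typing import List, Optional, Tuple


def sequence_of(node_name: str) -> int:
    """Return the numeric suffix of a sequential node name such as 'lock-0000000007'."""
    return int(node_name.rsplit("-", 1)[1])


def lock_status(children: List[str], mine: str) -> Tuple[bool, Optional[str]]:
    """Decide whether `mine` holds the lock, given all child nodes of the lock path.

    Returns (True, None) if `mine` has the lowest sequence number, otherwise
    (False, predecessor), where predecessor is the node this client should watch.
    """
    ordered = sorted(children, key=sequence_of)
    position = ordered.index(mine)  # raises ValueError if our node no longer exists
    if position == 0:
        return True, None
    return False, ordered[position - 1]


# Hypothetical children of a lock path, in the arbitrary order a listing may return.
children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
print(lock_status(children, "lock-0000000001"))  # (True, None)
print(lock_status(children, "lock-0000000003"))  # (False, 'lock-0000000002')
```

Watching only the immediate predecessor, rather than the whole lock path, means that releasing the lock wakes up a single waiting client instead of all of them.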

tharindu_DG
SquareCog
14

Old question, but since this page comes up first on a Google search for Zookeeper use cases, I figured it would be best to give an updated listing:

  1. wikipedia
  2. zookeeper wiki
  3. real users
manku
13

The Apache CXF implementation of DOSGi uses Zookeeper for its service registration repository. Individual containers have a distributed software (dsw) bundle that listens for all service events. When the status of a service that has a property indicating distribution changes, the dsw talks to the discovery bundle, which, in the reference implementation, uses Zookeeper to store services as ephemeral nodes. Other instances watch for changes to the node structure and register proxies on their local systems. The end result is that you can code to plain OSGi and end up with transparent distribution.

John Ellinwood
10

Norbert is a good example from a scalable production system. In general, it integrates Netty, Protocol Buffers and Zookeeper into a lightweight framework for running clustered services. Protocol Buffers are used to specify your service API, Netty implements transport-layer abstractions, and Zookeeper is essentially a fault-tolerant discovery service.

Every time a service instance is started, Norbert registers it as an available instance of a particular service type. From an implementation perspective, it creates two Zookeeper trees:

  • "/ServiceName/members" which lists all known instances of the service
  • "/ServiceName/available" which lists currently available instances of the service

The most important property of each node is the URL used to connect to the corresponding service instance. It enables client-side load balancing: a Norbert client finds the list of URLs for a given service name and attempts to connect to one of them in some order (e.g. round-robin or random).
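To make the client-side load balancing concrete, here is a minimal round-robin sketch in plain Python. It is not Norbert's actual API: the `RoundRobinClient` class, the node names, and the URLs are invented for illustration, and the dictionary stands in for whatever a client would read from "/ServiceName/available".

```python
from typing import Iterable, List


class RoundRobinClient:
    """Cycles through the URLs of the currently available service instances."""

    def __init__(self, urls: Iterable[str]) -> None:
        self._urls: List[str] = []
        self._next = 0
        self.update(urls)

    def update(self, urls: Iterable[str]) -> None:
        """Replace the known instances, e.g. after the available tree changes."""
        urls = sorted(urls)
        if not urls:
            raise ValueError("no available instances")
        self._urls = urls

    def next_url(self) -> str:
        """Return the URL to try next, wrapping around at the end of the list."""
        url = self._urls[self._next % len(self._urls)]
        self._next += 1
        return url


# Hypothetical contents of "/ServiceName/available": node name -> instance URL.
available = {
    "node-1": "http://10.0.0.1:8080",
    "node-2": "http://10.0.0.2:8080",
}
client = RoundRobinClient(available.values())
print([client.next_url() for _ in range(3)])
# ['http://10.0.0.1:8080', 'http://10.0.0.2:8080', 'http://10.0.0.1:8080']
```

A real client would call something like `update` from a watch on the available tree, so that instances that disappear stop receiving requests.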

ndolgov
6

There is a good article about ZooKeeper at Elastic Cloud, "ZooKeeper - The King of Coordination":

At Found, for example, we use ZooKeeper extensively for discovery, resource allocation, leader election and high priority notifications. In this article, we'll introduce you to this King of Coordination and look closely at how we use ZooKeeper at Found.

Mike Doe
herodot
5

Solr is also working to integrate ZooKeeper. Here you can see they are using it for dynamic config, sharding, SPOF elimination (master/slave election), rebalancing, etc.

Rob Hruska
phunt
3
  • Storm is used by a number of companies (Twitter and Groupon being two of the better known) and relies on Zookeeper.
  • Kafka is used by LinkedIn and relies on Zookeeper.

Storm uses Zookeeper to store all state so that it can recover from an outage in any of its (distributed) component services.

This allows the component services to be stateless and simply download or sync with the Zookeeper servers when configuration data is needed. If you have ever had to recover a production server you will know what a headache this can be!

Kafka queue consumers can use Zookeeper to store information (high water mark) on what has been consumed from the queue.
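The high-water-mark idea can be sketched without a real Kafka or Zookeeper installation. In the plain-Python model below, a dictionary stands in for the Zookeeper nodes; the `OffsetStore` class, the `consume` helper, and the path layout are assumptions made for the example, not Kafka's API.

```python
from typing import Dict, List


class OffsetStore:
    """In-memory stand-in for the Zookeeper nodes that hold consumer offsets."""

    def __init__(self) -> None:
        self._nodes: Dict[str, int] = {}

    @staticmethod
    def _path(group: str, topic: str, partition: int) -> str:
        # Hypothetical node layout, one node per consumer group and partition.
        return f"/consumers/{group}/offsets/{topic}/{partition}"

    def fetch(self, group: str, topic: str, partition: int) -> int:
        """Return the stored high water mark, or 0 if nothing was consumed yet."""
        return self._nodes.get(self._path(group, topic, partition), 0)

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        """Record that everything before `offset` has been consumed."""
        self._nodes[self._path(group, topic, partition)] = offset


def consume(log: List[str], store: OffsetStore, group: str, topic: str,
            partition: int, max_messages: int) -> List[str]:
    """Read up to `max_messages` starting at the stored offset, then commit."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch


log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()
print(consume(log, store, "billing", "orders", 0, 2))  # ['m0', 'm1']
# A restarted consumer keeps no local state; it resumes from the stored offset.
print(consume(log, store, "billing", "orders", 0, 2))  # ['m2', 'm3']
```

Because the offset lives outside the consumer process, a replacement consumer can pick up where the failed one stopped.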

Thomas Bratt
2

Zookeeper is used for many things other than configuration. Here is the official list of recipes for implementing distributed primitives using Zookeeper:

https://zookeeper.apache.org/doc/current/recipes.html

Valtoni Boaventura
coder4
2

In my case, we store configuration files in a Zookeeper ensemble for cluster use. We run it in replicated mode with a leader -> follower scheme, so when one Zookeeper server goes down we switch over to another one.

1

Neo4j uses Zookeeper for their High Availability enterprise server! http://docs.neo4j.org/chunked/milestone/ha.html

John Russell
1

Datomic uses Apache ZooKeeper to manage Riak-based data storage.

Because Riak supports only eventual consistency at this time, a Datomic system running on Riak also utilizes Apache ZooKeeper, a highly-available coordination service. Datomic uses ZooKeeper for transactor failover coordination, and for the handful of keys per database that need to be updated with CAS. source: http://blog.datomic.com/2012/11/riak-and-couchbase-support.html
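For readers unfamiliar with CAS (compare-and-set): ZooKeeper supports this kind of conditional update through per-node version numbers, where a write that names a stale version is rejected. The plain-Python model below only illustrates that behaviour; the `VersionedStore` class and the key name are hypothetical and say nothing about how Datomic is implemented.

```python
from typing import Dict, Tuple


class VersionedStore:
    """In-memory model of nodes that carry a value and a version counter."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[bytes, int]] = {}

    def create(self, key: str, value: bytes) -> None:
        """Create a node at version 0; fail if it already exists."""
        if key in self._nodes:
            raise KeyError(f"{key} already exists")
        self._nodes[key] = (value, 0)

    def get(self, key: str) -> Tuple[bytes, int]:
        """Return (value, version) for an existing node."""
        return self._nodes[key]

    def compare_and_set(self, key: str, value: bytes, expected_version: int) -> bool:
        """Write only if the node is still at `expected_version`."""
        _, current = self._nodes[key]
        if current != expected_version:
            return False  # someone else updated the node first
        self._nodes[key] = (value, current + 1)
        return True


store = VersionedStore()
store.create("db/root", b"v1")
_, version = store.get("db/root")
print(store.compare_and_set("db/root", b"v2", version))  # True
print(store.compare_and_set("db/root", b"v3", version))  # False, version is stale
print(store.get("db/root"))  # (b'v2', 1)
```

A writer that loses the race re-reads the node and retries against the new version, so concurrent updates are never silently overwritten.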

mavbozo
0

Here's some detail on how HBase uses ZooKeeper, including information on how they intend to use it in the future. Generally, they use it to eliminate the SPOF on the region servers via leader election implemented using ZooKeeper.

Rob Hruska
phunt