Hadoop ZooKeeper is a distributed application that follows a simple client-server model: clients are nodes that use the service, and servers are nodes that provide it. The server nodes are collectively called a ZooKeeper ensemble. At any given time, a ZooKeeper client is connected to one ZooKeeper server in the ensemble, and it can reconnect to another server if that connection is lost. A master node (the leader, in ZooKeeper's own terminology) is chosen dynamically by consensus within the ensemble. An ensemble usually has an odd number of servers, because decisions require a majority vote and adding one more server to an odd-sized ensemble does not increase the number of failures it can tolerate. If the master node fails, a new master is elected quickly and takes over from the failed one. Besides the master and the slaves (followers), ZooKeeper also has observers, which were introduced to address scaling. Adding more voting slaves degrades write performance, because every write has to be agreed on by a majority of the voters and the voting process is expensive. Observers are slaves that do not take part in the voting process but otherwise have duties similar to those of the other slaves.
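The majority rule behind the odd ensemble size can be checked with a little arithmetic. The sketch below is only an illustration under the assumption that a decision needs a strict majority of the voting servers; the function names `quorum_size` and `tolerated_failures` are made up for this example and are not part of ZooKeeper.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voting servers that forms a strict majority."""
    if voters < 1:
        raise ValueError("an ensemble needs at least one voting server")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of voting servers that can fail while a majority still remains."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    # Print ensemble size, majority size, and tolerated failures side by side.
    for n in range(3, 8):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption, ensembles of 3 and 4 voters both tolerate one failure, and ensembles of 5 and 6 both tolerate two, so the extra even-numbered server adds voting cost without adding resilience.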
Writes in ZooKeeper
All writes in ZooKeeper go through the master node, which guarantees that writes are applied sequentially. When a client performs a write, the server it is connected to forwards the request to the master, and the update is persisted by the master and by a majority of the voting servers before it is acknowledged; the remaining servers then catch up. This keeps all the servers up to date, but it also means that writes cannot be made concurrently. This linear-write guarantee can be a problem if ZooKeeper is used for a write-dominant workload. In Hadoop, ZooKeeper is ideally used for coordinating message exchanges between clients, which involves few writes and many reads. ZooKeeper works well as long as the data is mostly shared and read, but if an application needs to write data concurrently, ZooKeeper can get in its way by imposing a strict ordering of operations.
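The write path described above can be pictured with a toy in-memory model. This is a deliberately simplified sketch and not ZooKeeper's actual protocol or API: `ToyServer`, `ToyMaster`, and the `reachable` flag are hypothetical names, and the model assumes a write commits only when a strict majority of voters can persist it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical stand-in for one voting server; not a real ZooKeeper class."""
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)  # ordered (txn_id, path, value) entries


class ToyMaster:
    """Toy master that orders writes and requires a majority before committing."""

    def __init__(self, slaves):
        self.local = ToyServer("master")
        self.slaves = list(slaves)
        self.next_txn_id = 1  # single counter, so every write gets one global position

    def write(self, path, value):
        voters = [self.local] + self.slaves
        reachable = [s for s in voters if s.reachable]
        quorum = len(voters) // 2 + 1
        if len(reachable) < quorum:
            raise RuntimeError("no majority of voters reachable; write rejected")
        entry = (self.next_txn_id, path, value)
        # Every reachable voter appends the same entry in the same order.
        for server in reachable:
            server.log.append(entry)
        self.next_txn_id += 1
        return entry[0]


if __name__ == "__main__":
    slaves = [ToyServer("s1"), ToyServer("s2", reachable=False)]
    master = ToyMaster(slaves)
    master.write("/jobs/lock", "client-a")
    master.write("/jobs/lock", "client-b")
    print(master.local.log)  # both writes, in the order the master assigned
    print(slaves[0].log)     # identical order on the reachable slave
    print(slaves[1].log)     # the unreachable slave has not seen them yet
```

Because a single counter on the master assigns every transaction id, two writes can never be ordered differently on different servers, which is also why they cannot proceed in parallel.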
Reads in ZooKeeper
ZooKeeper is best at reads, because reads can be concurrent. Each client is connected to one server, different clients can be connected to different servers, and every server answers reads from its own copy of the data, so all clients can read simultaneously. Because the master is not involved in reads, however, reads are only eventually consistent: a client may occasionally have an outdated view, which is brought up to date after a short delay.
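The outdated view mentioned above can be illustrated with a small in-memory sketch. It is a toy model rather than ZooKeeper code: `ToyReplica` and its methods are hypothetical, and the model simply assumes that a server serves reads from local state and applies incoming updates some time after receiving them.

```python
class ToyReplica:
    """Hypothetical server replica; not part of any real ZooKeeper client library."""

    def __init__(self):
        self.data = {}     # committed state that reads are served from
        self.pending = []  # updates received but not yet applied

    def receive(self, path, value):
        """Queue an update propagated from the master."""
        self.pending.append((path, value))

    def catch_up(self):
        """Apply queued updates in the order they were received."""
        for path, value in self.pending:
            self.data[path] = value
        self.pending.clear()

    def read(self, path):
        """Serve a read from local state only, without contacting the master."""
        return self.data.get(path)


if __name__ == "__main__":
    fast, slow = ToyReplica(), ToyReplica()
    for replica in (fast, slow):
        replica.receive("/config", "v1")
        replica.catch_up()
    for replica in (fast, slow):
        replica.receive("/config", "v2")
    fast.catch_up()              # only one replica has applied the new value
    print(fast.read("/config"))  # v2
    print(slow.read("/config"))  # v1 (stale view)
    slow.catch_up()
    print(slow.read("/config"))  # v2 (view updated after a delay)
```

In the real service, a client that cannot tolerate a stale read can issue ZooKeeper's `sync` operation before reading, which asks its server to catch up with the master first.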