Why do we need ZooKeeper in Hadoop?


Distributed applications are difficult to coordinate and are much more error prone than single-machine programs, because many machines are connected over a network and any of them can fail or lag. With that many machines involved, race conditions and deadlocks are common problems. A race condition occurs when two or more machines try to operate on the same shared data at the same time and the result depends on whose operation happens to land first. ZooKeeper's serialization property addresses this: updates are applied in a single, well-defined order. A deadlock occurs when two or more machines each hold a resource and wait for a resource held by another. None of them releases what it holds, so the system locks up. ZooKeeper's synchronization facilities help avoid this by giving the machines a common place to coordinate access to shared resources. Another major issue in distributed applications is partial failure of a process, which can leave data inconsistent. ZooKeeper handles this through atomicity: an update either completes fully or fails, and nothing partial persists after a failure. ZooKeeper is therefore an important part of the Hadoop ecosystem. It takes care of these small but important coordination issues so that developers can focus on the functionality of the application.
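The serialization and atomicity points above can be illustrated with a small sketch. This is an in-memory analogy in plain Python, not the ZooKeeper client API: `ToyStore`, `BadVersionError`, and `increment` are hypothetical names made up for the illustration. It loosely mirrors the way ZooKeeper attaches a version to each piece of data and rejects a write that was based on a stale version.

```python
class BadVersionError(Exception):
    """Raised when a write is based on a stale version of the data."""


class ToyStore:
    """Hypothetical in-memory stand-in for one coordinated data node."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def get(self):
        # Return the data together with the version it was read at.
        return self.data, self.version

    def set(self, data, expected_version):
        # Apply the write only if nobody else has written since the read.
        # The update is all-or-nothing: on a mismatch nothing changes.
        if expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, found {self.version}"
            )
        self.data = data
        self.version += 1


def increment(store):
    # Read-modify-write loop that retries from a fresh read after losing a race.
    while True:
        value, version = store.get()
        try:
            store.set(value + 1, version)
            return
        except BadVersionError:
            continue


store = ToyStore(0)

# Two clients read the same state, then both try to write.
value_a, version_a = store.get()
value_b, version_b = store.get()

store.set(value_a + 1, version_a)      # client A writes first and succeeds
try:
    store.set(value_b + 1, version_b)  # client B is working from stale data
    b_rejected = False
except BadVersionError:
    b_rejected = True

increment(store)                       # client B retries from a fresh read
print(store.get(), b_rejected)
```

Without the version check, both clients would write the value 1 and one increment would be lost. With it, the second write is rejected as a whole and the client retries, so both increments are applied one after the other.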



