VTU Semester 7, Subject: Big Data Analytics

Why do we need ZooKeeper in Hadoop?

Distributed applications are difficult to coordinate and work with because they run on a large number of machines connected over a network, and every additional machine is another place where something can go wrong. With so many machines involved, race conditions and deadlocks are common problems when implementing distributed applications.

A race condition occurs when two or more processes operate on shared data at the same time and the result depends on the order in which their operations happen to run. ZooKeeper takes care of this through its serialization property: every update is applied in a single, agreed-upon order, so all clients see the same sequence of changes.

A deadlock occurs when two or more machines each hold a resource that another one needs. Each waits for the other to release its resource, none of them ever does, and the whole system stalls. Synchronization primitives in ZooKeeper, such as distributed locks, let processes acquire shared resources in a well-defined order, which helps avoid deadlock.

Another major issue in distributed applications is partial failure, where a process fails partway through an operation and leaves data in an inconsistent state. ZooKeeper handles this through atomicity: an update either completes fully or has no effect at all, so nothing half-finished persists after a failure.

ZooKeeper is therefore an important part of the Hadoop ecosystem. It takes care of these small but important coordination issues so that developers can focus on the functionality of their application.
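To make the three ideas more concrete, here is a small Python sketch. It does not use the real ZooKeeper client API: `ToyZooKeeper`, `holds_lock`, and their methods are hypothetical names for an in-memory, single-process model. One mutex stands in for ZooKeeper's single ordering of updates (serialization), sequential ephemeral nodes implement the commonly used lock recipe (the lowest-numbered node holds the lock, and a crashed holder's node disappears, so waiters are not stuck forever), and `multi` applies a batch of operations all-or-nothing (atomicity).

```python
import threading


class ToyZooKeeper:
    """Hypothetical in-memory model of a few ZooKeeper ideas (NOT the real client API)."""

    def __init__(self):
        self._mutex = threading.Lock()  # every update goes through here, one at a time
        self._nodes = {}                # path -> (data, owning session or None)
        self._counters = {}             # parent path -> next sequence number

    def create(self, path, data=b"", session=None, sequential=False):
        """Create a node; a non-None session makes it ephemeral (tied to that client)."""
        with self._mutex:
            if sequential:
                parent = path.rsplit("/", 1)[0]
                n = self._counters.get(parent, 0)
                self._counters[parent] = n + 1
                path = f"{path}{n:010d}"  # zero-padded counter keeps names sortable
            if path in self._nodes:
                raise KeyError(f"node exists: {path}")
            self._nodes[path] = (data, session)
            return path

    def exists(self, path):
        with self._mutex:
            return path in self._nodes

    def children(self, parent):
        """Return the full paths of nodes under parent, in sequence order."""
        prefix = parent.rstrip("/") + "/"
        with self._mutex:
            return sorted(p for p in self._nodes if p.startswith(prefix))

    def close_session(self, session):
        """A crashed or closed client loses all of its ephemeral nodes."""
        with self._mutex:
            owned = [p for p, (_, owner) in self._nodes.items() if owner == session]
            for p in owned:
                del self._nodes[p]

    def multi(self, ops):
        """Apply ('create', path, data) / ('delete', path) ops all-or-nothing."""
        with self._mutex:
            staged = dict(self._nodes)  # work on a copy of the current state
            for op in ops:
                if op[0] == "create":
                    if op[1] in staged:
                        raise KeyError(f"node exists: {op[1]}")
                    staged[op[1]] = (op[2], None)
                elif op[0] == "delete":
                    if op[1] not in staged:
                        raise KeyError(f"no node: {op[1]}")
                    del staged[op[1]]
                else:
                    raise ValueError(f"unknown op: {op[0]}")
            self._nodes = staged  # commit only if every op succeeded


def holds_lock(zk, my_node, lock_dir="/locks"):
    """Lock recipe: the client whose sequential node is lowest holds the lock."""
    kids = zk.children(lock_dir)
    return bool(kids) and kids[0] == my_node


if __name__ == "__main__":
    zk = ToyZooKeeper()
    a = zk.create("/locks/lock-", session="A", sequential=True)
    b = zk.create("/locks/lock-", session="B", sequential=True)
    print(holds_lock(zk, a), holds_lock(zk, b))  # True False
    zk.close_session("A")                        # A crashes while holding the lock
    print(holds_lock(zk, b))                     # True: B is not left waiting forever

    zk.create("/config/a", b"1")
    try:
        zk.multi([("delete", "/config/a"), ("create", "/config/b", b"2"),
                  ("delete", "/missing")])
    except KeyError:
        pass
    print(zk.exists("/config/a"), zk.exists("/config/b"))  # True False: nothing persisted
```

In the real lock recipe, a client that does not hold the lock sets a watch on the node just before its own instead of polling, so only one waiting client is woken when the lock is released. The real Java client also offers a `multi` operation for atomic batches.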
