1. Fault-efficient, scalable, flexible and modular design:
- Uses a simple and modular programming model.
- The system provides high scalability: it scales by adding new nodes (servers) to the cluster to handle larger volumes of data.
- Hadoop proves very helpful in storing, managing, processing and analyzing Big Data.
- Modular functions make the system flexible.
- One can add or replace components with ease.
- Modularity allows a component to be replaced by a different software tool.
2. Robust design of HDFS:
- Execution of Big Data applications continues even when an individual server in the cluster fails.
- This is because Hadoop provides backup (each data block is replicated at least three times) and a data-recovery mechanism; the replication setting is sketched after this item.
- HDFS thus has high reliability.
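A minimal sketch of how the replication factor can be read and adjusted from client code, assuming the Hadoop client libraries and the cluster configuration files (core-site.xml, hdfs-site.xml) are on the classpath; the path /user/demo/sample.txt is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml / hdfs-site.xml found on the classpath.
    Configuration conf = new Configuration();

    // dfs.replication controls how many copies HDFS keeps of every block (default: 3).
    System.out.println("Configured replication: " + conf.get("dfs.replication", "3"));

    // The replication factor of an individual file can also be changed at runtime.
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/demo/sample.txt");   // hypothetical path
    fs.setReplication(file, (short) 4);              // keep 4 copies of this file's blocks
  }
}
```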
3. Store and process Big Data:
- Stores and processes Big Data with the 3V characteristics (volume, velocity and variety).
4. Distributed cluster computing model with data locality:
- Processes Big Data at high speed, because application tasks and sub-tasks are submitted to the DataNodes that already hold the data.
- One can achieve more computing power by increasing the number of computing nodes.
- Processing is split across multiple DataNodes (servers), which yields fast processing and aggregated results (a word-count sketch follows this item).
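As an illustration of this model, the classic word-count job below (essentially the standard Hadoop MapReduce example) runs its map tasks on the DataNodes that hold the input splits and aggregates the partial counts in reduce tasks; the input and output HDFS paths are passed as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map tasks run on the DataNodes that hold the input splits (data locality).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);    // emit (word, 1) for every token
      }
    }
  }

  // Reduce tasks aggregate the partial counts produced by the mappers.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);    // emit (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```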
5. Hardware fault-tolerant:
- A fault does not affect data and application processing. If a node goes down, the other nodes take over the remaining work.
- This is because multiple copies of every data block are replicated automatically across nodes (the block-location sketch after this item shows the replicas).
- The default is three copies of each data block.
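A minimal sketch of how replica placement can be inspected through the Java API, assuming a running HDFS cluster and client configuration on the classpath; the path /user/demo/sample.txt is hypothetical. With the default replication factor, each block should report three DataNode hosts, which is what allows processing to continue when one of them fails.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/demo/sample.txt");   // hypothetical path

    // Ask the NameNode where each block of the file is stored.
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      // Each block lists the DataNode hosts that hold one of its replicas.
      System.out.println("Block at offset " + block.getOffset()
          + " replicated on: " + String.join(", ", block.getHosts()));
    }
  }
}
```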
6. Open-source framework:
- Open-source access and cloud services enable large data stores. Hadoop uses a cluster of multiple inexpensive servers or the cloud.
7. Java and Linux based:
- Hadoop uses Java interfaces. Its base is Linux, but it provides its own set of shell commands; the sketch below shows the equivalent Java interface in use.
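A minimal sketch of the Java interface in action, assuming the Hadoop client libraries and a reachable HDFS cluster; the path /user/demo/notes.txt is hypothetical. It does the same work as the shell commands hdfs dfs -put and hdfs dfs -cat.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class JavaInterfaceSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/demo/notes.txt");   // hypothetical path

    // Write a file into HDFS (comparable to: hdfs dfs -put).
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
    }

    // Read it back and print to the console (comparable to: hdfs dfs -cat).
    try (FSDataInputStream in = fs.open(file)) {
      IOUtils.copyBytes(in, System.out, 4096, false);
    }
  }
}
```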