Apache Hbase is a popular and highly efficient Column-oriented NoSQL database built on top of Hadoop Distributed File System that allows performing read/write operations on large datasets in real time using Key/Value data.
Apache HBase is an Apache Hadoop project and Open Source, non-relational distributed Hadoop database that had its genesis in the Google’s Bigtable. The programming language of HBase is Java. Today it is an integral part of the Apache Software Foundation and the Hadoop ecosystem. It is a high availability database that exclusively runs on top of the HDFS and provides the Capabilities of Google Bigtable for the Hadoop framework for storing huge volumes of unstructured data at breakneck speeds in order to derive valuable insights from it.
It has an extremely fault-tolerant way of storing data and is extremely good for storing sparse data. Sparse data is something like looking for a needle in a haystack. A real-life example of sparse data would be like looking for someone who has spent over $100,000 dollars in a single transaction on Amazon among the tens of millions of transactions that happen on any given week.
Criteria | HBase |
Cluster basis | Hadoop |
Deployed for | Batch Jobs |
API | Thrift or REST |
The Apache HBase carries all the features of the original Google Bigtable paper like the Bloom filters, in-memory operations and compression. The tables of this database can serve as the input for MapReduce jobs on the Hadoop ecosystem and it can also serve as output after the data is processed by MapReduce. The data can be accessed via the Java API or through the REST API or even the Thrift and AVRO gateways.
What HBase is that it is basically a column-oriented key-value data store, and the since it works extremely fine with the kind of data that Hadoop process it is natural fit for deploying as a top layer on HDFS. It is extremely fast when it comes to both read and write operations and does not lose this extremely important quality even when the datasets are humongous. Therefore it is being widely used by corporations for its high throughput and low input/output latency. It cannot work as a replacement for the SQL database but it is perfectly possible to have an SQL layer on top of HBase to integrate it with the various business intelligence and analytics tools.
HBase is one of the core components of the Hadoop ecosystem along with the other two being HDFS and MapReduce. As part of the Hortonworks Data Platform the Apache Hadoop ecosystem is available as a highly secure, enterprise ready big data framework. It is being regularly deployed by some of the biggest companies like Facebook messaging system and so on. Some of the salient features of HBase that makes it one of the most sought after message storing system is as follows: