Running Apache HBase on Alluxio

This guide describes how to run Apache HBase on Alluxio, so that you can easily store HBase tables in Alluxio at various storage levels.

Prerequisites

The prerequisite for this guide is that you have Java installed. An Alluxio cluster should also be set up in accordance with the guides for either Local Mode or Cluster Mode.

To set up HBase, please follow the Apache HBase Configuration guide.

Configuration

Apache HBase allows you to use Alluxio through a generic file system wrapper for the Hadoop file system. Therefore, the configuration of Alluxio is done mostly in HBase configuration files.

Set properties in hbase-site.xml

You need to add the following three properties to hbase-site.xml in the conf directory of your HBase installation (make sure these properties are configured on all HBase cluster nodes):

Tip: You do not need to create the /hbase directory in Alluxio; HBase will do this for you.

<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>alluxio://<hostname>:<port>/hbase</value>
</property>
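
As a quick sanity check on each node, the presence of the three property names can be verified with grep. This is only a minimal sketch: check_hbase_site is a hypothetical helper name, the example path assumes HBASE_HOME points at your HBase installation, and the check confirms that the names appear in the file, not that their values are correct.

```shell
# check_hbase_site is a hypothetical helper, not part of HBase or Alluxio.
# It prints each required property name that is absent from the given file
# and returns non-zero if any name is missing.
check_hbase_site() {
  site_file="$1"
  missing=0
  for name in fs.alluxio.impl fs.AbstractFileSystem.alluxio.impl hbase.rootdir; do
    # -F matches the string literally, so the dots are not regex wildcards.
    if ! grep -qF "<name>${name}</name>" "$site_file"; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (assumes HBASE_HOME is set):
# check_hbase_site "${HBASE_HOME}/conf/hbase-site.xml" && echo "all properties present"
```

Running the helper on every node is a simple way to catch a node whose hbase-site.xml was not updated.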

Distribute the Alluxio Client jar

We need to make the Alluxio client jar file available to HBase, because it contains the configured alluxio.hadoop.FileSystem class.

There are two ways to achieve that:

  • Put the alluxio-core-client-1.4.0-jar-with-dependencies.jar file into the lib directory of HBase.
  • Specify the location of the jar file in the $HBASE_CLASSPATH environment variable (make sure it’s available on all cluster nodes). For example:
export HBASE_CLASSPATH=/<PATH_TO_ALLUXIO>/core/client/target/alluxio-core-client-1.5.0-SNAPSHOT-jar-with-dependencies.jar:${HBASE_CLASSPATH}
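
Whichever option you choose, it can help to confirm that the jar actually ends up on the HBase classpath. The following is a sketch under assumptions: find_alluxio_jar is a hypothetical helper, the pattern assumes the jar file name starts with alluxio-core-client as in the examples above, and the usage line assumes the hbase launcher script is on your PATH.

```shell
# find_alluxio_jar is a hypothetical helper, not part of HBase or Alluxio.
# It takes a colon-separated classpath string and prints every entry that
# looks like an Alluxio client jar; it returns non-zero if none is found.
find_alluxio_jar() {
  printf '%s\n' "$1" | tr ':' '\n' | grep 'alluxio-core-client.*\.jar$'
}

# Example usage: "hbase classpath" prints the classpath HBase will use.
# find_alluxio_jar "$(hbase classpath)" || echo "Alluxio client jar not found on the HBase classpath"
```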

Add additional Alluxio site properties to HBase

If there are any Alluxio site properties you want to specify for HBase, add them to hbase-site.xml. For example, to change alluxio.user.file.writetype.default from its default value MUST_CACHE to CACHE_THROUGH:

<property>
  <name>alluxio.user.file.writetype.default</name>
  <value>CACHE_THROUGH</value>
</property>

Using Alluxio with HBase

Start HBase:

$ ${HBASE_HOME}/bin/start-hbase.sh

Visit the HBase Web UI at http://<hostname>:16010 to confirm that HBase is running on Alluxio (check the HBase Root Directory attribute):

[Screenshot: HBaseRootDirectory]

Then visit the Alluxio Web UI at http://<hostname>:19999 and click Browse to see the files HBase stores on Alluxio, including data and WALs:

[Screenshot: HBaseRootDirectoryOnAlluxio]
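
The same files can also be listed from the command line with the Alluxio file system shell. This is a minimal sketch: list_hbase_root is a hypothetical wrapper, and ALLUXIO_HOME is an assumed variable (not defined in this guide) pointing at the Alluxio installation. The /hbase path matches the hbase.rootdir value configured earlier.

```shell
# list_hbase_root is a hypothetical wrapper around "alluxio fs ls".
# Its first argument is the alluxio launcher script to invoke.
list_hbase_root() {
  "$1" fs ls /hbase
}

# Example usage (ALLUXIO_HOME is an assumption):
# list_hbase_root "${ALLUXIO_HOME}/bin/alluxio"
```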

HBase shell examples

Create a text file simple_test.txt and write these commands into it:

create 'test', 'cf'
for i in Array(0..9999)
 put 'test', 'row'+i.to_s , 'cf:a', 'value'+i.to_s
end
list 'test'
scan 'test', {LIMIT => 10, STARTROW => 'row1'}
get 'test', 'row1'

Run the following command from the top-level HBase project directory:

bin/hbase shell simple_test.txt

You should see output like this:

[Screenshot: HBaseShellOutput]

If you have Hadoop installed, you can run a Hadoop utility program from the command line (not inside the HBase shell) to count the rows of the newly created table:

bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter test

After this MapReduce job finishes, you should see a result like this:

[Screenshot: HBaseHadoopOutput]
