Running Alluxio on GCE

Alluxio can be deployed on Google Compute Engine (GCE) using the Vagrant scripts that come with Alluxio. The scripts let you create, configure, and destroy clusters.

Prerequisites

Install Vagrant and the Google plugin

Download Vagrant

Install Google Vagrant plugin:

$ vagrant plugin install vagrant-google
$ vagrant box add google https://github.com/mitchellh/vagrant-google/raw/master/google.box
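
You can optionally confirm that the plugin and box were installed:

$ vagrant plugin list
$ vagrant box list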

Install Alluxio

Download Alluxio to your local machine, and unzip it:

$ wget http://alluxio.org/downloads/files/1.4.0/alluxio-1.4.0-bin.tar.gz
$ tar xvfz alluxio-1.4.0-bin.tar.gz

Install python library dependencies

Install Python >= 2.7 (not Python 3).
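
You can check which version is on your PATH before proceeding:

$ python --version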

Under the deploy/vagrant directory of your Alluxio installation, run:

$ sudo bash bin/install.sh

Alternatively, you can manually install pip, and then in deploy/vagrant run:

$ sudo pip install -r pip-req.txt

Launch a Cluster

To run an Alluxio cluster on GCE, you need a Google Cloud billing account, project, service account and JSON keys for the service account.

If you are new to Google Cloud, create a billing account and project at the free trial signup page. Also, if you are not familiar with Google Compute Engine, you may want to review the documentation first.

Next, both new and existing Google Cloud users need to choose or create a service account in the Console, on the Permissions page under the Service Accounts tab. If you create a new service account, check “Furnish a new private key” in the account creation dialog, then download and store the JSON key in a safe location. If you reuse an existing service account, you need either previously saved JSON keys for that account or newly downloaded ones. To download keys for an existing service account, while still in the Service Accounts tab, open the menu under the three dots at the right of the service account list, select “create key”, and save the JSON key in a safe location.
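
Alternatively, once the gcloud SDK (installed in the next step) is set up, keys for an existing service account can also be created from the command line; the output path and account email below are placeholders:

$ gcloud iam service-accounts keys create /path/to/key.json --iam-account=<service-account-email>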

Using the gcloud SDK, configure keys for ssh:

$ curl https://sdk.cloud.google.com | bash
$ exec -l $SHELL
$ gcloud init
$ gcloud compute config-ssh

Copy deploy/vagrant/conf/gce.yml.template to deploy/vagrant/conf/gce.yml:

$ cp deploy/vagrant/conf/gce.yml.template deploy/vagrant/conf/gce.yml

In the configuration file deploy/vagrant/conf/gce.yml, set the project ID, service account, location of the JSON key, and the ssh username you have just created.
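
The snippet below uses hypothetical field names purely to illustrate the kind of values to fill in; keep the actual field names already present in your copied gce.yml.template:

# Hypothetical field names for illustration only -- use the names from gce.yml.template
Project: my-gcp-project-id
ServiceAccount: alluxio-deploy@my-gcp-project-id.iam.gserviceaccount.com
JsonKeyLocation: /path/to/key.json
SSHUsername: my_gcloud_ssh_username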

For GCE, the default underfs is Google Cloud Storage (GCS). You need to sign in to your Google Cloud console, create a GCS bucket, and write the bucket’s name to the field GCS:Bucket in conf/ufs.yml. To use other under storage systems, set the field Type and the corresponding configuration in conf/ufs.yml.
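
For example, with GCS as the under storage, the relevant entries in conf/ufs.yml would look roughly like the following (an illustrative sketch; follow the structure and default values of the shipped conf/ufs.yml):

# Illustrative sketch only -- the exact value for Type follows the shipped template
Type: gcs
GCS:
  Bucket: your-gcs-bucket-name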

To access GCS with access keys, you need to create developer keys under the Interoperability settings in the GCS console and set the shell environment variables GCS_ACCESS_KEY_ID and GCS_SECRET_ACCESS_KEY:

$ export GCS_ACCESS_KEY_ID=<your access key>
$ export GCS_SECRET_ACCESS_KEY=<your secret access key>

Now you can launch the Alluxio cluster by running the script under deploy/vagrant:

$ ./create <number of machines> google

Each node of the cluster runs an Alluxio worker, and the AlluxioMaster node runs the Alluxio master.
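
For example, to launch a three-machine cluster:

$ ./create 3 google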

Access the cluster

Access through Web UI

After the command ./create <number of machines> google succeeds, two green lines like the following appear at the end of the shell output:

>>> AlluxioMaster public IP is xxx, visit xxx:19999 for Alluxio web UI<<<
>>> visit default port of the web UI of what you deployed <<<

The default port for the Alluxio Web UI is 19999.

Before you can access the Web UI, a network firewall rule must be created to allow TCP traffic on port 19999. This can be done through the Console UI or with a gcloud command like the following, which assumes a network named ‘default’.

$ gcloud compute firewall-rules create alluxio-ui --allow tcp:19999

Visit http://{MASTER_IP}:{PORT} in the browser to access the Web UIs.

You can also monitor the instances’ state through the Google Cloud console.

Here are some scenarios when you may want to check the console:

- When the cluster creation fails, check the GCE instances’ status/logs.
- After the cluster is destroyed, confirm the GCE instances are terminated.
- When you no longer need the cluster, make sure the GCE instances are NOT costing you extra money.
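
In any of these scenarios, if you configured the gcloud SDK earlier, you can also check instance state from the command line:

$ gcloud compute instances list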

Access with ssh

The nodes that are set up are named AlluxioMaster, AlluxioWorker1, AlluxioWorker2, and so on.

To ssh into a node, run:

$ vagrant ssh <node name>

For example, you can ssh into AlluxioMaster with:

$ vagrant ssh AlluxioMaster

All software is installed under the root directory, e.g. Alluxio is installed in /alluxio.

On the AlluxioMaster node, you can run tests against Alluxio to check its health:

$ /alluxio/bin/alluxio runTests

After the tests finish, visit Alluxio web UI at http://{MASTER_IP}:19999 again. Click Browse File System in the navigation bar, and you should see the files written to Alluxio by the above tests.

From a node in the cluster, you can ssh to other nodes in the cluster without a password:

$ ssh AlluxioWorker1

Destroy the cluster

Under the deploy/vagrant directory, you can run:

$ ./destroy

to destroy the cluster that you created. Only one cluster can be created at a time. After the command succeeds, the GCE instances are terminated.
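
If you created the alluxio-ui firewall rule earlier, you may also want to remove it once the cluster is gone:

$ gcloud compute firewall-rules delete alluxio-ui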
