Quick Start Guide

This quick start guide goes over how to run Alluxio on a local machine. The guide will cover the following tasks:

  • Download and configure Alluxio
  • Validate the Alluxio environment
  • Start Alluxio locally
  • Perform basic tasks via Alluxio Shell
  • [Bonus] Mount a public Amazon S3 bucket in Alluxio
  • Stop Alluxio

[Bonus] This guide contains optional tasks that use credentials from an AWS account with an access key ID and secret access key. The optional sections are labeled with [Bonus].

Note: This guide is designed to start an Alluxio system with minimal setup. Alluxio performs best in a distributed environment for big data workloads, but this scenario is difficult to simulate on a single machine. The performance benefits of Alluxio are illustrated in the Alluxio whitepapers, which include further instructions for running Alluxio in a scaled-up environment.

Prerequisites

Set Up SSH (Mac OS X only)

For Mac OS X, enable remote login to SSH into localhost. The setting is found in System Preferences, under Sharing. Check that Remote Login is enabled.

Downloading Alluxio

Download Alluxio from this page. Select the 1.8.1 release followed by the distribution built for default Hadoop. Unpack the downloaded file with the following commands.

$ tar -xzf alluxio-1.8.1-bin.tar.gz
$ cd alluxio-1.8.1

This creates a directory alluxio-1.8.1 with all of the Alluxio source files and Java binaries. Throughout this tutorial, the path of this directory is referred to as ${ALLUXIO_HOME}.

Configuring Alluxio

From ${ALLUXIO_HOME}, create the conf/alluxio-site.properties configuration file by copying the template file.

$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties

Set alluxio.master.hostname in conf/alluxio-site.properties to localhost.

$ echo "alluxio.master.hostname=localhost" >> conf/alluxio-site.properties
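Appending with echo adds another copy of the line each time the command is re-run. As a minimal sketch, the shell function below sets a key so that it appears exactly once. The set_prop name is a hypothetical helper for illustration and is not part of Alluxio.

```shell
# set_prop FILE KEY VALUE
# Hypothetical helper (not part of Alluxio): sets KEY=VALUE in a properties
# file, replacing any existing uncommented entry for KEY or appending one.
# The body runs in a subshell, so its variables do not leak to the caller.
set_prop() (
  file=$1
  key=$2
  value=$3
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp) || exit 1
  # Keep every line that does not start with "KEY=". index() does a plain
  # string comparison, so the dots in KEY are not treated as regex wildcards.
  awk -v k="$key=" 'index($0, k) != 1' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original path so file permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
)
```

For example, set_prop conf/alluxio-site.properties alluxio.master.hostname localhost can be run any number of times and leaves a single entry for the key. Commented-out lines in the file are left untouched.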

[Bonus] Configuration for AWS

To configure Alluxio to interact with Amazon S3, add AWS access information to the Alluxio configuration in conf/alluxio-site.properties. The following commands update the configuration.

$ echo "aws.accessKeyId=<AWS_ACCESS_KEY_ID>" >> conf/alluxio-site.properties
$ echo "aws.secretKey=<AWS_SECRET_ACCESS_KEY>" >> conf/alluxio-site.properties

Replace <AWS_ACCESS_KEY_ID> and <AWS_SECRET_ACCESS_KEY> with a valid AWS access key ID and AWS secret access key respectively.
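A placeholder that was never replaced is easy to miss. The following sketch checks a properties file for values that still look like <UPPER_CASE_NAME>. The has_placeholder name is a hypothetical helper, and the pattern assumes placeholders are written exactly in that form.

```shell
# has_placeholder FILE
# Hypothetical helper: succeeds (exit status 0) if any uncommented line in
# FILE still has a value of the form <UPPER_CASE_NAME>.
has_placeholder() {
  grep -v '^[[:space:]]*#' "$1" | grep -q '=<[A-Z_]*>[[:space:]]*$'
}

# Possible usage from ${ALLUXIO_HOME}:
#   has_placeholder conf/alluxio-site.properties && echo "Replace the AWS placeholders first"
```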

Validating the Alluxio Environment

Alluxio provides commands to ensure the system environment is ready for running Alluxio services. Run the following command to validate the environment for running Alluxio locally:

$ ./bin/alluxio validateEnv local

This reports potential problems that might prevent Alluxio from starting locally.

Check out this page for detailed usage information regarding the validateEnv command.

Starting Alluxio

Alluxio needs to be formatted before starting the process. The following command formats the Alluxio journal and worker storage directories.

$ ./bin/alluxio format

By default, Alluxio is configured to start a master and worker process when running locally. Start Alluxio on localhost with the following command:

$ ./bin/alluxio-start.sh local SudoMount

Congratulations! Alluxio is now up and running! Visit http://localhost:19999 and http://localhost:30000 to see the status of the Alluxio master and worker respectively.
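The web UIs may take a few seconds to come up after the start script returns. As a rough sketch, the function below polls a URL until it responds. The wait_for_url name is a hypothetical helper, and the sketch assumes curl is installed.

```shell
# wait_for_url URL [TRIES]
# Hypothetical helper: polls URL about once per second until it answers
# without an HTTP error, giving up after TRIES attempts (default 30).
wait_for_url() {
  url=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # -f makes curl fail on HTTP errors (status 400 and above).
    if curl -s -f -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep 1
  done
  return 1
}

# Possible usage with the ports from this guide:
#   wait_for_url http://localhost:19999 && echo "master UI is up"
#   wait_for_url http://localhost:30000 && echo "worker UI is up"
```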

Using the Alluxio Shell

The Alluxio shell provides command line operations for interacting with Alluxio. To see a list of filesystem operations, run:

$ ./bin/alluxio fs

List files in Alluxio with the ls command. To list all files in the root directory, use the following command:

$ ./bin/alluxio fs ls /

At this moment, there are no files in Alluxio. Copy a file into Alluxio by using the copyFromLocal shell command.

$ ./bin/alluxio fs copyFromLocal LICENSE /LICENSE
Copied LICENSE to /LICENSE

List the files in Alluxio again to see the LICENSE file.

$ ./bin/alluxio fs ls /
-rw-r--r-- staff  staff     26847 NOT_PERSISTED 01-09-2018 15:24:37:088 100% /LICENSE

The output shows the file that exists in Alluxio, as well as its permissions, owner and group, size, persistence state, the date it was created, and the percentage of the file that is cached in Alluxio.
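To make the columns concrete, here is a small sketch that picks the path, size, persistence state, and cached percentage out of an ls line. The summarize_ls name is a hypothetical helper. It assumes the nine whitespace-separated columns shown in the sample output and paths that contain no spaces.

```shell
# summarize_ls: reads `alluxio fs ls` output on stdin and prints, per entry,
# the path, size in bytes, persistence state, and cached percentage.
# Assumes the 9-column layout from the sample output and paths without spaces.
summarize_ls() {
  awk 'NF == 9 { printf "%s size=%s state=%s cached=%s\n", $9, $4, $5, $8 }'
}

# Example using the sample line from this guide:
echo '-rw-r--r-- staff  staff     26847 NOT_PERSISTED 01-09-2018 15:24:37:088 100% /LICENSE' | summarize_ls
# prints: /LICENSE size=26847 state=NOT_PERSISTED cached=100%
```

Against a running system, the same function could be fed directly with ./bin/alluxio fs ls / | summarize_ls.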

The cat command prints the contents of the file.

$ ./bin/alluxio fs cat /LICENSE
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
...

With the default configuration, Alluxio uses the local file system as its under file storage (UFS). The default path for the UFS is ./underFSStorage. Examine the contents of the UFS with:

$ ls ./underFSStorage/

Note that the directory does not exist. This is because Alluxio is currently writing data only into Alluxio space, not to the UFS.

Persist the file from Alluxio space to the UFS by using the persist command.

$ ./bin/alluxio fs persist /LICENSE
persisted file /LICENSE with size 26847

The file should appear when examining the UFS path again.

$ ls ./underFSStorage
LICENSE

The LICENSE file also appears in the Alluxio file system through the master’s web UI. Here, the Persistence State column shows the file as PERSISTED.

[Bonus] Mounting in Alluxio

Alluxio unifies access to storage systems with the unified namespace feature. Read the Unified Namespace blog post and the unified namespace documentation for more detailed explanations of the feature.

This feature allows users to mount different storage systems into the Alluxio namespace and access the files across various storage systems through the Alluxio namespace seamlessly.

Create a directory in Alluxio to store our mount points.

$ ./bin/alluxio fs mkdir /mnt
Successfully created directory /mnt

Mount an existing S3 bucket to Alluxio. This guide uses the alluxio-quick-start S3 bucket.

$ ./bin/alluxio fs mount --readonly alluxio://localhost:19998/mnt/s3 s3a://alluxio-quick-start/data
Mounted s3a://alluxio-quick-start/data at alluxio://localhost:19998/mnt/s3

List the files mounted from S3 through the Alluxio namespace by using the ls command.

$ ./bin/alluxio fs ls /mnt/s3
-r-x------ staff  staff    955610 PERSISTED 01-09-2018 16:35:00:882   0% /mnt/s3/sample_tweets_1m.csv
-r-x------ staff  staff  10077271 PERSISTED 01-09-2018 16:35:00:910   0% /mnt/s3/sample_tweets_10m.csv
-r-x------ staff  staff     89964 PERSISTED 01-09-2018 16:35:00:972   0% /mnt/s3/sample_tweets_100k.csv
-r-x------ staff  staff 157046046 PERSISTED 01-09-2018 16:35:01:002   0% /mnt/s3/sample_tweets_150m.csv

The newly mounted files and directories are also visible in the Alluxio web UI.

With Alluxio’s unified namespace, users can interact with data from different storage systems seamlessly. The ls -R command recursively lists all the files that exist under a directory.

$ ./bin/alluxio fs ls -R /
-rw-r--r-- staff  staff     26847 PERSISTED 01-09-2018 15:24:37:088 100% /LICENSE
drwxr-xr-x staff  staff         1 PERSISTED 01-09-2018 16:05:59:547  DIR /mnt
dr-x------ staff  staff         4 PERSISTED 01-09-2018 16:34:55:362  DIR /mnt/s3
-r-x------ staff  staff    955610 PERSISTED 01-09-2018 16:35:00:882   0% /mnt/s3/sample_tweets_1m.csv
-r-x------ staff  staff  10077271 PERSISTED 01-09-2018 16:35:00:910   0% /mnt/s3/sample_tweets_10m.csv
-r-x------ staff  staff     89964 PERSISTED 01-09-2018 16:35:00:972   0% /mnt/s3/sample_tweets_100k.csv
-r-x------ staff  staff 157046046 PERSISTED 01-09-2018 16:35:01:002   0% /mnt/s3/sample_tweets_150m.csv

This shows all the files across all of the mounted storage systems. The /LICENSE file is from the local file system whereas the files under /mnt/s3/ are in S3.

[Bonus] Accelerating Data Access with Alluxio

Since Alluxio leverages memory to store data, it can accelerate access to data. Check the status of a file previously mounted from S3 into Alluxio:

$ ./bin/alluxio fs ls /mnt/s3/sample_tweets_150m.csv
-r-x------ staff  staff 157046046 PERSISTED 01-09-2018 16:35:01:002   0% /mnt/s3/sample_tweets_150m.csv

The output shows that 0% of the file is cached in Alluxio. This file is a sample of tweets. Count the number of tweets with the word “kitten” and time the duration of the operation.

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c kitten
889

real	0m22.857s
user	0m7.557s
sys	0m1.181s

Depending on your network connection, the operation may take over 20 seconds. If reading this file takes too long, use a smaller dataset. The other files in the directory are smaller subsets of this file. Alluxio can accelerate access to this data by using memory to store the data.

After reading the file with the cat command, check the status with the ls command:

$ ./bin/alluxio fs ls /mnt/s3/sample_tweets_150m.csv
-r-x------ staff  staff 157046046 PERSISTED 01-09-2018 16:35:01:002 100% /mnt/s3/sample_tweets_150m.csv

The output shows that the file is now 100% loaded to Alluxio, so reading the file should be significantly faster.

Now count the number of tweets with the word “puppy”.

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c puppy
1553

real	0m1.917s
user	0m2.306s
sys	0m0.243s

Subsequent reads of the same file are noticeably faster since the data is stored in Alluxio memory.
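To put a number on the difference, the sketch below divides the cold-read wall-clock time by the warm-read time, using the real values reported above. The to_seconds and speedup names are hypothetical helpers, and your own timings will differ.

```shell
# to_seconds: converts a `time`-style duration such as 0m22.857s to seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup COLD WARM: prints how many times faster the WARM read was.
speedup() {
  awk -v c="$(to_seconds "$1")" -v w="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", c / w }'
}

# First (uncached) read versus second (cached) read from this guide:
speedup 0m22.857s 0m1.917s
# prints: 11.9
```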

Now count how many tweets mention the word “bunny”.

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c bunny
907

real	0m1.983s
user	0m2.362s
sys	0m0.240s

Congratulations! You installed Alluxio locally and used Alluxio to accelerate access to data!

Stopping Alluxio

Stop Alluxio with the following command:

$ ./bin/alluxio-stop.sh local

Conclusion

Congratulations on completing the quick start guide for Alluxio! This guide covered how to download and install Alluxio locally, with examples of basic interactions via the Alluxio shell. This was a simple example of how to get started with Alluxio.

There are several next steps available. Learn more about the various features of Alluxio in our documentation. The resources below detail deploying Alluxio in various ways, mounting existing storage systems, and configuring existing applications to interact with Alluxio.

Deploying Alluxio

Alluxio can be deployed in many different environments.

Under Storage Systems

Various under storage systems can be accessed through Alluxio.

Frameworks and Applications

Different frameworks and applications work with Alluxio.
