Configuration Settings

Alluxio can be configured by setting the values of supported configuration properties. To learn how users can customize the way an application (e.g., a Spark or MapReduce job) interacts with Alluxio, see Configure Applications; to learn how Alluxio admins can customize the Alluxio service, see Configure Alluxio Cluster.

Configure Applications

Customizing how an application job interacts with the Alluxio service is application-specific. Here we provide recommendations for a few common applications.

Alluxio Shell Commands

Alluxio shell users can put JVM system properties of the form -Dproperty=value after the fs command and before the subcommand (e.g., copyFromLocal) to specify Alluxio properties from the command line. For example, the following Alluxio shell command sets the write type to CACHE_THROUGH when copying files to Alluxio:

$ bin/alluxio fs -Dalluxio.user.file.writetype.default=CACHE_THROUGH copyFromLocal README.md /README.md

Spark Jobs

Spark users can pass JVM system properties to Spark jobs by adding "-Dproperty=value" to spark.executor.extraJavaOptions for Spark executors and spark.driver.extraJavaOptions for Spark drivers. For example, to submit a Spark job that uses the write type CACHE_THROUGH when writing to Alluxio:

$ spark-submit \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
--conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
...

In the Spark Shell, this can be achieved by:

import org.apache.spark.{SparkConf, SparkContext}

// Pass the Alluxio property to both the driver and the executors.
val conf = new SparkConf()
    .set("spark.driver.extraJavaOptions", "-Dalluxio.user.file.writetype.default=CACHE_THROUGH")
    .set("spark.executor.extraJavaOptions", "-Dalluxio.user.file.writetype.default=CACHE_THROUGH")
val sc = new SparkContext(conf)

Hadoop MapReduce Jobs

Hadoop MapReduce users can add "-Dproperty=value" after the hadoop jar or yarn jar command, and the properties will be propagated to all the tasks of the job. For example, the following wordcount MapReduce job sets the write type to CACHE_THROUGH when writing to Alluxio:

$ bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount \
-Dalluxio.user.file.writetype.default=CACHE_THROUGH \
-libjars /<PATH_TO_ALLUXIO>/client/alluxio-1.9.0-SNAPSHOT-client.jar \
<INPUT FILES> <OUTPUT DIRECTORY>

Configure Alluxio Cluster

Alluxio admins can create and customize the property file alluxio-site.properties to configure an Alluxio cluster. If this file does not exist, it can be created from the template file under ${ALLUXIO_HOME}/conf:

$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties

Make sure that this file is distributed to ${ALLUXIO_HOME}/conf on every Alluxio node (masters and workers) before starting the cluster.
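One way to approach the distribution step is sketched below. The helper print_distribute_commands is hypothetical (it is not an Alluxio command), and the hosts file format (one hostname per line), the example hostnames, and the remote path /opt/alluxio/conf are all assumptions for illustration. The helper only prints the scp commands it would run, so the output can be reviewed before anything is copied.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Alluxio): print one scp command per host
# listed in a hosts file. Blank lines and lines starting with '#' are skipped.
# Usage: print_distribute_commands <hosts_file> <local_conf_file> <remote_conf_dir>
print_distribute_commands() {
  local hosts_file="$1" conf_file="$2" remote_dir="$3" host
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in
      ''|'#'*) continue ;;
    esac
    printf 'scp %s %s:%s/\n' "$conf_file" "$host" "$remote_dir"
  done < "$hosts_file"
}

# Demo with a scratch hosts file; hostnames and remote path are placeholders.
hosts="$(mktemp)"
printf '%s\n' '# example hosts' 'worker1.example.com' 'worker2.example.com' > "$hosts"
print_distribute_commands "$hosts" conf/alluxio-site.properties /opt/alluxio/conf
rm -f "$hosts"
```

Piping the printed lines to a shell would perform the copy; keeping that as a separate step is a deliberate choice so that a wrong host list is caught before any file is overwritten.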

Use Cluster Default

Since v1.8, each Alluxio client can initialize its configuration with the cluster-wide configuration values retrieved from masters. To be specific, when different client applications such as Alluxio Shell commands, Spark jobs, or MapReduce jobs connect to an Alluxio service, they will initialize their own Alluxio configuration properties with the default values supplied by the masters based on the master-side ${ALLUXIO_HOME}/conf/alluxio-site.properties files. As a result, cluster admins can put client-side settings (e.g., alluxio.user.*) or network transport settings (such as alluxio.security.authentication.type) in ${ALLUXIO_HOME}/conf/alluxio-site.properties on masters, which will be distributed and become cluster-wide default values for new Alluxio clients.

For example, the common Alluxio property alluxio.user.file.writetype.default defaults to MUST_CACHE, which only writes to Alluxio space. In an Alluxio cluster deployment where data persistence is preferred and all jobs need to write through to both the UFS and Alluxio, with Alluxio v1.8 or later the admin can simply add alluxio.user.file.writetype.default=CACHE_THROUGH to the master-side ${ALLUXIO_HOME}/conf/alluxio-site.properties. After the cluster is restarted, all new jobs will automatically use CACHE_THROUGH as the default value of alluxio.user.file.writetype.default.
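The edit described above could be scripted roughly as follows. This is a minimal sketch: set_site_property is a hypothetical helper, not an Alluxio tool, and it assumes that properties are written as key=value with no spaces around the equals sign. The demo runs against a scratch file; on a real master the target would be ${ALLUXIO_HOME}/conf/alluxio-site.properties.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Alluxio): set KEY=VALUE in a properties
# file, replacing any existing "KEY=" line so the key appears exactly once.
# Usage: set_site_property <file> <key> <value>
set_site_property() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line that does not start with "<key>=".
  awk -v k="${key}=" 'index($0, k) != 1' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the original file's permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a scratch file that starts with the MUST_CACHE setting.
site_props="$(mktemp)"
echo 'alluxio.user.file.writetype.default=MUST_CACHE' > "$site_props"
set_site_property "$site_props" alluxio.user.file.writetype.default CACHE_THROUGH
cat "$site_props"   # prints alluxio.user.file.writetype.default=CACHE_THROUGH
rm -f "$site_props"
```

Replacing the existing line rather than appending a second one avoids relying on how duplicate keys in a properties file are resolved.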

Clients can still ignore or override the cluster-wide default values, either by specifying the property alluxio.user.conf.cluster.default.enabled=false to decline loading the cluster-wide default values, or by following the approaches described in Configure Applications to override individual properties.

Note that, before v1.8, the ${ALLUXIO_HOME}/conf/alluxio-site.properties file was only loaded by Alluxio server processes and was ignored by applications interacting with the Alluxio service through the Alluxio client, unless ${ALLUXIO_HOME}/conf was on the application's classpath.

Use Environment Variables

Alluxio supports a few frequently used configuration settings via environment variables, including:

| Environment Variable | Description |
|---|---|
| ALLUXIO_CONF_DIR | Path to the Alluxio configuration directory. |
| ALLUXIO_LOGS_DIR | Path to the Alluxio logs directory. |
| ALLUXIO_MASTER_HOSTNAME | Hostname of the Alluxio master. Defaults to localhost. |
| ALLUXIO_UNDERFS_ADDRESS | Under storage system address. Defaults to ${ALLUXIO_HOME}/underFSStorage, which is on the local file system. |
| ALLUXIO_RAM_FOLDER | Directory where a worker stores in-memory data. Defaults to /mnt/ramdisk. |
| ALLUXIO_JAVA_OPTS | Java VM options for Master, Worker, and Alluxio Shell configuration. Note that, by default, ALLUXIO_JAVA_OPTS is included in ALLUXIO_MASTER_JAVA_OPTS, ALLUXIO_WORKER_JAVA_OPTS, and ALLUXIO_USER_JAVA_OPTS. |
| ALLUXIO_MASTER_JAVA_OPTS | Additional Java VM options for Master configuration. |
| ALLUXIO_WORKER_JAVA_OPTS | Additional Java VM options for Worker configuration. |
| ALLUXIO_USER_JAVA_OPTS | Additional Java VM options for Alluxio Shell configuration. |
| ALLUXIO_CLASSPATH | Additional classpath entries for Alluxio processes. Empty by default. |
| ALLUXIO_LOGSERVER_HOSTNAME | Hostname of the log server. Empty by default. |
| ALLUXIO_LOGSERVER_PORT | Port number of the log server. Defaults to 45600. |
| ALLUXIO_LOGSERVER_LOGS_DIR | Path to the local directory where the Alluxio log server stores logs received from Alluxio servers. |

For example, to set up an Alluxio master at localhost that talks to an HDFS cluster whose namenode also runs at localhost, and to enable Java remote debugging on port 7001, run the following before starting the master process:

$ export ALLUXIO_MASTER_HOSTNAME="localhost"
$ export ALLUXIO_UNDERFS_ADDRESS="hdfs://localhost:9000"
$ export ALLUXIO_MASTER_JAVA_OPTS="$ALLUXIO_JAVA_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=7001"

Users can either set these variables through the shell or in conf/alluxio-env.sh. If this file does not exist yet, you can create one by copying the template:

$ cp conf/alluxio-env.sh.template conf/alluxio-env.sh

Configuration Sources

An Alluxio property can be configured in multiple sources. In this case, its final value is determined by the source that appears earliest in this list:

  1. JVM system properties (i.e., -Dproperty=value)
  2. Environment variables
  3. Property files. When an Alluxio cluster starts, each server process (master or worker) searches for alluxio-site.properties in ${HOME}/.alluxio/, /etc/alluxio/, and ${ALLUXIO_HOME}/conf, in that order, and skips the remaining paths once the file is found.
  4. Cluster default values. An Alluxio client may initialize its configuration based on the cluster-wide default configuration served by the masters.

If none of the sources above specifies a property, the Alluxio runtime falls back to the property's default value.
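The property file lookup in item 3 follows a first-match-wins rule, which can be illustrated with a small sketch. The function find_site_properties is hypothetical and only mimics the documented search order; it is not how Alluxio itself is implemented. The fallback of ALLUXIO_HOME to the current directory is an assumption made so the demo runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Alluxio): print the first
# alluxio-site.properties found in the given directories, checked in order,
# and stop searching after the first hit.
find_site_properties() {
  local dir
  for dir in "$@"; do
    # ${dir%/} drops a single trailing slash so both "d" and "d/" work.
    if [ -f "${dir%/}/alluxio-site.properties" ]; then
      printf '%s\n' "${dir%/}/alluxio-site.properties"
      return 0
    fi
  done
  return 1  # no property file in any search path
}

# Search order taken from the list above. Falling back to "." when
# ALLUXIO_HOME is unset is an illustration-only assumption.
find_site_properties "${HOME}/.alluxio" /etc/alluxio "${ALLUXIO_HOME:-.}/conf" \
  || echo "no alluxio-site.properties found in the search paths"
```

A consequence of this rule is that a stray file in ${HOME}/.alluxio/ hides the one in ${ALLUXIO_HOME}/conf entirely; the two files are not merged.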

To check the value of a specific configuration property and the source of that value, users can run the following commands:

$ bin/alluxio getConf alluxio.worker.port
29998
$ bin/alluxio getConf --source alluxio.worker.port
DEFAULT

To list all of the configuration properties with sources:

$ bin/alluxio getConf --source
alluxio.conf.dir=/Users/bob/alluxio/conf (SYSTEM_PROPERTY)
alluxio.debug=false (DEFAULT)
...

Users can also specify the --master option to list all of the cluster-default configuration properties served by the masters. Note that with the --master option, getConf queries the master and thus requires the master nodes to be running; without the --master option, this command only checks the local configuration.

$ bin/alluxio getConf --master --source
alluxio.conf.dir=/Users/bob/alluxio/conf (SYSTEM_PROPERTY)
alluxio.debug=false (DEFAULT)
...
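When the full listing is long, it can help to narrow it to one source. The sketch below assumes each line has exactly the "key=value (SOURCE)" shape shown in the listings above; filter_conf_by_source is a hypothetical helper, not a getConf option. The sample input is copied from the listing, and with a running installation the input would instead be piped from bin/alluxio getConf --source.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Alluxio): read "key=value (SOURCE)" lines
# from standard input and keep only those whose source tag matches $1.
filter_conf_by_source() {
  local source_tag="$1"
  # The source tag is the last whitespace-separated field, e.g. "(DEFAULT)".
  awk -v tag="(${source_tag})" '$NF == tag'
}

# Sample input copied from the listing above.
printf '%s\n' \
  'alluxio.conf.dir=/Users/bob/alluxio/conf (SYSTEM_PROPERTY)' \
  'alluxio.debug=false (DEFAULT)' \
  | filter_conf_by_source SYSTEM_PROPERTY
# prints: alluxio.conf.dir=/Users/bob/alluxio/conf (SYSTEM_PROPERTY)
```

Filtering out DEFAULT entries this way leaves only the properties that some source has explicitly set, which is usually the interesting subset when debugging.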

Server Configuration Checker

The server-side configuration checker helps discover configuration errors and warnings. Suspected configuration errors are reported through the web UI, the fsadmin doctor CLI, and the master logs.

The web UI shows the result of the server configuration check.

Users can also run the fsadmin doctor command to get the same results.

$ bin/alluxio fsadmin doctor configuration

Configuration warnings can also be seen in the master logs.
