Running Presto with Alluxio

This guide describes how to run Presto with Alluxio, so that you can easily use Presto to query Hive tables stored in Alluxio’s tiered storage.

Prerequisites

The prerequisite for this part is that you have Java. And the Java version must use Java 8 Update 60 or higher (8u60+), 64-bit. Alluxio cluster should also be set up in accordance to these guides for either Local Mode or Cluster Mode.

Please Download Presto(This doc uses presto-0.170). Also, please complete Hive setup using Hive On Alluxio

Configuration

Presto gets the database and table metadata information from Hive Metastore. At the same time, the file system location of table data is obtained from the table’s metadata entries. So you need to configure Presto on HDFS. In order to access HDFS, you need to add the Hadoop conf files (core-site.xml,hdfs-site.xml), and use hive.config.resources in file /<PATH_TO_PRESTO>/etc/catalog/hive.properties to point to the file’s location for every Presto worker.

Configure core-site.xml

You need to add the following configuration items to the core-site.xml configured in hive.properties:

<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
  <description>The Alluxio AbstractFileSystem (Hadoop 2.x)</description>
</property>

To use fault tolerant mode, set the Alluxio cluster properties appropriately in an alluxio-site.properties file which is on the classpath.

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=[zookeeper_hostname]:2181

Alternatively you can add the properties to the Hadoop core-site.xml configuration which is then propagated to Alluxio.

<configuration>
  <property>
    <name>alluxio.zookeeper.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>alluxio.zookeeper.address</name>
    <value>[zookeeper_hostname]:2181</value>
  </property>
</configuration>

Configure additional Alluxio properties

Similar to above, add additional Alluxio properties to core-site.xml of Hadoop configuration in Hadoop directory on each node. For example, change alluxio.user.file.writetype.default from default MUST_CACHE to CACHE_THROUGH:

<property>
  <name>alluxio.user.file.writetype.default</name>
  <value>CACHE_THROUGH</value>
</property>

Alternatively, you can also append the path to alluxio-site.properties to Presto’s JVM config at etc/jvm.config under Presto folder. The advantage of this approach is to have all the Alluxio properties set within the same file of alluxio-site.properties.

...
-Xbootclasspath/p:<path-to-alluxio-site-properties>

Also, it’s recommended to increase alluxio.user.network.netty.timeout.ms to a bigger value (e.g. 10 mins) to avoid the timeout failure when reading large files from remote worker.

Increase hive.max-split-size

Presto’s Hive integration uses the config hive.max-split-size to control the parallelism of the query. It’s recommended to set this size no less than Alluxio’s block size to avoid the read contention within the same block.

Distribute the Alluxio Client Jar

We recommend you to download the tarball from Alluxio download page. Alternatively, advanced users can choose to compile this client jar from the source code by following Follow the instructs here. The Alluxio client jar can be found at /<PATH_TO_ALLUXIO>/client/alluxio-1.7.0-SNAPSHOT-client.jar.

Distribute the Alluxio client jar to all worker nodes in Presto:

  • You must put Alluxio client jar /<PATH_TO_ALLUXIO>/client/alluxio-1.7.0-SNAPSHOT-client.jar into Presto cluster’s worker directory $PRESTO_HOME/plugin/hive-hadoop2/ (For different versions of Hadoop, put the appropriate folder), And restart the process of coordinator and worker.

Presto cli examples

Create a table in Hive and load a file in local path into Hive:

You can download the data file from http://grouplens.org/datasets/movielens/

hive> CREATE TABLE u_user (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

hive> LOAD DATA LOCAL INPATH '<path_to_ml-100k>/u.user'
OVERWRITE INTO TABLE u_user;

View Alluxio WebUI at http://master_hostname:19999 and you can see the directory and file Hive creates:

HiveTableInAlluxio

Alternatively, you can follow the instructions to create the tables from existing files in Alluxio.

Next, using a single query:

/home/path/presto/presto-cli-0.170-executable.jar --server masterIp:prestoPort --execute "use default;select * from u_user limit 10;" --user username --debug

And you can see the query results from console:

PrestoQueryResult

Presto Server log:

PrestoQueryLog

Need help? Ask a Question