Data Flow in Alluxio
This page describes the behavior of common read and write scenarios in Alluxio. It assumes a typical Alluxio setup as recommended in the architecture document: Alluxio is co-located with the computation framework and applications, and the under storage is either a remote storage cluster or cloud-based storage.
Positioned between the under storage and computation framework, Alluxio can serve as a caching layer for data reads. This section introduces different caching scenarios and their implications on performance.
Local Cache Hit
A local cache hit occurs when the requested data resides on the local Alluxio worker. For example, if an application requests data access through the Alluxio client, the client checks with the Alluxio master for the worker location of the data. If the data is locally available, the Alluxio client will use a “short-circuit” read to bypass the Alluxio worker and read the file directly via the local filesystem. Short-circuit reads avoid data transfer over a TCP socket and provide data access at memory speed. They are the most performant way of reading data out of Alluxio.
By default, short-circuit reads use local filesystem operations, which require permissive permissions. This is sometimes impossible when the worker and client are dockerized due to incorrect resource accounting. Where the default short-circuit path is not feasible, Alluxio provides domain-socket-based short-circuit reads, in which the worker transfers data to the client through a predesignated domain socket path. For more information on this topic, see the instructions on running Alluxio on Docker.
Also note that Alluxio can manage other storage media (e.g. SSD, HDD) in addition to memory, so local data access speed may vary depending on the local storage media. To learn more about this topic, please refer to the tiered storage document.
Local Cache Hit data flow
Remote Cache Hit
When the data is not available in a local Alluxio worker but resides in another Alluxio worker in the cluster, the Alluxio client will read from the remote worker. For example, if the Alluxio client checks with the Alluxio master and finds that the data is available on a remote Alluxio worker, the local worker will read the data from the remote worker and forward the data to the client.
In addition to returning the data to the client, the worker will also write a copy locally so that future reads of the same data can be served locally from memory. Remote cache hits provide network-speed data reads. Alluxio prioritizes reading from remote workers over reading from under storage because the network speed between Alluxio workers is typically faster than the speed between Alluxio workers and the under storage.
Remote Cache Hit data flow
If the data is not available within the Alluxio space, a cache miss occurs and the application will have to read the data from the under storage. The Alluxio client will delegate the read to a local worker and the worker will read the data from the under storage. The worker caches the data locally for future reads. Cache misses generally cause the largest delay because the application has to wait until the data is fetched from the under storage. A cache miss typically happens when the data is read the first time.
Cache Miss data flow
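The three read scenarios above amount to a priority order: local worker first, then a remote worker, then the under storage. The sketch below models that order in plain Python. It is an illustration only, not Alluxio client code; `choose_read_source`, `read_block`, and the `block_locations` dictionary (standing in for the master's block metadata) are hypothetical names introduced here.

```python
from enum import Enum


class ReadSource(Enum):
    LOCAL_WORKER = "local cache hit (short-circuit read)"
    REMOTE_WORKER = "remote cache hit (network read)"
    UNDER_STORAGE = "cache miss (read from under storage)"


def choose_read_source(block_id, local_worker, block_locations):
    """Pick where a read is served from, in the priority order described above.

    block_locations is a hypothetical stand-in for the master's metadata:
    a dict mapping block id -> set of worker names holding that block.
    """
    holders = block_locations.get(block_id, set())
    if local_worker in holders:
        return ReadSource.LOCAL_WORKER
    if holders:
        return ReadSource.REMOTE_WORKER
    return ReadSource.UNDER_STORAGE


def read_block(block_id, local_worker, block_locations):
    """Serve a read and record the local copy that a remote hit or a miss creates."""
    source = choose_read_source(block_id, local_worker, block_locations)
    if source is not ReadSource.LOCAL_WORKER:
        # Remote hits and cache misses both leave a copy on the local worker.
        block_locations.setdefault(block_id, set()).add(local_worker)
    return source


locations = {"b1": {"worker-a"}, "b2": {"worker-b"}}
print(read_block("b1", "worker-a", locations).name)  # LOCAL_WORKER
print(read_block("b2", "worker-a", locations).name)  # REMOTE_WORKER
print(read_block("b3", "worker-a", locations).name)  # UNDER_STORAGE
print(read_block("b3", "worker-a", locations).name)  # LOCAL_WORKER
```

The last call shows why a cache miss typically happens only on the first read: the earlier miss left a local copy, so the repeat read is a local hit.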
Partial caching is used when reading a block that is not locally available. The Alluxio client will read and cache the block entirely on the local worker even if only part of the block is requested by the client. When partial caching is off, the data will not be cached in Alluxio unless the entire block is read by the client. For certain workloads and file formats (e.g. ORC and Parquet), the data reads are not sequential, so without partial caching users may find the file is not cached into Alluxio. Partial caching is on by default, and you can turn it off by setting alluxio.user.file.cache.partially.read.block to false.
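To see why non-sequential formats are affected, consider a reader that touches only a footer and one column chunk of a block. The sketch below is a simplified model of the rule stated above, not Alluxio's implementation; `fully_read` and `block_gets_cached` are hypothetical helpers, and the 64 MB block size and read offsets are made-up values.

```python
def fully_read(block_size, read_ranges):
    """Return True if the (offset, length) reads together cover the whole block."""
    covered_up_to = 0
    for offset, length in sorted(read_ranges):
        if offset > covered_up_to:
            return False  # gap: some bytes were never read
        covered_up_to = max(covered_up_to, offset + length)
    return covered_up_to >= block_size


def block_gets_cached(block_size, read_ranges, partial_caching=True):
    """Model the caching decision for a block that is not locally available."""
    if not read_ranges:
        return False  # nothing was read, so nothing triggers caching
    return partial_caching or fully_read(block_size, read_ranges)


block = 64 * 1024 * 1024  # assumed block size for this example
footer_and_column = [(block - 4096, 4096), (1024, 8192)]
print(block_gets_cached(block, footer_and_column, partial_caching=True))   # True
print(block_gets_cached(block, footer_and_column, partial_caching=False))  # False
print(block_gets_cached(block, [(0, block)], partial_caching=False))       # True
```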
In Alluxio versions 1.7 and above, partial caching is handled on the server side, transparent to the client. The client will read data as normal and signal to the worker it read from to cache specific blocks. The worker will fetch these blocks from the under storage or other Alluxio workers asynchronously. Duplicate requests caused by concurrent readers will be deduplicated on the worker and result in caching the block once. Partial caching is not on the critical path, but may still impact performance if the network bandwidth between Alluxio and the under storage system is a bottleneck.
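The deduplication of concurrent cache requests can be pictured as a set of in-flight block IDs guarded by a lock. The following is a minimal sketch under that assumption, not the worker's actual code; the `AsyncCacheRequests` class and the `fetch_block` callable are hypothetical.

```python
import threading


class AsyncCacheRequests:
    """Tracks blocks being cached so concurrent requests trigger one fetch."""

    def __init__(self, fetch_block):
        self._fetch_block = fetch_block  # hypothetical callable taking a block id
        self._in_flight = set()
        self._lock = threading.Lock()

    def submit(self, block_id):
        """Return the started thread, or None if the block is already being cached."""
        with self._lock:
            if block_id in self._in_flight:
                return None
            self._in_flight.add(block_id)
        thread = threading.Thread(target=self._run, args=(block_id,))
        thread.start()
        return thread

    def _run(self, block_id):
        try:
            self._fetch_block(block_id)
        finally:
            with self._lock:
                self._in_flight.discard(block_id)


release = threading.Event()
fetched = []


def slow_fetch(block_id):
    release.wait()  # hold the fetch open so duplicates arrive while it is in flight
    fetched.append(block_id)


requests = AsyncCacheRequests(slow_fetch)
threads = [requests.submit("block-7") for _ in range(3)]
release.set()
threads[0].join()
print(fetched)                           # ['block-7']
print([t is not None for t in threads])  # [True, False, False]
```

Because `submit` returns immediately, the client's read is never blocked by the caching work, which matches the statement that partial caching is off the critical path.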
In Alluxio versions below 1.7, partial caching synchronously reads the entire block, requiring client read operations to wait until the block is fully cached in the local worker. The client delegates the read to the worker, and the worker reads from the beginning of the block and writes to the local RAM disk. The worker transfers data to the client for the range of the block that is requested, and blocks the client until the rest of the block is read and cached locally. With synchronous partial caching, users may observe longer delays since Alluxio reads more than is needed.
Partial Caching data flow
It’s possible to turn off caching in Alluxio and have the client read directly from the under storage by setting the property alluxio.user.file.readtype.default in the client to NO_CACHE.
Users can configure how data should be written by choosing from different write types. The write type can be set either through the Alluxio API or by configuring the property alluxio.user.file.writetype.default in the client. This section describes the behaviors of different write types as well as the performance implications to the applications.
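As a small illustration of the property-based approach, the helper below renders client property overrides as key=value lines in Java properties syntax. It is a sketch only: `render_client_properties` is a hypothetical helper, the chosen values are examples, and where the resulting lines should be placed (typically the client's Alluxio site properties file) depends on your deployment.

```python
def render_client_properties(overrides):
    """Render key=value lines in Java properties syntax, sorted by key for stable output."""
    return "\n".join(f"{key}={value}" for key, value in sorted(overrides.items()))


# Example values only; both property names are taken from this page.
overrides = {
    "alluxio.user.file.writetype.default": "CACHE_THROUGH",
    "alluxio.user.file.cache.partially.read.block": "false",
}
print(render_client_properties(overrides))
# alluxio.user.file.cache.partially.read.block=false
# alluxio.user.file.writetype.default=CACHE_THROUGH
```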
With a write type of MUST_CACHE, the Alluxio client only writes to the local Alluxio worker, and no data will be written to the under storage. Before the write, the Alluxio client will create the metadata on the Alluxio master. During the write, if short-circuit write is available, the Alluxio client will write directly to the file on the local RAM disk, bypassing the Alluxio worker to avoid the slower network transfer. Short-circuit write is the most performant write (it executes at memory speed). Since the data is not written persistently to the under storage, data can be lost if the machine crashes or if the data needs to be freed up for newer writes. As a result, the MUST_CACHE setting is useful for writing temporary data when data loss can be tolerated.
MUST_CACHE data flow
With the write type of CACHE_THROUGH, data is written synchronously to an Alluxio worker and the under storage system. The Alluxio client delegates the write to the local worker, and the worker will simultaneously write to both local memory and the under storage. Since the under storage is typically much slower to write to than the local storage, the client write speed will match the write speed of the under storage. The CACHE_THROUGH write type is recommended when data persistence is required. A local copy is also written, so any future reads of the data can be served from local memory directly.
CACHE_THROUGH data flow
Lastly, Alluxio provides an experimental write type of ASYNC_THROUGH. With ASYNC_THROUGH, data is written synchronously to an Alluxio worker and asynchronously to the under storage system. ASYNC_THROUGH can provide writes at memory speed while still persisting the data. However, as an experimental feature, ASYNC_THROUGH has some limitations: (1) the data can still be lost if the machine crashes before the asynchronous persisting to the under storage finishes, and (2) all the blocks for a file must reside in the same worker.
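The trade-offs among the three write types can be summarized with a toy model of how long the client waits and whether the data is persisted. This is a simplified sketch, not a benchmark or Alluxio code: `simulate_write`, `WriteOutcome`, and the timings are hypothetical, and the model ignores failures such as a crash before an asynchronous persist completes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteOutcome:
    client_wait_s: float        # time the client blocks on the write
    persisted_on_return: bool   # under storage has the data when the write returns
    persisted_eventually: bool  # data reaches the under storage at all (no crash assumed)


def simulate_write(write_type, cache_write_s, ufs_write_s):
    """Model the client-visible cost of one write for the write types on this page."""
    if write_type == "MUST_CACHE":
        return WriteOutcome(cache_write_s, False, False)
    if write_type == "CACHE_THROUGH":
        # Both writes proceed simultaneously; the client waits for the slower one.
        return WriteOutcome(max(cache_write_s, ufs_write_s), True, True)
    if write_type == "ASYNC_THROUGH":
        # The client returns after the cache write; persistence happens later.
        return WriteOutcome(cache_write_s, False, True)
    raise ValueError(f"write type not covered by this sketch: {write_type}")


# Made-up timings: 0.1 s to write to local memory, 2.0 s to write to the under storage.
for write_type in ("MUST_CACHE", "CACHE_THROUGH", "ASYNC_THROUGH"):
    print(write_type, simulate_write(write_type, 0.1, 2.0))
```

With these assumed timings, only CACHE_THROUGH makes the client wait for the under storage, and only MUST_CACHE never persists the data.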