Remote Data Acceleration

The primary appeal of a coupled compute-storage architecture is the performance gained by bringing the compute engine close to the data it requires. However, the cost of maintaining such a tight-knit architecture is gradually overtaking its performance benefits. Especially with the popularity of cloud resources, the ability to scale compute and storage independently translates into significant cost savings and lower maintenance overhead. The reversal of this paradigm puts many data platforms in a tough position, forced to trade off performance against cost. Alluxio resolves this dilemma by delivering the performance of a coupled compute-storage architecture within a decoupled one.


Figure: Alluxio accelerates access to big data (data-acceleration.png)

Alluxio achieves this by acting as a near-client cache when deployed with or alongside compute nodes. Applications and compute frameworks send requests through Alluxio, which in turn fetches the data from remote storage. Along the way, Alluxio keeps a cached copy of the data in Alluxio storage, whether in memory or on durable media available on the Alluxio nodes. Subsequent requests are automatically served from the cached copy, matching the performance of a coupled compute-storage architecture. The key difference is that Alluxio does not need to hold all the data; it only needs to hold the working set. As a result, Alluxio can operate with a limited amount of storage regardless of the total data size. When the working set grows beyond the available capacity, Alluxio still provides incremental benefits proportional to the storage it has available.
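
To make the pattern concrete, below is a minimal sketch of a read-through cache in Java. This is not Alluxio's implementation or API; the RemoteStore interface and ReadThroughCache class are hypothetical stand-ins that illustrate the idea of serving repeated reads of the working set from a bounded local cache while cold reads go to remote storage.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative read-through cache, not an Alluxio API. Cold reads go to the
 * remote store; repeated reads of the working set are served locally.
 */
public class ReadThroughCache {

    /** Hypothetical stand-in for a remote store such as an object store or HDFS. */
    interface RemoteStore {
        byte[] read(String path);
    }

    private final RemoteStore remote;
    private final Map<String, byte[]> cache;

    public ReadThroughCache(RemoteStore remote, int maxEntries) {
        this.remote = remote;
        // An access-ordered LinkedHashMap gives simple LRU eviction, mirroring
        // the idea of holding only the working set rather than all data.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Serve from the local cache when possible; otherwise fetch and cache. */
    public synchronized byte[] read(String path) {
        byte[] data = cache.get(path);
        if (data == null) {
            data = remote.read(path); // cache miss: fetch from remote storage
            cache.put(path, data);    // keep a local copy for future requests
        }
        return data;
    }

    public static void main(String[] args) {
        // Stub remote store that just returns the path contents as bytes, for demonstration.
        RemoteStore remote = path -> ("contents of " + path).getBytes();
        ReadThroughCache cache = new ReadThroughCache(remote, 100);

        cache.read("s3://bucket/table/part-0"); // cold read: goes to remote storage
        cache.read("s3://bucket/table/part-0"); // warm read: served from local cache
    }
}
```

In Alluxio itself the cached copies are managed as blocks across tiered storage (memory, SSD, HDD) on worker nodes rather than whole objects in a single process, but the read-through behavior is the same: the first access pays the remote cost, and later accesses are served locally.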

Caching the working set is not a game-changing innovation in and of itself. Coupled with the flexibility of Alluxio's unified namespace, however, the two features together make Alluxio the system to use for data access.
