Spark Summit SF · Jun 6, 2017 by Gene Pang & Cheng Chang
Alluxio's unified namespace gives applications a single set of file system APIs for accessing data in any storage system, such as SANs, distributed file systems, or object stores.
Alluxio's memory-centric architecture gives applications memory-speed I/O.
Efficient Data Sharing
Alluxio enables easy and efficient sharing of data across frameworks and applications at memory speed.
Alluxio supports existing frameworks such as Spark, MapReduce, and Flink, and its file system API works seamlessly with future frameworks and applications.
Alluxio's tiered storage architecture lets users efficiently leverage storage media other than memory, such as SSDs and HDDs.
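As a rough illustration of the tiered storage idea, the sketch below keeps blocks in an ordered list of tiers (fastest first) and pushes the least recently used block down one tier when a tier is full. The `TieredStore` class, its method names, the block-count capacities, the LRU eviction choice, and the promote-on-read behavior are all assumptions made for this example; this is not Alluxio's actual implementation or API.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class TieredStore:
    """Hypothetical tiered block cache, for illustration only."""

    def __init__(self, tiers: List[Tuple[str, int]]) -> None:
        # tiers are ordered fastest first: (tier name, capacity in blocks)
        if any(capacity < 1 for _, capacity in tiers):
            raise ValueError("every tier needs a capacity of at least 1 block")
        self._names = [name for name, _ in tiers]
        self._caps = [capacity for _, capacity in tiers]
        # Each tier keeps blocks in least-recently-used-first order.
        self._blocks = [OrderedDict() for _ in tiers]

    def _insert(self, level: int, block_id: str, data: bytes) -> None:
        if level >= len(self._blocks):
            return  # fell off the lowest tier: the block leaves the cache
        blocks = self._blocks[level]
        if len(blocks) >= self._caps[level]:
            # Assumed policy: demote the least recently used block one tier.
            victim_id, victim_data = blocks.popitem(last=False)
            self._insert(level + 1, victim_id, victim_data)
        blocks[block_id] = data

    def put(self, block_id: str, data: bytes) -> None:
        """Write a block into the fastest tier, replacing any older copy."""
        for blocks in self._blocks:
            blocks.pop(block_id, None)
        self._insert(0, block_id, data)

    def get(self, block_id: str) -> Optional[bytes]:
        """Read a block and promote it to the fastest tier (assumed behavior)."""
        for blocks in self._blocks:
            if block_id in blocks:
                data = blocks.pop(block_id)
                self._insert(0, block_id, data)
                return data
        return None

    def tier_of(self, block_id: str) -> Optional[str]:
        """Return the name of the tier currently holding the block, if any."""
        for name, blocks in zip(self._names, self._blocks):
            if block_id in blocks:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for block in ("a", "b", "c"):
    store.put(block, b"payload-" + block.encode())
print(store.tier_of("a"))  # "a" was demoted when "c" arrived: prints SSD
store.get("a")             # reading "a" promotes it and demotes "b"
print(store.tier_of("a"))  # prints MEM
```

Counting capacity in blocks rather than bytes keeps the sketch short; a real system would track sizes and likely offer several eviction policies.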
Alluxio runs well in cloud, on-premises, and hybrid cloud environments.
Customizable Data Management
Alluxio provides customizable and pluggable policies for defining the data management behavior that best fits your workloads and needs.
Alluxio provides a Hadoop-compatible file system interface. Any Hadoop application can run on Alluxio without code changes.
Industry deployments demonstrate scalability to thousands of nodes.
Between Computation Frameworks and Storage Systems
Alluxio users can work with frameworks of their choice. Additionally, Alluxio enables new workloads across different storage systems.
Various frameworks can share data efficiently with one another. Alluxio's unified namespace enables applications to interact with data in any storage system at memory speed. This future-proof architecture enables users to extract value from data in any storage system faster.
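One way to picture a unified namespace is as a mount table that maps paths in a single logical tree onto different underlying storage systems. The sketch below is a minimal illustration under that assumption: the `MountTable` class, its `mount` and `resolve` methods, and the example URIs are hypothetical and are not Alluxio's API.

```python
from typing import Dict


class MountTable:
    """Hypothetical mount table mapping namespace paths to storage URIs."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, namespace_path: str, storage_uri: str) -> None:
        """Attach an underlying storage location at a namespace path."""
        mount_point = namespace_path.rstrip("/") or "/"
        self._mounts[mount_point] = storage_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate a namespace path into the URI in the underlying storage."""
        best = None
        for mount_point in self._mounts:
            prefix = "" if mount_point == "/" else mount_point
            if path == mount_point or path.startswith(prefix + "/"):
                # The longest matching mount point wins.
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path!r}")
        suffix = path if best == "/" else path[len(best):]
        relative = suffix.strip("/")
        base = self._mounts[best]
        return f"{base}/{relative}" if relative else base


# Example storage locations below are made up for the illustration.
table = MountTable()
table.mount("/", "hdfs://namenode:9000/data")
table.mount("/s3", "s3://bucket/logs")

print(table.resolve("/tables/users.parquet"))
# prints hdfs://namenode:9000/data/tables/users.parquet
print(table.resolve("/s3/2017/06/part-0"))
# prints s3://bucket/logs/2017/06/part-0
```

An application using such a table only ever sees paths like `/s3/2017/06/part-0`; which storage system actually serves the bytes is decided by the mount configuration, not by application code.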
Alluxio has attracted contributors from over 100 institutions, including Alibaba, Alluxio, Baidu, CMU, Google, IBM, Intel, NJU, Red Hat, UC Berkeley and Yahoo. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and also part of the Fedora distribution. If you'd like to join the community, or contribute to Alluxio, learn how to contribute.
Number of Contributors to Alluxio
More Than 20,000 Contributions
More Than 2,800 Stars
Director of the AMPLab at UC Berkeley
Alluxio is the next project with roots in the AMPLab to have major impact. We see it playing a huge disruptive role in the evolution of the storage layer to handle the expanding range of big data use cases.
CTO and Senior Research Fellow of Alibaba Cloud, founder of Linux Virtual Server
As the cloud computing business for Alibaba Group, the world’s leading e-commerce business, Alibaba manages many of the world’s largest data centers, including the largest big data cluster ever built in China. With Alluxio combined with AliCloud OSS as well as other AliCloud cloud service products, our customers can leverage the technology trends of hardware to run important jobs at the fastest performance. We have been contributing to the Alluxio open source community and believe that Alluxio will play a critical role in the future of big data infrastructure.
Professor at UC Berkeley, co-author of Spark, co-founder and executive chairman of Databricks, co-director of UC Berkeley AMPLab
As a layer that abstracts away the differences of existing storage systems from the cluster computing frameworks such as Apache Spark and Hadoop MapReduce, Alluxio can enable the rapid evolution of the big data storage, similarly to the way the Internet Protocol (IP) has enabled the evolution of the Internet.
Intel Vice President
Big data analytics is driving new requirements for distributed memory across clusters for real-time streaming, interactive queries, analytics and graph processing. We are excited to work with developer communities on Alluxio and to optimize Alluxio solutions on Intel platforms. Ultimately, this helps our customers create more innovative and high performance cloud and big data solutions.
Baidu Chief Architect
As one of the largest Internet companies in the world, Baidu constantly faces the challenges of managing data at multi-petabyte scale. By adopting innovative technologies like Alluxio we are able to help our users extract meaningful and useful data almost instantly. Our deployment of an Alluxio cluster has already reached 1,000 workers, which is one of the largest Alluxio clusters in the world. The tiered storage of Alluxio has provided us great flexibility in managing data at large scale. We are seeing an average 10-fold, and up to 30-fold, performance improvement in supporting interactive query systems and other types of workloads. This has greatly improved the speed of making important business decisions.