S3 Client

Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.

The REST API documentation is generated as part of the Alluxio build and is accessible through ${ALLUXIO_HOME}/core/server/proxy/target/miredot/index.html.

Using the HTTP proxy has performance implications. In particular, every request takes an extra hop through the proxy. For optimal performance, it is recommended to run the proxy server and an Alluxio worker on each compute node. It is also recommended to put all the proxy servers behind a load balancer.

Features support

The following table describes the support status for current Amazon S3 functional features:

S3 Feature                   Status
List Buckets                 Supported
Delete Buckets               Supported
Create Bucket                Supported
Bucket Lifecycle             Not Supported
Policy (Buckets, Objects)    Not Supported
Bucket ACLs (Get, Put)       Not Supported
Bucket Location              Not Supported
Bucket Notification          Not Supported
Bucket Object Versions       Not Supported
Get Bucket Info (HEAD)       Not Supported
Put Object                   Supported
Delete Object                Supported
Get Object                   Supported
Get Object Info (HEAD)       Supported
Object ACLs (Get, Put)       Not Supported
POST Object                  Not Supported
Copy Object                  Not Supported
Multipart Uploads            Not Supported

Language support

The Alluxio S3 client supports various programming languages, such as C++, Java, Python, Golang, and Ruby. This documentation uses curl REST calls and the Python S3 client as usage examples.

Example Usage

REST API

For example, you can run the following RESTful API calls against an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.
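All the calls in this section share one URL layout: the proxy address, the /api/v1/s3 prefix, the bucket name, an optional object key, and optional query parameters. The following is a minimal sketch of that layout using only the Python 3 standard library. The helper name s3_url and the BASE_URL constant are hypothetical and are not part of Alluxio.

```python
from urllib.parse import quote, urlencode

# Host, port, and path prefix are taken from the curl examples in this
# section. BASE_URL and s3_url are illustrative names, not an Alluxio API.
BASE_URL = "http://localhost:39999/api/v1/s3"


def s3_url(bucket, key=None, params=None):
    """Return the REST URL for a bucket, or for an object when key is given."""
    url = BASE_URL + "/" + quote(bucket)
    if key is not None:
        # quote() keeps "/" unescaped, so nested keys stay readable.
        url += "/" + quote(key)
    if params:
        url += "?" + urlencode(params)
    return url


print(s3_url("testbucket"))
print(s3_url("testbucket", "testobject"))
print(s3_url("testbucket", params={"max-keys": 2}))
```

Building the query string with urlencode also avoids the shell escaping that the curl commands need for ? and &.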

Create a bucket

# curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:34:41 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

Get the bucket (listing objects)

# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:35:00 GMT
Content-Type: application/xml
Content-Length: 200
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

Put an object

Assume there is an existing file on the local file system called LICENSE.

# curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:36:03 GMT
ETag: "9347237b67b0be183499e5893128704e"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

Get the object:

# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:37:34 GMT
Last-Modified: Tue, 29 Aug 2017 22:36:03 GMT
Content-Type: application/xml
Content-Length: 26847
Server: Jetty(9.2.z-SNAPSHOT)

.................. Content of the test file ...................

Listing a bucket with one object

# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:38:48 GMT
Content-Type: application/xml
Content-Length: 363
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
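The listing response is plain XML, so a client can read it with any XML parser. As a rough sketch, the snippet below parses the response body shown above with Python's xml.etree.ElementTree. The function name parse_listing and the shape of the returned dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Response body copied from the single-object listing example above.
BODY = (
    '<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/>'
    '<ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount>'
    '<MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated>'
    '<Contents><Key>testobject</Key>'
    '<LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag>'
    '<Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents>'
    '</ListBucketResult>'
)


def parse_listing(xml_text):
    """Extract the bucket name, truncation flag, and (key, size) pairs."""
    root = ET.fromstring(xml_text)
    objects = [
        (entry.findtext("Key"), int(entry.findtext("Size")))
        for entry in root.findall("Contents")
    ]
    return {
        "name": root.findtext("Name"),
        "truncated": root.findtext("IsTruncated") == "true",
        "objects": objects,
    }


listing = parse_listing(BODY)
print(listing["objects"])
```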

Listing a bucket with multiple objects

You can upload more files and use max-keys and continuation-token as GET bucket request parameters. For example:

# curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
# curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
# curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:40:45 GMT
Content-Type: application/xml
Content-Length: 537
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2017-08-29T15:40:42.213Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2017-08-29T15:40:43.269Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:41:18 GMT
Content-Type: application/xml
Content-Length: 540
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2017-08-29T15:40:44.002Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
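The two requests above follow a simple pattern: while IsTruncated is true, send the returned NextContinuationToken as the continuation-token of the next request. The sketch below illustrates that loop. To stay runnable without a cluster, it takes the HTTP call as a parameter (fetch_page, a hypothetical callable) and is exercised against abbreviated copies of the two responses above.

```python
import xml.etree.ElementTree as ET


def list_all_keys(fetch_page):
    """Collect keys from every page.

    fetch_page(token) is assumed to return the response XML for one GET
    bucket request; token is None for the first request.
    """
    keys, token = [], None
    while True:
        root = ET.fromstring(fetch_page(token))
        keys.extend(entry.findtext("Key") for entry in root.findall("Contents"))
        token = root.findtext("NextContinuationToken")
        # Stop on the last page, or if a truncated page carries no token.
        if root.findtext("IsTruncated") != "true" or not token:
            return keys


# Stand-in for the HTTP calls: abbreviated versions of the two responses
# above, keyed by the continuation token that was sent.
PAGES = {
    None: (
        '<ListBucketResult xmlns="">'
        '<NextContinuationToken>key3</NextContinuationToken>'
        '<IsTruncated>true</IsTruncated>'
        '<Contents><Key>key1</Key></Contents>'
        '<Contents><Key>key2</Key></Contents>'
        '</ListBucketResult>'
    ),
    "key3": (
        '<ListBucketResult xmlns=""><NextContinuationToken/>'
        '<IsTruncated>false</IsTruncated>'
        '<Contents><Key>key3</Key></Contents>'
        '<Contents><Key>testobject</Key></Contents>'
        '</ListBucketResult>'
    ),
}

print(list_all_keys(PAGES.get))
```

In a real client, fetch_page would issue the GET request with max-keys and, when a token is present, continuation-token.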

You can also verify that those objects are represented as Alluxio files under the /testbucket directory.

./bin/alluxio fs ls -R /testbucket

Delete objects

# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:43:22 GMT
Server: Jetty(9.2.z-SNAPSHOT)
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject

Initiate a multipart upload

# curl -i -X POST "http://localhost:39999/api/v1/s3/testbucket/testobject?uploads"
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 197
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<InitiateMultipartUploadResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
</InitiateMultipartUploadResult>

Upload part

# curl -i -X PUT "http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2"
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
ETag: "b54357faf0632cce46e942fa68356b38"
Server: Jetty(9.2.z-SNAPSHOT)

List parts

# curl -i -X GET "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2"
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 985
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<ListPartsResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
  <StorageClass>STANDARD</StorageClass>
  <IsTruncated>false</IsTruncated>
  <Part>
    <PartNumber>1</PartNumber>
    <LastModified>2017-08-29T20:48:34.000Z</LastModified>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
    <Size>10485760</Size>
  </Part>
</ListPartsResult>

Complete a multipart upload

# curl -i -X POST "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2" -d '
<CompleteMultipartUpload>
  <Part>
    <PartNumber>1</PartNumber>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  </Part>
</CompleteMultipartUpload>'

HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<CompleteMultipartUploadResult xmlns="">
  <Location>/testbucket/testobject</Location>
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
</CompleteMultipartUploadResult>
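The request body for completing an upload lists each part number with the ETag returned when that part was uploaded. Rather than writing the XML by hand, a client can generate it. The sketch below is one way to do so with Python's standard library. The function name complete_upload_body is hypothetical. Sorting the parts by number follows the Amazon S3 convention, and it is an assumption that Alluxio expects the same order.

```python
import xml.etree.ElementTree as ET


def complete_upload_body(parts):
    """Build a CompleteMultipartUpload body from (part_number, etag) pairs."""
    root = ET.Element("CompleteMultipartUpload")
    # Emit parts in ascending part-number order (Amazon S3 convention).
    for number, etag in sorted(parts):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etag
    return ET.tostring(root, encoding="unicode")


# ETag value copied from the upload-part response above.
print(complete_upload_body([(1, '"b54357faf0632cce46e942fa68356b38"')]))
```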

Abort a multipart upload

# curl -i -X DELETE "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2"
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

Delete an empty bucket

# curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:45:19 GMT

Python S3 Client

Create a connection:

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id = '',
    aws_secret_access_key = '',
    host = 'localhost',
    port = 39999,
    path = '/api/v1/s3',
    is_secure=False,
    calling_format = boto.s3.connection.OrdinaryCallingFormat(),
)

Create a bucket

bucketName = 'bucket-for-testing'
bucket = conn.create_bucket(bucketName)

PUT a small object

smallObjectKey = 'small.txt'
smallObjectContent = 'Hello World!'

key = bucket.new_key(smallObjectKey)
key.set_contents_from_string(smallObjectContent)

Get the small object

assert smallObjectContent == key.get_contents_as_string()

Upload a large object

Create an 8 MB file on the local file system.

# dd if=/dev/zero of=8mb.data bs=1048576 count=8

Then use the Python S3 client to upload it as an object:

largeObjectKey = 'large.txt'
largeObjectFile = '8mb.data'

key = bucket.new_key(largeObjectKey)
with open(largeObjectFile, 'rb') as f:
    key.set_contents_from_file(f)
with open(largeObjectFile, 'rb') as f:
    largeObject = f.read()

Get the large object

assert largeObject == key.get_contents_as_string()

Delete the objects

bucket.delete_key(smallObjectKey)
bucket.delete_key(largeObjectKey)

Initiate a multipart upload

mp = bucket.initiate_multipart_upload(largeObjectFile)

Upload parts

import math, os

from filechunkio import FileChunkIO

# Use a chunk size of 1MB (feel free to change this)
sourceSize = os.stat(largeObjectFile).st_size
chunkSize = 1048576
chunkCount = int(math.ceil(sourceSize / float(chunkSize)))

# Upload each chunk as one part; part numbers start at 1
for i in range(chunkCount):
    offset = chunkSize * i
    numBytes = min(chunkSize, sourceSize - offset)
    with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=numBytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)
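The chunk arithmetic in the loop above can be checked on its own, without a connection or a file. The sketch below isolates it in a small function. The name part_ranges is hypothetical. It returns one (part number, offset, size) tuple per part, where only the last part may be shorter than the chunk size.

```python
import math


def part_ranges(source_size, chunk_size):
    """Return (part_number, offset, size) for each part of a file."""
    count = int(math.ceil(source_size / float(chunk_size)))
    ranges = []
    for i in range(count):
        offset = chunk_size * i
        # The final part holds whatever remains after the full chunks.
        size = min(chunk_size, source_size - offset)
        ranges.append((i + 1, offset, size))
    return ranges


# The 8 MB file from the example, split into 1 MB chunks.
ranges = part_ranges(8 * 1048576, 1048576)
print(len(ranges))   # 8
print(ranges[-1])    # (8, 7340032, 1048576)
```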

Complete the multipart upload

mp.complete_upload()

Abort the multipart upload

mp.cancel_upload()

Delete the bucket

conn.delete_bucket(bucketName)