Spring Boot: Scale file storage with AWS S3

Jhamukul
6 min read · Feb 11, 2023

How do we store files?

There are several ways to store files.

1. Store files on the server (local file system)

A server is a machine like your PC or laptop, so you can store files in its local file system.
This works well if you have a single server and few requests to serve.

Now assume traffic grows and one server can no longer handle it. You add a second server and put a load balancer in front of both.

Let’s assume you are uploading two images: image_1.jpg and image_2.jpg

Request 1 -> uploadImage[image_1.jpg] is handled by server 2 and stored in server 2's local storage.

Request 2 -> uploadImage[image_2.jpg] is handled by server 1 and stored in server 1's local storage.

The two files now live on different machines.

Now you make a request to download an image.
Request 3 -> downloadImage[image_2.jpg]
image_2.jpg is stored in server 1's local storage, but the load balancer may or may not send Request 3 to server 1.

If the request lands on server 1, it returns image_2.jpg.
If the request lands on server 2, the download fails because server 2 has no image_2.jpg.
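
The scenario above can be reproduced with a small simulation. This is only a sketch: the Server and LoadBalancer classes are hypothetical stand-ins, and round-robin routing is an assumption (a real load balancer may use a different strategy).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical simulation: every "server" keeps uploads in its own local storage.
class LocalStorageDemo {

    static class Server {
        final String name;
        final Map<String, byte[]> localFiles = new HashMap<>();

        Server(String name) {
            this.name = name;
        }
    }

    // Round-robin load balancer: each request goes to the next server in turn.
    static class LoadBalancer {
        private final List<Server> servers;
        private int next = 0;

        LoadBalancer(List<Server> servers) {
            this.servers = new ArrayList<>(servers);
        }

        Server route() {
            Server server = servers.get(next);
            next = (next + 1) % servers.size();
            return server;
        }
    }

    // Stores the file on whichever server the request lands on; returns that server's name.
    static String upload(LoadBalancer lb, String fileName, byte[] content) {
        Server server = lb.route();
        server.localFiles.put(fileName, content);
        return server.name;
    }

    // Looks for the file only on the server the request lands on.
    static Optional<byte[]> download(LoadBalancer lb, String fileName) {
        Server server = lb.route();
        return Optional.ofNullable(server.localFiles.get(fileName));
    }

    public static void main(String[] args) {
        LoadBalancer lb = new LoadBalancer(
                List.of(new Server("server-2"), new Server("server-1")));
        upload(lb, "image_1.jpg", new byte[] {1}); // Request 1 lands on server-2
        upload(lb, "image_2.jpg", new byte[] {2}); // Request 2 lands on server-1
        // Request 3 lands on server-2, which never received image_2.jpg.
        System.out.println("found image_2.jpg: " + download(lb, "image_2.jpg").isPresent());
    }
}
```

Running it prints "found image_2.jpg: false": the file exists, just not on the server that received the request.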

If a server goes down:

  1. You might lose the files stored on it.
  2. The data becomes inconsistent across servers.

2. Store files in the database

With a database, all servers read and write the same store, so you can keep adding files without tying them to one machine.
A file can't be stored in the database directly; you store its content, for example as a byte array (BLOB).
If an application server goes down, the files are still available in the database.

Simple entity class

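// Spring Boot 3 uses jakarta.persistence; on Spring Boot 2.x import from javax.persistence instead.
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Lob;

@Entity
public class ImageInfo {

    @Id
    private String id;

    private String name;

    // The file content, stored as a large object (BLOB).
    @Lob
    private byte[] content;

    // getters and setters

}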

Drawbacks:

  • Reading and writing files through a database typically has higher latency than using a file system directly.
  • Storing a high volume of file data in a database gets expensive.
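
To make "store the file content" concrete, here is a minimal sketch that reads a file into a byte array before it would be saved. The ImageInfo class here is a plain stand-in for the entity (JPA annotations are left out so the sketch runs on its own), and the fromFile helper is a hypothetical name, not part of any library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

class ImageInfoLoader {

    // Plain stand-in for the ImageInfo entity, without JPA annotations.
    static class ImageInfo {
        String id;
        String name;
        byte[] content;
    }

    // Hypothetical helper: copies the file name and the file's bytes into an ImageInfo.
    static ImageInfo fromFile(Path file) throws IOException {
        ImageInfo info = new ImageInfo();
        info.id = UUID.randomUUID().toString();
        info.name = file.getFileName().toString();
        info.content = Files.readAllBytes(file); // the whole file is held in memory
        return info;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("image_1", ".jpg");
        Files.write(tmp, new byte[] {10, 20, 30});
        ImageInfo info = fromFile(tmp);
        System.out.println(info.name + " -> " + info.content.length + " bytes");
        Files.delete(tmp);
    }
}
```

Because the entire file ends up in memory and then in a database row, this approach is best suited to small files.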

3. Distributed File System

In a distributed file system, data is stored as files and directories in a file system that spans multiple servers, and the file system metadata is managed by a centralized component, such as a file server or a cluster of servers. Clients can access the file system over a network and access and modify the files as if they were stored locally.

Examples of distributed file systems include Network File System (NFS), Server Message Block (SMB), and the Hadoop Distributed File System (HDFS). These systems are commonly used in enterprise environments, cloud computing, and big data processing.

4. Distributed Object Storage System

Distributed object storage systems are a type of data storage system that stores and manages data as objects, rather than as files or blocks. In an object storage system, an object consists of a file and its associated metadata, which describes the properties and characteristics of the data.
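
The object model can be pictured with a tiny in-memory sketch. It is purely illustrative: InMemoryObjectStore and its methods are hypothetical names, and a real object store distributes and replicates the data across many nodes rather than keeping it in one map.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of an object store; not how S3 is implemented.
class InMemoryObjectStore {

    // An object is the raw bytes plus descriptive metadata.
    static class StoredObject {
        final byte[] data;
        final Map<String, String> metadata;

        StoredObject(byte[] data, Map<String, String> metadata) {
            this.data = data.clone();
            this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
        }
    }

    // Flat namespace: the key is a single string, and "/" has no special meaning.
    private final Map<String, StoredObject> objects = new TreeMap<>();

    void put(String key, byte[] data, Map<String, String> metadata) {
        objects.put(key, new StoredObject(data, metadata));
    }

    StoredObject get(String key) {
        return objects.get(key); // null when the key does not exist
    }

    // "Folders" are emulated by listing the keys that share a prefix.
    List<String> listKeys(String prefix) {
        List<String> result = new ArrayList<>();
        for (String key : objects.keySet()) {
            if (key.startsWith(prefix)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InMemoryObjectStore store = new InMemoryObjectStore();
        store.put("prod/student/report_10.csv", "a,b".getBytes(StandardCharsets.UTF_8),
                Map.of("Content-Type", "text/csv"));
        store.put("prod/teacher/report_1.csv", "c,d".getBytes(StandardCharsets.UTF_8),
                Map.of("Content-Type", "text/csv"));
        System.out.println(store.listKeys("prod/student/"));
    }
}
```

The main method prints [prod/student/report_10.csv], showing that a "directory" is nothing more than a shared key prefix.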


Some of the key benefits of distributed object storage systems include:

  1. Scalability: Object storage systems can scale horizontally, allowing administrators to add more storage nodes as needed to accommodate growing amounts of data.
  2. High availability: Object storage systems are designed to be highly available, with multiple copies of data stored across multiple nodes to ensure that data is always accessible.
  3. Durability: Object storage systems typically use erasure coding or other techniques to ensure that data is protected against failures, even if multiple nodes or disks fail.
  4. Cost-effectiveness: Object storage systems can be more cost-effective than traditional file or block storage systems, especially when storing large amounts of unstructured data.

Here is a list of some famous distributed object storage systems:

  • Amazon S3
  • Google Cloud Storage
  • Azure Blob Storage
  • IBM Cloud Object Storage
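
The rest of this article uses Amazon S3 from a Spring Boot application, through the AWS SDK for Java (v1).

1. Dependency required

Gradle: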

implementation 'com.amazonaws:aws-java-sdk-s3:1.12.402'

Maven:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.12.402</version>
</dependency>

https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3

2. Create the configuration class

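import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AwsS3Client {

    @Bean
    public AmazonS3 getS3Client() {
        // Placeholders only: do not commit real keys to source control.
        BasicAWSCredentials awsCreds = new BasicAWSCredentials("access_key", "secret_key");

        return AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                // Replace with your bucket's region, e.g. "us-east-1".
                .withRegion(Regions.fromName("your-bucket-region"))
                .build();
    }
}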
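
Hardcoded keys are fine for a demo, but they usually should not live in source code. One common alternative is to read them from environment variables. The sketch below assumes the conventional AWS variable names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION); the S3Settings class and its fromEnv method are hypothetical helpers, not part of the SDK.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for S3 settings read from the environment.
class S3Settings {
    final String accessKey;
    final String secretKey;
    final String region;

    private S3Settings(String accessKey, String secretKey, String region) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.region = region;
    }

    // Reads all three settings and fails fast if any of them is missing.
    static S3Settings fromEnv(Map<String, String> env) {
        return new S3Settings(
                require(env, "AWS_ACCESS_KEY_ID"),
                require(env, "AWS_SECRET_ACCESS_KEY"),
                require(env, "AWS_REGION"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // A sample map stands in for System.getenv() so the sketch runs anywhere.
        Map<String, String> env = new HashMap<>();
        env.put("AWS_ACCESS_KEY_ID", "access_key");
        env.put("AWS_SECRET_ACCESS_KEY", "secret_key");
        env.put("AWS_REGION", "us-east-1");

        S3Settings settings = fromEnv(env);
        System.out.println("region: " + settings.region);
    }
}
```

In the configuration class, S3Settings.fromEnv(System.getenv()) would supply the values passed to BasicAWSCredentials and Regions.fromName in place of the literals.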

Available regions (constants of the SDK's Regions enum):

GovCloud("us-gov-west-1", "AWS GovCloud (US)"),
US_GOV_EAST_1("us-gov-east-1", "AWS GovCloud (US-East)"),
US_EAST_1("us-east-1", "US East (N. Virginia)"),
US_EAST_2("us-east-2", "US East (Ohio)"),
US_WEST_1("us-west-1", "US West (N. California)"),
US_WEST_2("us-west-2", "US West (Oregon)"),
EU_WEST_1("eu-west-1", "EU (Ireland)"),
EU_WEST_2("eu-west-2", "EU (London)"),
EU_WEST_3("eu-west-3", "EU (Paris)"),
EU_CENTRAL_1("eu-central-1", "EU (Frankfurt)"),
EU_NORTH_1("eu-north-1", "EU (Stockholm)"),
EU_SOUTH_1("eu-south-1", "EU (Milan)"),
AP_EAST_1("ap-east-1", "Asia Pacific (Hong Kong)"),
AP_SOUTH_1("ap-south-1", "Asia Pacific (Mumbai)"),
AP_SOUTHEAST_1("ap-southeast-1", "Asia Pacific (Singapore)"),
AP_SOUTHEAST_2("ap-southeast-2", "Asia Pacific (Sydney)"),
AP_NORTHEAST_1("ap-northeast-1", "Asia Pacific (Tokyo)"),
AP_NORTHEAST_2("ap-northeast-2", "Asia Pacific (Seoul)"),
SA_EAST_1("sa-east-1", "South America (Sao Paulo)"),
CN_NORTH_1("cn-north-1", "China (Beijing)"),
CN_NORTHWEST_1("cn-northwest-1", "China (Ningxia)"),
CA_CENTRAL_1("ca-central-1", "Canada (Central)"),
ME_SOUTH_1("me-south-1", "Middle East (Bahrain)"),
AF_SOUTH_1("af-south-1", "Africa (Cape Town)");

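3. Utility methods

In the snippets below, amazonS3client is the AmazonS3 bean defined above, injected into the class that holds these methods.

Bucket Name: student-reports
String cloudPath = "/prod/student";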

1. Upload

/**
 * cloudPath is the full object key (directory prefix plus file name) under which the file is stored.
 */
public void uploadToS3(String bucketName, String cloudPath, File file) {
    PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, cloudPath, file);
    amazonS3client.putObject(putObjectRequest);
}

uploadToS3("student-reports", "/prod/student/report_10.csv", new File("/usr/Download/report_10.csv"));
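
Since the key passed to uploadToS3 is the directory prefix plus the file name, it can help to build it in one place instead of concatenating strings by hand. The following is a small sketch; S3Keys and buildKey are hypothetical names introduced here for illustration.

```java
// Hypothetical helper for building object keys from a prefix and a file name.
class S3Keys {

    // Joins prefix and file name with exactly one "/" between them.
    // An empty (or all-slash) prefix yields the bare file name.
    static String buildKey(String prefix, String fileName) {
        String cleanPrefix = prefix.replaceAll("/+$", "");   // drop trailing slashes
        String cleanName = fileName.replaceAll("^/+", "");   // drop leading slashes
        if (cleanName.isEmpty()) {
            throw new IllegalArgumentException("fileName must not be empty");
        }
        return cleanPrefix.isEmpty() ? cleanName : cleanPrefix + "/" + cleanName;
    }

    public static void main(String[] args) {
        String cloudPath = "/prod/student";
        System.out.println(buildKey(cloudPath, "report_10.csv"));
    }
}
```

With the cloudPath defined earlier, the main method prints /prod/student/report_10.csv, the same key used in the upload call above.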

2. Contains Check

public boolean doesObjectExistInS3(String bucketName, String key) {
    return amazonS3client.doesObjectExist(bucketName, key);
}

doesObjectExistInS3("student-reports", "/prod/student/report_10.csv");

3. Delete the Object

amazonS3client.deleteObject("student-reports", "/prod/student/report_10.csv");

4. Deleting Multiple Objects

// KeyVersion is the nested class com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion.
public void deleteFilesFromS3(String bucketName, List<KeyVersion> keys) {
    try {
        // Build a single request that deletes all the given keys.
        DeleteObjectsRequest multiObjectDeleteRequest = new DeleteObjectsRequest(bucketName)
                .withKeys(keys)
                .withQuiet(false); // non-quiet mode: the response lists each deleted object

        DeleteObjectsResult delObjRes = amazonS3client.deleteObjects(multiObjectDeleteRequest);

        // Check how many objects were deleted.
        int successfulDeletes = delObjRes.getDeletedObjects().size();
        System.out.println(successfulDeletes + " objects successfully deleted.");

    } catch (AmazonServiceException e) {
        // The call was transmitted successfully, but Amazon S3 couldn't process
        // it, so it returned an error response.
        e.printStackTrace();
    } catch (SdkClientException e) {
        // Amazon S3 couldn't be contacted for a response, or the client
        // couldn't parse the response from Amazon S3.
        e.printStackTrace();
    }
}


deleteFilesFromS3("student-reports", List.of(new KeyVersion("/prod/student/report_10.csv")));

Or

String[] objKeyArr = {
    "/prod/student/report_10.csv",
    "/prod/student/report_11.csv",
    "/prod/student/report_1.csv"
};

DeleteObjectsRequest multiObjectDeleteRequest = new DeleteObjectsRequest("student-reports")
        .withKeys(objKeyArr);

amazonS3client.deleteObjects(multiObjectDeleteRequest);
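
One thing to keep in mind with multi-object delete is that S3 accepts at most 1,000 keys per delete request, so longer key lists have to be split into batches. The sketch below shows only the batching part; KeyBatches and partition are hypothetical names, and each resulting batch would be passed to a separate DeleteObjectsRequest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits a long key list into request-sized batches.
class KeyBatches {

    // S3 allows up to 1,000 keys in one multi-object delete request.
    static final int MAX_KEYS_PER_REQUEST = 1000;

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            keys.add("/prod/student/report_" + i + ".csv");
        }
        for (List<String> batch : partition(keys, MAX_KEYS_PER_REQUEST)) {
            // Each batch would go into its own DeleteObjectsRequest.
            System.out.println("batch of " + batch.size() + " keys");
        }
    }
}
```

For 2,500 keys this produces three batches of 1,000, 1,000, and 500 keys.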

5. Rename the object

S3 has no rename operation, so an object cannot be renamed directly. Instead:

Step 1: Copy the object under the new name.

Step 2: Delete the old object.

E.g., renaming the object “/prod/student/report_10.csv” to “/prod/student/report_40.csv”:

CopyObjectRequest copyObjRequest = new CopyObjectRequest(sourceBucketName,
        keyName, destinationBucketName, destinationKeyName);

amazonS3client.copyObject(copyObjRequest);

amazonS3client.deleteObject(new DeleteObjectRequest(sourceBucketName, keyName));

With the values from the example:

CopyObjectRequest copyObjRequest = new CopyObjectRequest("student-reports",
        "/prod/student/report_10.csv", "student-reports", "/prod/student/report_40.csv");

amazonS3client.copyObject(copyObjRequest);

amazonS3client.deleteObject(new DeleteObjectRequest("student-reports", "/prod/student/report_10.csv"));
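
The order of the two steps matters, and so do a couple of edge cases. The sketch below models a bucket as a plain map to show the logic only; RenameSketch and rename are hypothetical names, and the map stands in for the real copyObject and deleteObject calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "rename = copy, then delete" using a map as the bucket.
class RenameSketch {

    // Returns true if the object was renamed, false if nothing was done.
    static boolean rename(Map<String, byte[]> bucket, String oldKey, String newKey) {
        if (oldKey.equals(newKey)) {
            return false; // same key: deleting the "old" object would destroy the only copy
        }
        byte[] data = bucket.get(oldKey);
        if (data == null) {
            return false; // source object does not exist
        }
        bucket.put(newKey, data.clone()); // step 1: copy under the new name
        bucket.remove(oldKey);            // step 2: delete the old object only after the copy
        return true;
    }

    public static void main(String[] args) {
        Map<String, byte[]> bucket = new HashMap<>();
        bucket.put("/prod/student/report_10.csv", new byte[] {1, 2, 3});

        boolean renamed = rename(bucket,
                "/prod/student/report_10.csv", "/prod/student/report_40.csv");
        System.out.println("renamed: " + renamed + ", keys: " + bucket.keySet());
    }
}
```

Against real S3 the same idea applies: delete the source only once the copy has succeeded, and skip the operation when the source and destination keys are identical.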

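6. Copy or move the object

Copying uses the same CopyObjectRequest. To move an object, copy it and then delete the source, as in the rename above.

CopyObjectRequest copyObjRequest = new CopyObjectRequest(bucketName,
        keyName, bucketName, destinationKeyName);

amazonS3client.copyObject(copyObjRequest);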

7. Download the object

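S3Object s3object = amazonS3client.getObject("student-reports", "/prod/student/report_10.csv");
// The content stream comes from the S3Object, not from the client.
S3ObjectInputStream objectInputStream = s3object.getObjectContent();
// FileUtils comes from Apache Commons IO (org.apache.commons.io.FileUtils).
FileUtils.copyInputStreamToFile(objectInputStream, new File("/usr/Download/report_10.csv"));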
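
If you would rather not depend on Commons IO for this step, the JDK's Files.copy can write the stream to disk as well. Here is a minimal sketch; StreamToFile and save are hypothetical names, and a ByteArrayInputStream stands in for the S3 object's content stream so the example runs without AWS access.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper that saves any InputStream to a file using only the JDK.
class StreamToFile {

    // Copies the stream to target, creating parent directories and closing the stream.
    // Returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the content stream of a downloaded S3 object.
        InputStream content = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        Path target = Files.createTempDirectory("s3-download").resolve("report_10.csv");
        long written = save(content, target);
        System.out.println("wrote " + written + " bytes to " + target);
    }
}
```

Closing the stream matters with S3 as well, because an open content stream keeps the underlying HTTP connection in use.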

Hope you like it !!!
Thanks for reading !!!
