In the cloud computing universe, few technologies have had as significant an impact as the storage service S3 (Simple Storage Service) by Amazon Web Services (AWS). Introduced in 2006, S3 revolutionized the way businesses store and access data, offering a scalable, secure, and highly available solution. But how did S3 become a de facto standard in the world of cloud storage? In this article, we will explore the birth and evolution of S3, its distinctive features compared to other storage solutions, the advantages it offers, and how it has also been adopted by other providers, becoming a true industry standard.
The S3 Protocol and the Initial History
S3, an acronym for Simple Storage Service, is one of the pioneering and most reliable cloud storage services offered by Amazon Web Services (AWS), launched in 2006. Designed to give developers an efficient, scalable way to store and retrieve any amount of data from anywhere on the web, S3 introduced a paradigm shift in the world of data storage.
One of S3's key innovations is its object-oriented storage model, which stands out from the traditional organization of data in file systems based on a hierarchical structure of files and folders. Instead of adhering to this conventional pattern, S3 takes a “bucket” and “object” approach.
What Is A Bucket in S3?
A bucket in S3 can be compared to a high-level container within which users can store and organize a variety of data in the form of “objects”. Each bucket is identified by a name that is globally unique across all of S3, not just within a single AWS account. This means that two different buckets, even if created by different AWS accounts, cannot have the same name. The bucket concept is critical to ensuring the efficient organization and management of data within S3.
Technical Characteristics of the Buckets
Unique Name
The requirement that each bucket have a globally unique name is critical to ensuring the uniqueness and accessibility of data on Amazon S3. Once you choose a name for a bucket, it is reserved globally across the entire S3 platform, avoiding conflicts and confusion. The name of the bucket becomes part of the URL through which the contained data is accessed, following the format: https://bucket-name.s3.amazonaws.com/object-name.
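As a small illustration of this naming scheme, the snippet below builds a virtual-hosted-style S3 URL from a bucket name and object key using only the standard library. The regional endpoint format and the default region are assumptions for the example; older path-style URLs (https://s3.amazonaws.com/bucket/key) also exist.

```python
from urllib.parse import quote

def object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build a virtual-hosted-style S3 URL for a bucket/key pair."""
    # Characters like spaces in the key must be percent-encoded;
    # '/' is kept so "folder-like" keys stay readable in the path.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"

print(object_url("my-reports", "2024/q1 summary.pdf"))
# https://my-reports.s3.us-east-1.amazonaws.com/2024/q1%20summary.pdf
```

Because the bucket name is part of the hostname, it must also follow DNS naming rules (lowercase letters, digits, hyphens), one practical consequence of global uniqueness.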
Access control
Amazon S3 provides sophisticated, granular access control mechanisms for buckets and the objects they contain. Bucket owners can use Identity and Access Management (IAM) policies to define who can access data and how. Additionally, S3 supports Access Control Lists (ACLs) to manage object-level permissions. This level of control allows you to manage complex scenarios, such as securely sharing data with external users or creating multi-user environments.
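To make the policy mechanism concrete, here is a minimal sketch of a bucket policy document granting read-only access to an external account. The bucket name and account ID are illustrative placeholders; the resulting JSON string is what you would hand to the S3 API (for example via boto3's put_bucket_policy or the AWS CLI).

```python
import json

# Hypothetical policy: allow a single external account to read
# (but not write or delete) every object in example-bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowExternalRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Note the wildcard in the Resource ARN: a policy on the bucket can scope permissions down to individual objects or prefixes, which is what enables the multi-user scenarios described above.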
Lifecycle Rules
Lifecycle rules allow bucket owners to automate object lifecycle management, reducing costs and simplifying storage administration. For example, you can configure rules to automatically move objects to cheaper storage classes after a certain period of inactivity or to automatically delete them after they reach the end of their useful life.
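The rule just described can be sketched as a lifecycle configuration in the shape the S3 API accepts (for example via boto3's put_bucket_lifecycle_configuration). The prefix and day counts here are illustrative assumptions, not recommendations.

```python
# Hypothetical rule: objects under "logs/" move to the cheaper
# Glacier class after 30 days and are deleted after a year.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}

for rule in lifecycle["Rules"]:
    print(rule["ID"], rule["Status"])
```

Once applied, S3 evaluates these rules automatically; no client-side cron job or manual sweep is needed.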
Logging and Monitoring
S3 provides advanced logging and monitoring capabilities that let you record and analyze operations performed on buckets and objects. Request logging provides details about who has access to data and how it is used, facilitating compliance and security. Monitoring, integrated with Amazon CloudWatch, allows you to receive real-time alerts on specific events, such as unexpected increases in access requests or storage costs.
Objects Inside a Bucket
Unique Identification
Each object in S3 is identified by a unique key, which determines its path within the bucket. This key, combined with the bucket's unique name, provides a global identifier for the object. S3's flat structure allows you to simulate a directory structure using object keys, but it's important to remember that S3 doesn't use a true hierarchical structure.
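Since S3's keyspace is flat, "folders" only appear when you list keys with a prefix and a delimiter. The sketch below reproduces that listing behavior in plain Python over sample keys, mimicking what an S3 ListObjectsV2 call returns; the key names are invented for the example.

```python
keys = [
    "photos/2023/beach.jpg",
    "photos/2023/city.jpg",
    "photos/2024/mountain.jpg",
    "readme.txt",
]

def list_keys(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes), roughly like S3 ListObjectsV2."""
    objects, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter groups into a
            # "common prefix", i.e. a pseudo-folder.
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(prefixes)

print(list_keys(keys, prefix="photos/"))
# ([], ['photos/2023/', 'photos/2024/'])
```

The "directories" photos/2023/ and photos/2024/ exist only as a by-product of the listing; there is no folder object to create or delete.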
Metadata
Objects can include metadata, which are key-value pairs that describe or control the object's behavior. Standard metadata includes information such as the MIME type, content encoding, and last modification date. Users can also add custom metadata to meet specific needs.
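In practice, metadata travels as HTTP headers when an object is uploaded: standard fields use well-known headers such as Content-Type, while custom metadata keys are prefixed with x-amz-meta-. A minimal sketch of building such headers, with invented metadata values:

```python
def metadata_headers(content_type: str, custom: dict) -> dict:
    """Map object metadata to the HTTP headers sent on upload."""
    headers = {"Content-Type": content_type}
    for key, value in custom.items():
        # S3 stores custom metadata keys in lowercase.
        headers[f"x-amz-meta-{key.lower()}"] = value
    return headers

headers = metadata_headers("image/png", {"Author": "alice", "project": "demo"})
print(headers)
```

Because custom metadata is set at upload time, changing it later requires copying the object over itself with new headers; it is not independently mutable.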
Security and Cryptography
S3 offers robust options for data security and encryption. Objects can be encrypted both server-side (SSE-S3, SSE-KMS, SSE-C) and client-side, protecting data at rest, while HTTPS protects it in transit. Server-side encryption is handled automatically by S3, while client-side encryption requires data to be encrypted before uploading.
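Each server-side mode is selected per request through headers. The sketch below shows the header shapes for the three modes; the KMS key alias is hypothetical and the SSE-C key is a dummy value (a real deployment would use securely generated key material).

```python
import base64
import hashlib

# SSE-S3: S3 manages the keys; one header opts in.
sse_s3 = {"x-amz-server-side-encryption": "AES256"}

# SSE-KMS: keys live in AWS KMS; the key ID here is illustrative.
sse_kms = {
    "x-amz-server-side-encryption": "aws:kms",
    "x-amz-server-side-encryption-aws-kms-key-id": "alias/my-app-key",
}

# SSE-C: the client supplies its own 256-bit key on every request,
# base64-encoded, together with an MD5 of the raw key bytes.
raw_key = b"0" * 32  # dummy key for illustration only
sse_c = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    "x-amz-server-side-encryption-customer-key": base64.b64encode(raw_key).decode(),
    "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
        hashlib.md5(raw_key).digest()
    ).decode(),
}

print(sse_s3["x-amz-server-side-encryption"], sse_kms["x-amz-server-side-encryption"])
```

With SSE-C in particular, S3 never stores the key, so losing it means losing the data; that trade-off is what makes it attractive for strict compliance regimes.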
How It Differs From Other Types of Storage Space
The primary differences between S3 and traditional file systems lie in scalability, durability, and availability. While a conventional filesystem is limited by the capacity of the physical disk on which it resides, S3 offers virtually unlimited scalability, allowing users to scale up and down storage space as needed.
Furthermore, S3 guarantees data durability of 99.999999999% (11 9's) and availability of 99.99%, figures virtually unmatched by traditional storage systems. This is made possible through automatic data replication across multiple data centers.
Advantages and Main Features
Scalability
S3 offers unprecedented scalability, allowing companies to store and manage amounts of data ranging from a few bytes to several petabytes without worrying about physical storage management.
Durability and Availability
With 99.999999999% durability and 99.99% availability, S3 ensures your data is always accessible and protected from loss.
Security
S3 offers robust security features, including access control and encryption of data in transit and at rest.
Flexibility
Users can choose between different storage classes (e.g., Standard, Infrequent Access, Glacier) to optimize costs based on data access needs.
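The storage class is chosen per object at upload time via a request header. The class names below are the ones S3 documents; the access-frequency thresholds in this toy selection function are illustrative assumptions, not AWS guidance.

```python
def pick_storage_class(accesses_per_month: float) -> str:
    """Toy heuristic mapping access frequency to an S3 storage class."""
    if accesses_per_month >= 1:
        return "STANDARD"
    if accesses_per_month >= 1 / 12:  # roughly once a year or more
        return "STANDARD_IA"          # Infrequent Access
    return "GLACIER"                  # long-term archive

# The chosen class is sent as a header on upload.
headers = {"x-amz-storage-class": pick_storage_class(0.5)}
print(headers)
```

Cheaper classes trade lower storage cost for higher retrieval cost and latency, so the right choice depends on how often the data will actually be read.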
Cons and disadvantages of S3 compared to traditional filesystems
Although S3 is a leading distributed object storage system, we must weigh not only its undeniable advantages but also its inevitable drawbacks, for example when we are tempted to use S3 as the backing store for a MySQL or PostgreSQL database, a surprisingly popular and equally ill-advised choice, for the following reasons.
Latency
Latency measures the time it takes for a data packet to travel across the network. In a comparison between traditional file-based storage and object-based storage, file-based storage comes out on top here. As long as the system knows the path to where the data resides, retrieving it is fast and simple, especially with today's flash storage solutions. Object-based storage, by contrast, was designed with cost efficiency and scalability in mind, and these benefits have typically come at the expense of speed and performance.
Performance
Throughput, the amount of data sent or received in a given amount of time, is the measure of a system's performance. Here, too, traditional file-based storage comes out ahead. File-based storage lets you locate a given piece of data quickly through its hierarchical layout, but throughput degrades as more directories, folders, and files must be traversed. Imagine a directory with millions of subdirectories, each containing millions of folders, which in turn hold millions of files. Object-based storage is better suited to large volumes of data: while individual accesses may take a little longer, you never have to search for the data manually; the system does it for you.
Cost
Cost is the selling point of object-based storage. It was originally developed as a system for storing large amounts of data that should not be accessed too frequently, such as archives, raw video footage or secondary datasets. Legacy object-based storage was sometimes called “cheap and deep” storage by those in the industry because its pay-as-you-go model was cost-efficient. While file-based storage is not considered extremely expensive, it can result in higher costs as capacity is added. File-based storage cannot scale up—it must expand by adding more file-based storage systems (such as a network-attached server, or NAS). And adding entire new systems can increase costs.
Access Protocol
The ways in which file- and object-based storage systems access data are very different. Traditional file-based storage typically uses the network file system (NFS) protocol or other common network protocols optimized for low latency. Object-based storage uses HTTP to access data, which makes it easy to retrieve data from many different applications and even web browsers. However, because HTTP is a text-based protocol with per-request overhead, it is processed more slowly than file-oriented storage protocols, again underscoring that object-based storage offers simple, universal access but cannot guarantee the same raw performance.
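Because S3 speaks plain HTTP, every request is authenticated with a signature rather than through a mounted session. The sketch below shows the signing-key derivation step of AWS Signature Version 4 (the chained HMACs the spec defines) using only the standard library; the secret and date are dummy values, and a full request signature would additionally hash and sign the canonical request.

```python
import hashlib
import hmac

def derive_signing_key(secret: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key: HMAC-SHA256 chained over
    date -> region -> service -> the literal 'aws4_request'."""
    def sign(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()

    k_date = sign(("AWS4" + secret).encode(), date)
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")

key = derive_signing_key("dummy-secret", "20240101", "us-east-1")
print(len(key))  # 32, the length of a SHA-256 HMAC digest
```

This per-request signing is part of the overhead object storage pays for its universal HTTP access: every call carries its own authentication, rather than relying on an authenticated mount.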
S3 As De Facto Standard
The adoption of Amazon Web Services' S3 API has brought the cloud storage industry to a point of convergence, where numerous cloud service providers have adopted or made their storage offerings compatible with S3, effectively making it a standard for object storage in the cloud. This compatibility has greatly facilitated migration, integration and interoperability between different cloud ecosystems, allowing developers and enterprises to take advantage of the flexibility and efficiency of object-based storage. Below, we explore some of the major services that have embraced this compatibility, expanding the list with notable additions like Wasabi, Scaleway, BackBlaze, and CloudFlare R2.
Google Cloud Storage
It offers an S3-compatible interface, designed to facilitate data migration and interoperability. This allows users to take advantage of Google Cloud's powerful features, such as data analytics and artificial intelligence, while maintaining familiar data management.
Microsoft Azure Blob Storage
While Azure Blob Storage uses its own native API, S3-compatible gateways and translation layers allow developers to use S3 APIs to interact with data stored in Azure. This facilitates the integration of Azure into existing S3-based architectures and leverages Azure's advanced security and analytics capabilities.
IBM Cloud Object Storage
It offers an object storage solution with API compatibility with S3, optimized for large-scale data storage and management. IBM Cloud Object Storage is particularly suitable for companies that need high durability and scalability for their data.
Alibaba Cloud OSS
It offers object storage services with S3 API compatibility, enabling easy integration and efficient data management on a global scale, benefiting from Alibaba's vast data center network.
Wasabi Hot Cloud Storage
Wasabi presents itself as a highly competitive solution in the cloud storage landscape, offering extremely competitive prices and high performance. Its full compatibility with the S3 API allows for simple migration for S3 users, with the added benefit of no egress or API request fees, making it a cost-effective choice for a wide variety of use cases, from backup and disaster recovery to long-term archiving.
Scaleway Object Storage
Scaleway offers an object storage service that combines ease of use and transparent pricing with S3 API compatibility. This makes it an attractive solution for startups and businesses looking for a reliable, GDPR-compliant European cloud platform.
BackBlaze B2 Cloud Storage
BackBlaze B2 provides high-performance object storage at a significantly lower cost than other vendors. Compatibility with the S3 API makes it an attractive option for businesses looking to reduce storage costs without compromising speed or reliability.
CloudFlare R2 Storage
CloudFlare R2 stands out for its native integration with CloudFlare's content delivery network (CDN), offering object storage with no egress fees, making it especially beneficial for globally distributed content. Compatibility with the S3 API allows developers to easily leverage this integration, improving performance and reducing content delivery costs.
S3 compatible solutions for Self Hosted projects with MinIO
MinIO is a self-hosted object storage solution that stands out for its high performance and its full compatibility with Amazon S3 APIs. This platform is designed to provide developers and businesses with a scalable, secure and easily manageable storage system, leveraging existing infrastructure both on-premise and in private clouds. Below, we delve into the licensing model, cost, compatible features and type of use of MinIO.
License Model
MinIO adopts an open source licensing model under the GNU Affero General Public License v3.0 (AGPLv3), which allows the software to be used, modified and distributed freely, provided that any modifications or derivative versions are also made available under the same license. For companies that require a commercial license, which excludes the requirement to release changes under AGPL, MinIO offers enterprise subscription options. This model allows organizations to benefit from dedicated technical support, advanced security and management features, and SLA (Service Level Agreement) guarantees.
Cost
MinIO is free to use in its open source version, making it an attractive choice for startups and projects with limited budgets. For organizations looking for additional features and professional support, MinIO offers several enterprise subscription options based on infrastructure size and specific needs. The cost of the enterprise subscription is customized to customer requirements and may vary depending on the number of nodes, the required storage capacity, and the level of support needed.
Compatible Features
MinIO supports a wide range of S3 API compatible features, including:
- Bucket and object management: Creating, listing and deleting buckets; uploading, downloading and managing objects.
- Multitenancy: Support for multi-user environments with data isolation.
- Data encryption: Support for encryption of data at rest and in transit, using Server-Side Encryption (SSE) and TLS.
- Fine access control: Implementation of access policies and authentication tokens for secure management of data access.
- Data replication: Configuring data replication between MinIO clusters for redundancy and disaster recovery.
- Data lifecycle management: Automation of object retention and deletion policies to optimize storage costs and management.
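Because MinIO speaks the S3 API, an existing S3 client only needs to be pointed at the MinIO endpoint. This sketch collects the parameters one would pass to, for example, boto3.client("s3", **params); the endpoint URL and credentials are the placeholder defaults of a local MinIO install, not values to use in production.

```python
# Connection parameters for a hypothetical local MinIO deployment.
params = {
    "endpoint_url": "http://localhost:9000",   # MinIO's default API port
    "aws_access_key_id": "minioadmin",         # default demo credentials
    "aws_secret_access_key": "minioadmin",     # change these in any real setup
    "region_name": "us-east-1",                # MinIO accepts a nominal region
}

# With boto3 installed, the same S3 code used against AWS would work:
#   s3 = boto3.client("s3", **params)
#   s3.create_bucket(Bucket="backups")
#   s3.upload_file("dump.sql.gz", "backups", "db/dump.sql.gz")
print(params["endpoint_url"])
```

The key point is that only the endpoint and credentials change; bucket, object, lifecycle, and encryption calls stay the same, which is exactly what makes S3 compatibility valuable for self-hosted setups.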
Typical Use Cases
MinIO is particularly suited to usage scenarios that require high performance, scalability and complete data control, including:
- Large-scale data storage: Ideal for storing large datasets, such as telemetry data, system logs, backups, and archives.
- Cloud-native applications: Support for applications designed to run in cloud environments, leveraging containers and orchestration for easy scalability and management.
- Big Data and analytics: Provides a reliable platform for storing analytics data, compatible with computing tools such as Hadoop, Spark and Presto.
- Machine Learning and AI: Storage of large volumes of data used for training and inference of machine learning models.
In summary, MinIO offers businesses a versatile, high-performance object storage solution, with the freedom and flexibility of an open source model, but with the ability to opt for advanced support and functionality through its enterprise options.
Conclusion
Since it launched in 2006, Amazon Web Services' S3 has redefined the expectations and possibilities of cloud data storage. Its reliability, scalability and security have made it a de facto standard in the industry, a position further strengthened by its widespread adoption and compatibility with other cloud providers. With the advent of solutions like MinIO, even organizations that prefer to manage their own storage can benefit from the flexibility and efficiency of S3. In an increasingly cloud-focused world, S3 continues to be a cornerstone of the data management strategies of businesses of all sizes.