You are designing a file-sharing service. This service will have millions of files on it. Revenue for the service will come from fees based on how much storage the user is using. You also want to store metadata on each file, such as title, description and whether the object is public or private. How do you achieve all of these goals in a way that is economical and can scale to millions of users? (Choose 1)
a. Store all files in Amazon Simple Storage Service (S3). Create a bucket for each user. Store metadata in the filename of each object, and access it with LIST commands against the S3 API.
b. Store all files in Amazon S3. Create Amazon DynamoDB tables for the corresponding key-value pairs on the associated metadata when objects are uploaded.
c. Create a striped set of 4000 IOPs Elastic Block Store Volumes to store the data. Use a database running in Amazon Relational Database Service (RDS) to store the metadata.
d. Create a striped set of 4000 IOPs Elastic Block Store Volumes to store the data. Create Amazon DynamoDB tables for the corresponding key-value pairs on the associated metadata, when objects are uploaded.
This question is testing your understanding of S3 and EBS volumes and the differing use cases for each. Before going any further, I'd recommend reading and understanding the Service Limits:
Similarly, I'd recommend reading the Frequently Asked Questions:
and finally watching the following re:Invent video from last year:
Watching the video will help you understand the answer to this question. In the usual fashion, let's rule out the obvious answers.
“Answer A” is incorrect. Whilst this option suggests using S3, which can scale to both millions of files and millions of users as the requirements demand, it recommends creating a bucket per user. That doesn't scale to millions of users: as per the Amazon S3 Service Limits, there is a default limit of 100 buckets per account.
“Answer C” is incorrect. The solution recommends striped Elastic Block Store (EBS) volumes of 4000 IOPS each. Assuming gp2 SSD volumes, that equates to approximately 1.3 TB per disk (as gp2 provides 3 IOPS per GB). The instance volume limits state the following:
Attaching more than 40 volumes can cause boot failures. Note that this number includes the root volume, plus any attached instance store volumes and EBS volumes. If you experience boot problems on an instance with a large number of volumes, stop the instance, detach any volumes that are not essential to the boot process, and then reattach the volumes after the instance is running.
Attaching more than 40 volumes to a Linux instance is supported on a best effort basis only and is not guaranteed.
So even if we assume an instance with 40 disks configured in RAID 5, that equates to approximately 50 TB of usable storage, which isn't going to scale to millions of users, and probably not even to millions of files.
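The capacity estimate above can be checked with some quick back-of-the-envelope arithmetic:

```python
# gp2 volumes provide 3 IOPS per GB, so a 4000 IOPS volume must be:
volume_gb = 4000 / 3          # ~1333 GB, roughly 1.3 TB

# With the 40-volume attachment limit and RAID 5 (one volume's worth
# of capacity lost to parity), the usable storage works out to:
usable_tb = (40 - 1) * volume_gb / 1000

print(f"Per-volume size: {volume_gb:.0f} GB")        # ~1333 GB
print(f"Usable RAID 5 capacity: {usable_tb:.0f} TB")  # ~52 TB
```

Roughly 50 TB in total, which is nowhere near enough headroom for a service with millions of users.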
“Answer D” is incorrect for the same reason as “Answer C”: striped EBS volumes won't scale to millions of users. That said, the answer is partially correct in using DynamoDB to store key-value pairs of the associated object metadata when files are uploaded.
Having ruled out all the other options, “Answer B” is correct. Amazon S3 is the most viable storage platform for the file-sharing service, as it is designed to scale to these sorts of requirements, and combining it with DynamoDB for the metadata gives a solution that meets all the requirements.
That’s all for now.