In this post I will leave some notes I took while studying for the AWS SAA exam. I use Evernote to keep notes, but over time I have decided to return to the blog, since it is a better way to keep my notes up to date. I will update this post little by little. The notes are in English because that is the language the course was in.
The definitions of the different services are taken either from the AWS documentation or from the comments of the instructor of the course I took. Topics covered:
AWS S3
- A default limit of 100 buckets per account (the limit can be raised)
- Files from 0 bytes to 5 TB
- Unlimited storage
- Files are stored in buckets (a bucket is similar to a “folder”)
- Bucket names are globally unique
- When you upload a file to S3 you will receive an HTTP 200 code
- Supports Versioning
- Supports Encryption
- Lifecycle Management
- Secure your data – ACL
AWS S3 DATA CONSISTENCY
- Read-after-write consistency for PUTs of new objects: you can read a new file from S3 immediately after the upload
- Eventual consistency for overwrite PUTs and DELETEs: changes can take some time to propagate
AWS Naming
Amazon S3 virtual hosted style URLs follow the format shown below.
- https://bucket-name.s3.region.amazonaws.com
If my bucket name is “ruben” and the region is Ireland, the URL will be
- https://ruben.s3.eu-west-1.amazonaws.com
The path style, an old format which is DEPRECATED, looks like this:
https://s3.eu-west-1.amazonaws.com/ruben/key
S3 buckets are globally named, but still stored within a region of your choosing, which can affect things such as latency for accessing the bucket. S3 buckets can still be accessed from different regions in some cases. The main thing to note is that there are a few different ways to access an S3 bucket, some of which specify the region and some of which do not.
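The two URL formats above can be sketched as simple string templates (illustrative helpers, not AWS APIs):

```python
# Sketch of the two S3 addressing styles. "ruben" / "eu-west-1" are the
# example bucket and region from these notes.

def virtual_hosted_url(bucket: str, region: str) -> str:
    """Virtual-hosted-style URL: the bucket name is part of the hostname."""
    return f"https://{bucket}.s3.{region}.amazonaws.com"

def path_style_url(bucket: str, region: str, key: str) -> str:
    """Deprecated path-style URL: the bucket name is part of the path."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

print(virtual_hosted_url("ruben", "eu-west-1"))
```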
S3 is object-based. Objects consist of the following:
- Key, name of the object
- Value, the data
- Version ID, important for the versioning of the object
- Metadata
- Subresources
- bucket policies
- access control list (ACL)
- Cross-Origin Resource Sharing (CORS)
AWS S3 Storage class tier and availability
S3 Standard
- First 50 TB: $0.023 per GB/month
- AWS S3 was built to deliver 99.99% availability
- Guarantees 99.9% availability (SLA)
- Guarantees 99.999999999% durability of S3 objects (11 x 9s)
- Stores data redundantly across multiple devices in multiple facilities
- Designed to sustain the loss of two facilities concurrently
- More cost-effective than using S3 RRS
S3 Intelligent-Tiering
- Designed to optimize costs through automatic migration of data to the most economical S3 tier without affecting performance.
- S3 stores the same object in different access tiers, one for frequent access and another for infrequent access.
- You pay a small monitoring and automation fee per object.
- S3 automatically moves objects that have not been accessed for 30 days to the infrequent access tier.
- It is the ideal storage class for long-term data with an unpredictable access pattern.
S3 – IA (Infrequent Access)
- First 50 TB: $0.0125 per GB/month
- Lower fee than S3 Standard, but you are charged a retrieval fee
- Same low latency and high throughput performance as Standard
- Designed for durability of 99.999999999% of objects (11 x 9s)
- Designed for 99.9% availability over a given year
S3 – IA (Infrequent Access) One Zone
- First 50 TB: $0.01 per GB/month
- Same low latency and high throughput performance as Standard
- Designed for durability of 99.999999999% of objects (11 x 9s)
- Designed for 99.5% availability over a given year
S3 – RRS (Reduced Redundancy Storage)
- Guarantees 99.99% durability
- Guarantees 99.99% availability
- Used for data that can be recreated if lost
- *** AWS recommends not using this class anymore ***
S3 – Archive Glacier
- Very cheap, use only for archival
- Takes 3-5 hours to restore from Glacier
- $0.01 per gigabyte
- Range Retrieval allows you to retrieve only specified byte ranges. You pay only for the actual data retrieved
- Retrieval data:
- Expedited:
- Expedited retrieval can be used for occasional requests; typically, data is retrieved in 1-5 minutes (for files < 250 MB).
- However, an expedited retrieval request is accepted by Glacier only if there is capacity available. If capacity is not available, Glacier will reject the request. To guarantee expedited retrieval availability, you can purchase provisioned capacity.
- Standard:
- Standard would take 3-5 hours
- Bulk:
- Bulk retrieval is the lowest cost option to retrieve data from Glacier and can be used to cost-effectively retrieve large amounts of data
- Would take 5-12 hours
- Data stored in Amazon Glacier is protected by default; only vault owners have access to the Amazon Glacier resources they create.
- Glacier automatically encrypts using AES 256. It handles the key management for you
S3 – Glacier Deep Archive
- The most economical class in all of S3.
- Ideal for data that is recovered once or twice per year, or to replace old magnetic tapes.
- Data is stored in at least 3 geographically separated Availability Zones.
- Offers 99.999999999% durability.
- You can retrieve data in standard mode (within 12 hours) or bulk (within 48 hours).
AWS S3 CHARGES
- storage per GB
- requests (GET, PUT, COPY, etc.)
- storage management pricing
- inventory, tags
- data management pricing
- data transferred out of S3 (data in is free)
- Transferring data from an EC2 instance to Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon SES, Amazon SQS, or Amazon SimpleDB in the same AWS Region has no cost at all.
- Transfer Acceleration
AWS S3 Multipart
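As a rough sketch of the arithmetic behind multipart uploads (the 5 MiB minimum part size and 10,000-part limit are documented S3 limits; `choose_part_size` and `part_ranges` are illustrative helpers, not AWS APIs — boto3's high-level `upload_file` does this splitting for you automatically):

```python
# S3 multipart limits: parts must be at least 5 MiB (except the last one),
# and an upload can have at most 10,000 parts.
MIN_PART = 5 * 1024 * 1024
MAX_PARTS = 10_000

def choose_part_size(object_size: int, part_size: int = MIN_PART) -> int:
    """Double the part size until the object fits in 10,000 parts."""
    while object_size > part_size * MAX_PARTS:
        part_size *= 2
    return part_size

def part_ranges(object_size: int, part_size: int):
    """Yield (first_byte, last_byte) ranges, one per part."""
    for start in range(0, object_size, part_size):
        yield start, min(start + part_size, object_size) - 1
```

For example, a 100 MiB object fits in twenty 5 MiB parts, while a 1 TiB object forces the part size up to 160 MiB to stay under the 10,000-part cap.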
AWS S3 encryption
Server-side encryption is about protecting data at rest.
SSE Types of Encryption
If you need server-side encryption for all of the objects that are stored in a bucket, use a bucket policy. For example, the following bucket policy denies permission to upload an object unless the request includes the x-amz-server-side-encryption header to request server-side encryption.
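A minimal sketch of such a deny policy, built as a Python dict (the bucket name `my-bucket` and the statement Sid are placeholders; the `Null` condition operator matches requests where the header is absent):

```python
import json

# Deny s3:PutObject whenever the request does not carry the
# x-amz-server-side-encryption header (i.e. the condition key is null).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        "Condition": {
            "Null": {"s3:x-amz-server-side-encryption": "true"}
        }
    }]
}
print(json.dumps(policy, indent=2))
```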
- Client-side encryption: I encrypt on my laptop and then upload.
- Server-side encryption:
- SSE-S3:
- AWS manages both the data key and the master key; cheaper than SSE-KMS.
- Every object is encrypted and there is an additional safeguard: Amazon encrypts the key itself with the master key and regularly rotates the master key.
- Amazon handles all the keys for you.
- If you want to enforce the use of encryption in your bucket, use an S3 bucket policy to deny PUT requests that don't include the x-amz-server-side-encryption request header.
- SSE-KMS: AWS manages the data key and you manage the master key; more expensive than SSE-S3.
- Additional audit trail: who used the key, when and where.
- Additional level of transparency: who is decrypting what and when.
- You can use the default key or generate a new one.
- SSE-C: you manage both the data key and the master key.
SSE-KMS
The first time you add an SSE-KMS–encrypted object to a bucket in a region, a default CMK is created for you automatically. This key is used for SSE-KMS encryption unless you select a CMK that you created separately using AWS Key Management Service. Creating your own CMK gives you more flexibility, including the ability to create, rotate, disable, and define access controls, and to audit the encryption keys used to protect your data.
Amazon S3 supports bucket policies that you can use if you require server-side encryption for all objects that are stored in your bucket. For example, you can set a bucket policy which denies permission to upload an object (s3:PutObject) to everyone if the request does not include the x-amz-server-side-encryption header requesting server-side encryption with SSE-KMS.
When you upload an object, you can specify the KMS key using the x-amz-server-side-encryption-aws-kms-key-id header, which you can use to require a specific KMS key for object encryption. If the header is not present in the request, Amazon S3 assumes the default KMS key. Regardless, the KMS key ID that Amazon S3 uses for object encryption must match the KMS key ID in the policy; otherwise Amazon S3 denies the request.
SSE-C
When using server-side encryption with customer-provided encryption keys (SSE-C), you must provide encryption key information using the following request headers:
x-amz-server-side-encryption-customer-algorithm
x-amz-server-side-encryption-customer-key
x-amz-server-side-encryption-customer-key-MD5
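A sketch of how those three header values can be derived for a 256-bit customer-provided key (a throwaway random key is used here; note that boto3 computes the key MD5 for you when you pass its `SSECustomerKey` parameter):

```python
import base64
import hashlib
import os

# SSE-C keys must be 256 bits; this one is a throwaway example, not a secret.
key = os.urandom(32)

headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    # The key travels base64-encoded (over HTTPS only).
    "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
    # MD5 of the raw key, base64-encoded, so S3 can detect transmission errors.
    "x-amz-server-side-encryption-customer-key-MD5":
        base64.b64encode(hashlib.md5(key).digest()).decode(),
}
```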
AWS S3 versioning
- S3 stores all versions of an object (all writes, even if you delete the object)
- Great backup tool
- Versioning cannot be disabled, only suspended!
- integrated with lifecycle rules
- Versioning’s MFA Delete capability provides an extra layer of security
!!! Only the owner of an Amazon S3 bucket can permanently delete a version !!!
Tips Versioning Cross Replication
You use the Amazon S3 console to add replication rules to the source bucket. Replication rules define which source bucket objects to replicate and the destination bucket where the replicated objects are stored. You can create rules to replicate all the objects in a bucket or a subset of objects with specific key name prefixes (that is, objects that have names that begin with a common string). A destination bucket can be in the same AWS account as the source bucket, or it can be in a different account. For cross-region replication, the destination bucket must be in a different Region than the source bucket.
- Versioning must be enabled on both the source and destination
- Regions must be unique
- Files in an existing bucket are not replicated automatically.
- You cannot replicate to multiple buckets
- Deleting individual versions or delete markers will not be replicated
Replication Updates
You can now opt in to Delete Marker Replication when you use S3’s Replication Time Control feature. These features were previously mutually exclusive, and you can now use them together to have confidence that deletions in a source bucket will be reflected in the target bucket, while also taking advantage of S3’s Replication SLA. To learn more, read Amazon S3 Replication Adds Support for Replicating Delete Markers.
LifeCycles
- Can be used in conjunction with versioning
- Can be applied to the current version and previous versions
- Transition to IA (Infrequent Access): objects must be at least 128 KB and at least 30 days past the creation date
- Move to Glacier 30 days after IA (minimum 60 days after creation)
- Permanently delete
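The rules above can be sketched as a lifecycle configuration in the shape boto3's `put_bucket_lifecycle_configuration` expects (bucket name, rule ID, and the 365-day expiration are placeholder choices):

```python
# Lifecycle configuration sketch: Standard -> IA at 30 days,
# IA -> Glacier at 60 days, permanently delete after a year.
lifecycle = {
    "Rules": [{
        "ID": "archive-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix = apply to every object
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 60, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }]
}

# Applied with boto3 roughly like this (requires credentials and a real bucket):
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```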
S3 Object Lock
You can use S3 Object Lock to store objects using a WORM (Write Once, Read Many) model. It can help you prevent objects from being deleted or modified for a fixed amount of time or indefinitely.
Governance Mode
Users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.
Compliance Mode
A protected object version can’t be overwritten or deleted by any user, not even the root user of the account. Its retention mode can’t be changed and its retention period can’t be shortened.
Retention period
- Protects an object version for a fixed period of time
- Amazon S3 stores a timestamp in the object version’s metadata to indicate when the retention period expires
- After the retention period expires, the object version can be overwritten or deleted
Legal Holds
- Enables you to place a legal hold on an object version.
- Like a retention period, it prevents an object from being overwritten or deleted.
- It does not have an associated retention period, so it remains in effect until removed.
- The s3:PutObjectLegalHold permission is required.
Glacier Vault Lock
- Allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy.
- Once locked, the policy can no longer be changed.
Webhosting
- An S3 bucket that is configured to host a static website. The bucket must have the same name as your domain or subdomain. For example, if you want to use the subdomain acme.example.com, the name of the bucket must be acme.example.com.
- A registered domain name. You can use Route 53 as your domain registrar, or you can use a different registrar.
- Route 53 as the DNS service for the domain. If you register your domain name by using Route 53, we automatically configure Route 53 as the DNS service for the domain.
- If you need to access assets that are in a different bucket, remember to use the S3 website URL rather than the regular S3 bucket URL; the website endpoint only supports HTTP, for example:
- http://mybucketname.s3-website-eu-west-1.amazonaws.com
Requester Pays
A bucket owner, however, can configure a bucket to be a Requester Pays bucket. With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket. The bucket owner always pays the cost of storing data
You must authenticate all requests involving Requester Pays buckets. The request authentication enables Amazon S3 to identify and charge the requester for their use of the Requester Pays bucket. After you configure a bucket to be a Requester Pays bucket, requesters must include x-amz-request-payer in their requests either in the header, for POST, GET and HEAD requests, or as a parameter in a REST request to show that they understand that they will be charged for the request and the data download.
Events
S3 can publish event notifications to the following destinations:
- Amazon Simple Notification Service (Amazon SNS) topic
- Amazon Simple Queue Service (Amazon SQS) queue
- AWS Lambda
If you consistently experience more than 100 PUT/DELETE/LIST requests or more than 300 GET requests per second to your bucket, you will probably have to take some action to improve performance, depending on your workload (GET-intensive / not GET-intensive / mixed).
- GET-intensive workloads: the best solution is to use CloudFront, of course
- Mix Workloads:
- the key names of your objects can have an impact on performance
- S3 uses the key name of the object to determine which partition to store the object in
- sequential key names (prefixed with a timestamp or alphabetically ordered) increase the probability of storing a bunch of objects in the same partition, causing I/O issues
- adding some randomness to the object key name avoids this problem, because S3 will store the objects in different partitions
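A minimal sketch of that randomization trick (the 4-character hash prefix length is an arbitrary choice; any scheme that spreads keys across prefixes works):

```python
import hashlib

def randomized_key(name: str) -> str:
    """Prefix a sequential object name with a short, deterministic hash
    so that related objects land in different S3 partitions."""
    prefix = hashlib.md5(name.encode()).hexdigest()[:4]
    return f"{prefix}-{name}"

print(randomized_key("2019-01-01-photo.jpg"))
```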
In 2018 AWS announced a massive improvement in S3 performance, so this guidance is, in practice, no longer needed:
- AWS S3 supports up to 3,500 PUT/COPY/POST/DELETE requests per second
- AWS S3 supports up to 5,500 GET/HEAD requests per second
- AWS S3 serves the first byte within 100-200 milliseconds
S3 Select & Glacier Select
Enables applications to retrieve only a subset of data from an object using simple SQL expressions. Like S3 Select, Glacier Select allows you to run SQL queries against Glacier directly
Server Access Logs
While there is no additional cost for S3 server access logging, you are billed for the cost of log storage and the S3 requests for delivering the logs to your logging bucket. To stop S3 server access logging, you can go to the Properties tab of any bucket that you enabled logging on, and click the Edit button on the Server access logging panel. In the edit window, select Disabled and then click Save changes. You can also delete the S3 server access logs from your log delivery bucket so that you do not incur any additional storage charges.
Updates
- 2018
- S3 path-style deprecation: there are two addressing models
- path style, bucket name in the path: s3.amazonaws.com/bucket-name
- hosted style: my-bucket.s3.amazonaws.com
- S3 will no longer support the path style from September 2020
- selective cross-region replication based on object type (granular level)
- another layer of protection on S3: by default, buckets and objects cannot be made public; you get an error message, and to allow it you have to edit the public access settings manually.
- 2019
- Same Region Replication (SRR): you can now set up asynchronous replication of newly uploaded objects to a destination bucket in the same region. You can replicate any storage class.
Links
- https://docs.aws.amazon.com/AmazonS3/latest/dev/security-best-practices.html
- https://digi.ninja/projects/bucket_finder.php
- https://read.acloud.guru/how-to-secure-an-s3-bucket-7e2dbd34e81b
- https://buckets.grayhatwarfare.com/
- https://stackoverflow.com/questions/53383674/how-do-i-determine-if-an-s3-bucket-has-public-access-using-aws-cli
- https://aws.amazon.com/es/premiumsupport/knowledge-center/s3-public-access-acl/
- https://stackoverflow.com/questions/52152035/list-s3-objects-with-public-read-permissions-in-private-bucket
- https://github.com/nagwww/s3-leaks
- https://medium.com/@grayhatwarfare/how-to-search-for-open-amazon-s3-buckets-and-their-contents-https-buckets-grayhatwarfare-com-577b7b437e01
- https://aws.amazon.com/es/blogs/security/iam-policies-and-bucket-policies-and-acls-oh-my-controlling-access-to-s3-resources/ -> best practices
- https://aws.amazon.com/es/blogs/security/writing-iam-policies-how-to-grant-access-to-an-amazon-s3-bucket/
- https://www.andreafortuna.org/2018/04/04/how-to-find-unsecured-s3-buckets-some-useful-tools/
IAM policy to list a bucket and read/write its objects
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": ["arn:aws:s3:::test"]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Resource": ["arn:aws:s3:::test/*"]
}
]
}