In this post I will leave some notes I took while studying for the AWS SAA exam. I use Evernote to keep notes, but over time I have decided to return to the blog, since it is a better way to keep my notes up to date. I will update this post little by little. The notes are in English because that is the language the course was in.
The definitions of the different services are taken either from the AWS documentation or from the comments of the instructor of the course I took. Topics covered:
AWS S3
- A default limit of 100 buckets per account (the limit can be raised)
- Files from 0 bytes to 5 TB
- Unlimited storage
- Files are stored in buckets (a bucket is similar to a “folder”)
- Bucket names are globally unique
- When you upload a file to S3 you will receive an HTTP 200 code
- Supports Versioning
- Supports Encryption
- Lifecycle Management
- Secure your data – ACL
AWS S3 DATA CONSISTENCY
- Read-after-write consistency for PUTs of new objects: you can read a new file from S3 immediately after the upload
- Eventual consistency for overwrite PUTs and DELETEs: changes can take some time to propagate
AWS Naming
Amazon S3 virtual hosted style URLs follow the format shown below.
- https://bucket-name.s3.region.amazonaws.com
If my bucket name is “ruben” and the region is Ireland, the URL will be
- https://ruben.s3.eu-west-1.amazonaws.com
The path style, an old format which is DEPRECATED, looks like this:
https://s3.eu-west-1.amazonaws.com/ruben/key
S3 buckets are globally named, but still stored within a region of your choosing, which can affect things such as latency for accessing the bucket. S3 buckets can still be accessed from different regions in some cases. The main thing to note is that there are a few different ways to access an S3 bucket, some of which specify the region and some of which do not.
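The two URL formats above can be sketched as simple string templates (illustrative helpers, not AWS APIs):

```python
# Sketch of the two S3 addressing styles. "ruben" / "eu-west-1" are the
# example bucket and region from these notes.

def virtual_hosted_url(bucket: str, region: str) -> str:
    """Virtual-hosted-style URL: the bucket name is part of the hostname."""
    return f"https://{bucket}.s3.{region}.amazonaws.com"

def path_style_url(bucket: str, region: str, key: str) -> str:
    """Deprecated path-style URL: the bucket name is part of the path."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

print(virtual_hosted_url("ruben", "eu-west-1"))
```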
S3 is object-based. Objects consist of the following:
- Key, name of the object
- Value, the data
- Version ID, important for the versioning of the object
- Metadata
- Subresources
- bucket policies
- access control list (ACL)
- Cross-Origin Resource Sharing (CORS)
AWS S3 Storage class tier and availability
S3 Standard
- First 50 TB: $0.023 per GB/month
- AWS S3 was built to deliver 99.99% availability
- Guarantees 99.9% availability (SLA)
- Guarantees 99.999999999% durability of S3 objects (11 x 9s)
- Stores data redundantly across multiple devices in multiple facilities
- Designed to sustain the loss of two facilities concurrently
- More cost-effective than using S3 RRS
S3 Intelligent-Tiering
- Designed to optimize costs through automatic migration of data to the most economical S3 tier without affecting performance.
- S3 stores the same object in different access tiers, one for frequent access and another for infrequent access.
- You pay a small monitoring and automation fee per object.
- S3 automatically moves objects that have not been accessed for 30 days to the infrequent access tier.
- It is the ideal storage class for long-term data with an unpredictable access pattern.
S3 – IA (Infrequent Access)
- First 50 TB: $0.0125 per GB/month
- Lower fee than S3 Standard, but you are charged a retrieval fee
- Same low latency and high throughput performance as Standard
- Designed for durability of 99.999999999% of objects (11 x 9s)
- Designed for 99.9% availability over a given year
S3 – IA (Infrequent Access) One Zone
- First 50 TB: $0.01 per GB/month
- Same low latency and high throughput performance as Standard
- Designed for durability of 99.999999999% of objects (11 x 9s)
- Designed for 99.5% availability over a given year
S3 – RRS (Reduced Redundancy Storage)
- Guarantees 99.99% durability
- Guarantees 99.99% availability
- Used for data that can be recreated if lost
- *** AWS recommends not using this class anymore ***
S3 – Archive Glacier
- Very cheap, use only for archival
- Takes 3-5 hours to restore from Glacier
- $0.01 per gigabyte
- Range Retrieval allows you to retrieve only specified byte ranges. You pay only for the actual data retrieved
- Retrieval data:
- Expedited:
- Expedited retrieval can be used for occasional requests; typically, data is retrieved in 1-5 minutes (for files < 250 MB).
- However, an expedited retrieval request is accepted by Glacier only if there is capacity available. If capacity is not available, Glacier will reject the request. To guarantee expedited retrieval availability, you can purchase provisioned capacity.
- Standard:
- Standard would take 3-5 hours
- Bulk:
- Bulk retrieval is the lowest cost option to retrieve data from Glacier and can be used to cost-effectively retrieve large amounts of data
- Would take 5-12 hours
- Data stored in Amazon Glacier is protected by default; only vault owners have access to the Amazon Glacier resources they create.
- Glacier automatically encrypts using AES 256. It handles the key management for you
S3 – Glacier Deep Archive
- The most economical class in all of S3.
- Ideal for data that is recovered once or twice per year, or to replace old magnetic tapes.
- Data is stored in at least 3 geographically separated Availability Zones.
- Offers 99.999999999% durability.
- You can retrieve data in standard mode (within 12 hours) or bulk (within 48 hours).
AWS S3 CHARGES
- storage per GB
- requests (GET, PUT, COPY, etc.)
- storage management pricing
- inventory, tags
- data management pricing
- data transferred out of S3 (data in is free)
- Transferring data from an EC2 instance to Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon SES, Amazon SQS, or Amazon SimpleDB in the same AWS Region has no cost at all.
- Transfer Acceleration
AWS S3 Multipart
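As a rough sketch of the arithmetic behind multipart uploads (the 5 MiB minimum part size and 10,000-part limit are documented S3 limits; `choose_part_size` and `part_ranges` are illustrative helpers, not AWS APIs — boto3's high-level `upload_file` does this splitting for you automatically):

```python
# S3 multipart limits: parts must be at least 5 MiB (except the last one),
# and an upload can have at most 10,000 parts.
MIN_PART = 5 * 1024 * 1024
MAX_PARTS = 10_000

def choose_part_size(object_size: int, part_size: int = MIN_PART) -> int:
    """Double the part size until the object fits in 10,000 parts."""
    while object_size > part_size * MAX_PARTS:
        part_size *= 2
    return part_size

def part_ranges(object_size: int, part_size: int):
    """Yield (first_byte, last_byte) ranges, one per part."""
    for start in range(0, object_size, part_size):
        yield start, min(start + part_size, object_size) - 1
```

For example, a 100 MiB object fits in twenty 5 MiB parts, while a 1 TiB object forces the part size up to 160 MiB to stay under the 10,000-part cap.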
AWS S3 encryption
Server-side encryption is about protecting data at rest.
SSE Types of Encryption
If you need server-side encryption for all of the objects that are stored in a bucket, use a bucket policy. For example, the following bucket policy denies permission to upload an object unless the request includes the x-amz-server-side-encryption header to request server-side encryption.
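A minimal sketch of such a deny policy, built as a Python dict (the bucket name `my-bucket` and the statement Sid are placeholders; the `Null` condition operator matches requests where the header is absent):

```python
import json

# Deny s3:PutObject whenever the request does not carry the
# x-amz-server-side-encryption header (i.e. the condition key is null).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        "Condition": {
            "Null": {"s3:x-amz-server-side-encryption": "true"}
        }
    }]
}
print(json.dumps(policy, indent=2))
```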
- Client-side encryption: I encrypt on my laptop and then upload.
- Server-side encryption:
- SSE-S3:
- AWS manages both the data key and the master key; cheaper than SSE-KMS.
- Every object is encrypted and there is an additional safeguard: Amazon encrypts the key itself with the master key and regularly rotates the master key.
- Amazon handles all the keys for you.
- If you want to enforce the use of encryption in your bucket, use an S3 bucket policy to deny PUT requests that don't include the x-amz-server-side-encryption request header.
- SSE-KMS: AWS manages the data key and you manage the master key; more expensive than SSE-S3.
- Additional audit trail: who used the key, when and where.
- Additional level of transparency: who is decrypting what and when.
- You can use the default key or generate a new one.
- SSE-C: you manage both the data key and the master key.
SSE-KMS
The first time you add an SSE-KMS–encrypted object to a bucket in a region, a default CMK is created for you automatically. This key is used for SSE-KMS encryption unless you select a CMK that you created separately using AWS Key Management Service. Creating your own CMK gives you more flexibility, including the ability to create, rotate, disable, and define access controls, and to audit the encryption keys used to protect your data.
Amazon S3 supports bucket policies that you can use if you require server-side encryption for all objects that are stored in your bucket. For example, you can set a bucket policy which denies permission to upload an object (s3:PutObject) to everyone if the request does not include the x-amz-server-side-encryption header requesting server-side encryption with SSE-KMS.
When you upload an object, you can specify the KMS key using the x-amz-server-side-encryption-aws-kms-key-id header, which you can use to require a specific KMS key for object encryption. If the header is not present in the request, Amazon S3 assumes the default KMS key. Regardless, the KMS key ID that Amazon S3 uses for object encryption must match the KMS key ID in the policy; otherwise Amazon S3 denies the request.
SSE-C
When using server-side encryption with customer-provided encryption keys (SSE-C), you must provide encryption key information using the following request headers:
x-amz-server-side-encryption-customer-algorithm
x-amz-server-side-encryption-customer-key
x-amz-server-side-encryption-customer-key-MD5
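A sketch of how those three header values can be derived for a 256-bit customer-provided key (a throwaway random key is used here; note that boto3 computes the key MD5 for you when you pass its `SSECustomerKey` parameter):

```python
import base64
import hashlib
import os

# SSE-C keys must be 256 bits; this one is a throwaway example, not a secret.
key = os.urandom(32)

headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    # The key travels base64-encoded (over HTTPS only).
    "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
    # MD5 of the raw key, base64-encoded, so S3 can detect transmission errors.
    "x-amz-server-side-encryption-customer-key-MD5":
        base64.b64encode(hashlib.md5(key).digest()).decode(),
}
```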
AWS S3 versioning
- S3 stores all versions of an object (all writes, even if you delete the object)
- Great backup tool
- Versioning cannot be disabled, only suspended!
- integrated with lifecycle rules
- Versioning’s MFA Delete capability provides an extra layer of security
!!! Only the owner of an Amazon S3 bucket can permanently delete a version !!!
Tips Versioning Cross Replication
You use the Amazon S3 console to add replication rules to the source bucket. Replication rules define which source bucket objects to replicate and the destination bucket where the replicated objects are stored. You can create rules to replicate all the objects in a bucket or a subset of objects with specific key name prefixes (that is, objects that have names that begin with a common string). A destination bucket can be in the same AWS account as the source bucket, or it can be in a different account. For cross-region replication, the destination bucket must be in a different Region than the source bucket.
- Versioning must be enabled on both the source and destination
- Regions must be unique
- Files in an existing bucket are not replicated automatically.
- You cannot replicate to multiple buckets
- Deleting individual versions or delete markers will not be replicated
Replication Updates
You can now opt in to Delete Marker Replication when you use S3’s Replication Time Control feature. These features were previously mutually exclusive, and you can now use them together to have confidence that deletions in a source bucket will be reflected in the target bucket, while also taking advantage of S3’s Replication SLA. To learn more, read Amazon S3 Replication Adds Support for Replicating Delete Markers.
LifeCycles
- Can be used in conjunction with versioning
- Can be applied to the current version and previous versions
- Transition to IA (Infrequent Access): objects must be at least 128 KB and at least 30 days past the creation date
- Move to Glacier 30 days after IA (minimum 60 days after creation)
- Permanently delete
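The rules above can be sketched as a lifecycle configuration in the shape boto3's `put_bucket_lifecycle_configuration` expects (bucket name, rule ID, and the 365-day expiration are placeholder choices):

```python
# Lifecycle configuration sketch: Standard -> IA at 30 days,
# IA -> Glacier at 60 days, permanently delete after a year.
lifecycle = {
    "Rules": [{
        "ID": "archive-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix = apply to every object
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 60, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }]
}

# Applied with boto3 roughly like this (requires credentials and a real bucket):
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```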
S3 Object Lock
You can use S3 Object Lock to store objects using a WORM (Write Once, Read Many) model. It can help you prevent objects from being deleted or modified for a fixed amount of time or indefinitely.
Governance Mode
Users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.
Compliance Mode
A protected object version can’t be overwritten or deleted by any user, not even the root user of the account. Its retention mode can’t be changed and its retention period can’t be shortened.
Retention period
- Protects an object version for a fixed period of time
- Amazon S3 stores a timestamp in the object version’s metadata to indicate when the retention period expires
- After the retention period expires, the object version can be overwritten or deleted
Legal Holds
- Enables you to place a legal hold on an object version.
- Like a retention period, it prevents an object from being overwritten or deleted.
- It does not have an associated retention period, so it remains in effect until removed.
- The s3:PutObjectLegalHold permission is required.
Glacier Vault Lock
- Allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy.
- Once locked, the policy can no longer be changed.
Webhosting
- An S3 bucket that is configured to host a static website. The bucket must have the same name as your domain or subdomain. For example, if you want to use the subdomain acme.example.com, the name of the bucket must be acme.example.com.
- A registered domain name. You can use Route 53 as your domain registrar, or you can use a different registrar.
- Route 53 as the DNS service for the domain. If you register your domain name by using Route 53, we automatically configure Route 53 as the DNS service for the domain.
- If you need to access assets that are in a different bucket, remember to use the S3 website URL rather than the regular S3 bucket URL; the website endpoint only supports HTTP, for example:
- http://mybucketname.s3-website-eu-west-1.amazonaws.com
Requester Pays
A bucket owner, however, can configure a bucket to be a Requester Pays bucket. With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket. The bucket owner always pays the cost of storing data
You must authenticate all requests involving Requester Pays buckets. The request authentication enables Amazon S3 to identify and charge the requester for their use of the Requester Pays bucket. After you configure a bucket to be a Requester Pays bucket, requesters must include x-amz-request-payer in their requests either in the header, for POST, GET and HEAD requests, or as a parameter in a REST request to show that they understand that they will be charged for the request and the data download.
Events
S3 can publish event notifications to the following destinations:
- Amazon Simple Notification Service (Amazon SNS) topic
- Amazon Simple Queue Service (Amazon SQS) queue
- AWS Lambda
If you consistently experience more than 100 PUT/DELETE/LIST requests or more than 300 GET requests per second to your bucket, you will probably have to take some action to improve performance, depending on your workload (GET-intensive / not GET-intensive / mixed).
- GET-intensive workloads: the best solution is to use CloudFront, of course
- Mix Workloads:
- the key names of your objects can have an impact on performance
- S3 uses the key name of the object to determine which partition to store the object in
- sequential key names (prefixed with a timestamp or alphabetically ordered) increase the probability of storing a bunch of objects in the same partition, causing I/O issues
- adding some randomness to the object key name avoids this problem, because S3 will store the objects in different partitions
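A minimal sketch of that randomization trick (the 4-character hash prefix length is an arbitrary choice; any scheme that spreads keys across prefixes works):

```python
import hashlib

def randomized_key(name: str) -> str:
    """Prefix a sequential object name with a short, deterministic hash
    so that related objects land in different S3 partitions."""
    prefix = hashlib.md5(name.encode()).hexdigest()[:4]
    return f"{prefix}-{name}"

print(randomized_key("2019-01-01-photo.jpg"))
```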
In 2018 AWS announced a massive improvement in S3 performance, so this guidance is, in practice, no longer needed:
- AWS S3 supports up to 3,500 PUT/COPY/POST/DELETE requests per second
- AWS S3 supports up to 5,500 GET/HEAD requests per second
- AWS S3 serves the first byte within 100-200 milliseconds
S3 Select & Glacier Select
Enables applications to retrieve only a subset of data from an object using simple SQL expressions. Like S3 Select, Glacier Select allows you to run SQL queries against Glacier directly
Server Access Logs
While there is no additional cost for S3 server access logging, you are billed for the cost of log storage and the S3 requests for delivering the logs to your logging bucket. To stop S3 server access logging, you can go to the Properties tab of any bucket that you enabled logging on, and click the Edit button on the Server access logging panel. In the edit window, select Disabled and then click Save changes. You can also delete the S3 server access logs from your log delivery bucket so that you do not incur any additional storage charges.
Updates
- 2018
- S3 path-style deprecation: there are two addressing models
- path style, bucket name in the path: s3.amazonaws.com/bucket-name
- hosted style: my-bucket.s3.amazonaws.com
- S3 will no longer support the path style from September 2020
- selective cross-region replication based on object type (granular level)
- another layer of protection on S3: by default, buckets and objects cannot be made public; you get an error message, and to allow it you have to edit the public access settings manually.
- 2019
- Same Region Replication (SRR): you can now set up asynchronous replication of newly uploaded objects to a destination bucket in the same region. You can replicate any storage class.
Links
- https://docs.aws.amazon.com/AmazonS3/latest/dev/security-best-practices.html
- https://digi.ninja/projects/bucket_finder.php
- https://read.acloud.guru/how-to-secure-an-s3-bucket-7e2dbd34e81b
- https://buckets.grayhatwarfare.com/
- https://stackoverflow.com/questions/53383674/how-do-i-determine-if-an-s3-bucket-has-public-access-using-aws-cli
- https://aws.amazon.com/es/premiumsupport/knowledge-center/s3-public-access-acl/
- https://stackoverflow.com/questions/52152035/list-s3-objects-with-public-read-permissions-in-private-bucket
- https://github.com/nagwww/s3-leaks
- https://medium.com/@grayhatwarfare/how-to-search-for-open-amazon-s3-buckets-and-their-contents-https-buckets-grayhatwarfare-com-577b7b437e01
- https://aws.amazon.com/es/blogs/security/iam-policies-and-bucket-policies-and-acls-oh-my-controlling-access-to-s3-resources/ -> best practices
- https://aws.amazon.com/es/blogs/security/writing-iam-policies-how-to-grant-access-to-an-amazon-s3-bucket/
- https://www.andreafortuna.org/2018/04/04/how-to-find-unsecured-s3-buckets-some-useful-tools/
IAM policy to list a bucket and read/write its objects
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": ["arn:aws:s3:::test"]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Resource": ["arn:aws:s3:::test/*"]
}
]
}