AWS RDS Best Practices
AWS recommends best practices for Amazon RDS in terms of monitoring, performance, and security.
Amazon RDS Basic Operational Guidelines
- Memory, CPU, and storage usage should be monitored.
- CloudWatch can be set up to send notifications when usage patterns change or when the capacity of the deployment is approached, so that system performance and availability can be maintained.
- Scale up the DB instance when approaching storage capacity limits.
- There should be some buffer in storage and memory to accommodate unforeseen increases in demand from the applications.
- Enable Automatic Backups and set the backup window to occur during the daily low in WriteIOPS.
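The headroom and backup-window guidance above can be sketched as two small helpers. This is an illustrative sketch, not an AWS-prescribed formula: the 20% buffer default and the hourly WriteIOPS figures are assumptions you would replace with values observed from CloudWatch.

```python
def needs_scale_up(allocated_gib, free_gib, buffer_fraction=0.2):
    """True when free storage falls below the chosen headroom buffer.
    The 20% default is an illustrative assumption, not an AWS value."""
    return free_gib < allocated_gib * buffer_fraction


def backup_window_start(write_iops_by_hour):
    """Given average WriteIOPS keyed by hour (0-23), return the hour with
    the lowest write activity -- a sensible start for the backup window."""
    return min(write_iops_by_hour, key=write_iops_by_hour.get)


# 100 GiB allocated with only 15 GiB free is inside the 20% buffer
print(needs_scale_up(100, 15))        # True

hourly = {h: 500 for h in range(24)}
hourly[3] = 40                        # daily WriteIOPS low at 03:00
print(backup_window_start(hourly))    # 3
```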
- On a MySQL DB instance,
- Do not create more than 10,000 tables using Provisioned IOPS or 1000 tables using standard storage. Large numbers of tables will significantly increase database recovery time after a failover or database crash. If you need to create more tables than recommended, set the innodb_file_per_table parameter to 0.
- Avoid letting tables in the database grow too large. Provisioned storage limits restrict the maximum size of a MySQL table file to 6 TB. Instead, partition large tables so that file sizes are well under the 6 TB limit; this can also improve performance and recovery time.
- If the database workload requires more I/O than provisioned, recovery after a failover or database failure will be slow.
- To increase the I/O capacity of a DB instance,
- Migrate to a DB instance class with High I/O capacity.
- Convert from standard storage to Provisioned IOPS storage, and use a DB instance class that is optimized for Provisioned IOPS.
- If using Provisioned IOPS storage, provision additional throughput capacity.
- Multi-AZ & Failover
- Deploy applications in all Availability Zones, so if an AZ goes down, applications in other AZs will still be available.
- Use Amazon RDS DB events to monitor failovers.
- Set a TTL of less than 30 seconds if the client application is caching the DNS data of the DB instances. As the underlying IP address of a DB instance can change after a failover, caching the DNS data for an extended time can lead to connection failures if the application tries to connect to an IP address that is no longer in service.
- Multi-AZ deployments require the transaction logging feature to be enabled. Do not use features like the simple recovery model, offline mode, or read-only mode, which turn off transaction logging.
- To shorten failover time
- Ensure that sufficient Provisioned IOPS are allocated for your workload. Inadequate I/O can lengthen failover times, as database recovery requires I/O.
- Use smaller transactions. Database recovery relies on transactions, so break up large transactions into multiple smaller transactions to shorten failover time.
- Test failover for your DB instance to understand how long the process takes for your use case and to ensure that the application that accesses your DB instance can automatically connect to the new DB instance after failover.
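The DNS-caching advice above (TTL under 30 seconds) can be sketched as a small TTL-aware cache. This is a minimal illustration, not a real resolver: `resolve` is a stand-in for an actual DNS lookup, and the class name is hypothetical.

```python
import time


class TtlDnsCache:
    """Cache endpoint lookups for at most `ttl` seconds, so the new IP
    address after a failover is picked up quickly instead of the stale one
    being reused indefinitely."""

    def __init__(self, resolve, ttl=30):
        self.resolve = resolve        # callable: hostname -> IP string
        self.ttl = ttl                # keep this below ~30 seconds
        self._cache = {}              # hostname -> (ip, expires_at)

    def lookup(self, hostname, now=None):
        now = time.time() if now is None else now
        entry = self._cache.get(hostname)
        if entry and now < entry[1]:
            return entry[0]           # cached entry still fresh
        ip = self.resolve(hostname)   # re-resolve after TTL expiry
        self._cache[hostname] = (ip, now + self.ttl)
        return ip
```

With a 30-second TTL, a client reconnecting after a failover re-resolves the endpoint within half a minute instead of hitting the old, out-of-service IP address.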
DB Instance RAM Recommendations
- An Amazon RDS performance best practice is to allocate enough RAM so that the working set resides almost completely in memory.
- The value of ReadIOPS should be small and stable.
- The ReadIOPS metric can be checked using Amazon CloudWatch while the DB instance is under load, to tell whether the working set is almost all in memory.
- If scaling up the DB instance class to one with more RAM results in a dramatic drop in ReadIOPS, the working set was not almost completely in memory.
- Continue to scale up until ReadIOPS no longer drops dramatically after a scaling operation, or until ReadIOPS is reduced to a very small amount.
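The iterative scale-up test above can be sketched as a stopping rule. The 50% "dramatic drop" and the "very small" cutoff of 100 IOPS are illustrative assumptions; pick thresholds that match your workload.

```python
def keep_scaling(readiops_before, readiops_after,
                 drop_fraction=0.5, small_enough=100):
    """Decide whether another scale-up is worthwhile: stop once ReadIOPS
    is already very small, or once it no longer drops dramatically.
    Thresholds are illustrative assumptions, not AWS values."""
    if readiops_after <= small_enough:
        return False                  # working set now fits in memory
    dropped = (readiops_before - readiops_after) / readiops_before
    return dropped >= drop_fraction   # dramatic drop: keep scaling up


print(keep_scaling(4000, 800))   # True: big drop, but still high
print(keep_scaling(800, 60))     # False: ReadIOPS now very small
print(keep_scaling(1000, 900))   # False: scaling no longer helps much
```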
Amazon RDS Security Best Practices
- Do not use AWS root credentials to manage Amazon RDS resources; create individual IAM users for everyone who needs access.
- Grant each user the minimum set of permissions required to perform his or her duties.
- Use IAM groups to effectively manage permissions for multiple users.
- Rotate your IAM credentials regularly.
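A least-privilege policy for the guidance above might look like the following sketch: read-only visibility into RDS instances and nothing more, intended to be attached to an IAM group. The two actions shown are real RDS API actions; the unrestricted `Resource` is an illustrative simplification you would scope down in practice.

```python
import json

# Minimal least-privilege policy sketch: describe RDS instances and read
# their tags, nothing else. Scope "Resource" to specific ARNs in practice.
read_only_rds_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "rds:DescribeDBInstances",
            "rds:ListTagsForResource",
        ],
        "Resource": "*",
    }],
}

print(json.dumps(read_only_rds_policy, indent=2))
```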
Using Enhanced Monitoring to Identify Operating System Issues
- Amazon RDS provides metrics in real time for the operating system (OS) that your DB instance runs on.
- Enhanced Monitoring is available for all DB instance classes except for db.t1.micro and db.m1.small.
Using Metrics to Identify Performance Issues
- To identify performance issues caused by insufficient resources and other common bottlenecks, you can monitor the metrics available for your Amazon RDS DB instance
- Performance metrics should be monitored on a regular basis to benchmark the average, maximum, and minimum values for a variety of time ranges, to help identify performance degradation.
- Amazon CloudWatch alarms can be set for particular metric thresholds, so that you are alerted when they are reached or breached.
- A DB instance has a number of different categories of metrics, including CPU, memory, disk space, IOPS, DB connections, and network traffic; how to determine acceptable values depends on the metric.
- One of the best ways to improve DB instance performance is to tune the most commonly used and most resource-intensive queries to make them less expensive to run.
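The CloudWatch alarm guidance above can be sketched as the keyword arguments that boto3's `put_metric_alarm` accepts, built without needing AWS credentials. The parameter names are real CloudWatch API fields; the instance identifier, the 80% threshold, and the 15-minute evaluation window are illustrative assumptions.

```python
def cpu_alarm_params(db_instance_id, threshold_pct=80):
    """Build put_metric_alarm kwargs for high CPUUtilization on an RDS
    instance. Threshold and periods are illustrative, not AWS defaults."""
    return {
        "AlarmName": f"{db_instance_id}-high-cpu",
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier",
                        "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,            # 5-minute datapoints
        "EvaluationPeriods": 3,   # breach for 15 minutes before alarming
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = cpu_alarm_params("mydb")
print(params["AlarmName"])  # mydb-high-cpu
```

In practice you would pass this dict to `boto3.client("cloudwatch").put_metric_alarm(**params)` along with an SNS topic in `AlarmActions`.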
- InnoDB is the recommended and supported storage engine for MySQL DB instances on Amazon RDS.
- However, MyISAM performs better than InnoDB if you require intense, full-text search capability.
- Point-In-Time Restore and snapshot restore features of Amazon RDS for MySQL require a crash-recoverable storage engine and are supported for the InnoDB storage engine only.
- Although MySQL supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability.
- MyISAM storage engine does not support reliable crash recovery and might prevent a Point-In-Time Restore or snapshot restore from working as intended which might result in lost or corrupt data when MySQL is restarted after a crash.
- XtraDB is the recommended and supported storage engine for MariaDB DB instances on Amazon RDS.
- Point-In-Time Restore and snapshot restore features of Amazon RDS for MariaDB require a crash-recoverable storage engine and are supported for the XtraDB storage engine only.
- Although MariaDB supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability.
- For example, although Aria is a crash-safe replacement for MyISAM, it might still prevent a Point-In-Time Restore or snapshot restore from working as intended. This might result in lost or corrupt data when MariaDB is restarted after a crash.
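To spot tables that would undermine Point-In-Time Restore on a MySQL DB instance, you can list everything not on the recommended InnoDB engine. The query below is standard `information_schema` SQL, shown here as a Python string constant; run it with any MySQL client. Excluding the built-in system schemas is an illustrative choice.

```python
# Lists every user table whose storage engine is not InnoDB, the only
# engine for which RDS MySQL supports PITR and snapshot restore.
FIND_NON_INNODB = """
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE engine IS NOT NULL
  AND engine <> 'InnoDB'
  AND table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys');
"""

print(FIND_NON_INNODB.strip())
```

Any rows returned (e.g. MyISAM tables) are candidates for `ALTER TABLE ... ENGINE=InnoDB` before relying on backups.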
Sample Exam Questions
- Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
- AWS services are updated every day, and both the questions and answers might soon be outdated, so research accordingly.
- AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
- Open to further feedback, discussion and correction.
- You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence. At times throughout the day, you are seeing large variance in the response times of the database queries. Looking into the instance with the iostat command, you see a lot of wait time on the disk volume that the database's data is stored on. What two ways can you improve the performance of the database's storage while maintaining the current persistence of the data? Choose 2 answers
- Move to an SSD backed instance
- Move the database to an EBS-Optimized Instance
- Use Provisioned IOPs EBS
- Use the ephemeral storage on an m2.4xLarge Instance Instead
- Amazon RDS automated backups and DB Snapshots are currently supported for only the __________ storage engine
To get the maximum benefit from Amazon EC2, AWS recommends the following best practices
Security & Network
- Implement the least permissive rules for your security group.
- Regularly patch, update, and secure the operating system and applications on your instance
- Launch your instances into a VPC instead of EC2-Classic (newly created AWS accounts use a VPC by default)
- Manage access to AWS resources and APIs using identity federation, IAM users, and IAM roles
- Establish credential management policies and procedures for creating, distributing, rotating, and revoking AWS access credentials
Storage
- EC2 supports instance store and EBS volumes, so it is best to understand the implications of the root device type for data persistence, backup, and recovery
- Use separate Amazon EBS volumes for the operating system (root device) versus your data.
- Ensure that the data volume (with your data) persists after instance termination
- Use the instance store available for your instance to only store temporary data. (Remember that the data stored in instance store is deleted when you stop or terminate your instance)
- If you use instance store for database storage, ensure that you have a cluster with a replication factor that ensures fault tolerance.
Resource Management
- Use instance metadata and custom resource tags to track and identify your AWS resources
- View your current limits for Amazon EC2. Plan to request any limit increases in advance of the time that you’ll need them.
Backup & Recovery
- Regularly back up your instance using Amazon EBS snapshots (not done automatically) or a backup tool.
- Use Amazon Data Lifecycle Manager (DLM) to automate the creation, retention, and deletion of snapshots taken to back up the EBS volumes
- Implement High Availability by deploying critical components of the application across multiple Availability Zones, and replicate the data appropriately
- Monitor and respond to events.
- Design your applications to handle dynamic IP addressing when your instance restarts.
- Implement failover. For a basic solution, you can manually attach a network interface or Elastic IP address to a replacement instance
- Regularly test the process of recovering your instances and Amazon EBS volumes if they fail.
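The DLM snapshot automation mentioned above takes a policy document; a sketch of the `PolicyDetails` shape that `create_lifecycle_policy` accepts is shown below as a plain dict, built without calling AWS. The tag key/value, schedule time, and 7-snapshot retention are illustrative assumptions.

```python
# Daily snapshots at 03:00 UTC of every volume tagged Backup=true,
# retaining the 7 most recent. Pass this dict as PolicyDetails to
# boto3.client("dlm").create_lifecycle_policy along with an execution
# role ARN and a description.
dlm_policy_details = {
    "ResourceTypes": ["VOLUME"],
    "TargetTags": [{"Key": "Backup", "Value": "true"}],
    "Schedules": [{
        "Name": "DailySnapshots",
        "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS",
                       "Times": ["03:00"]},
        "RetainRule": {"Count": 7},   # keep the 7 most recent snapshots
        "CopyTags": True,             # carry volume tags onto snapshots
    }],
}

print(dlm_policy_details["Schedules"][0]["RetainRule"]["Count"])  # 7
```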