The promotion process performs a series of checks on your environment's instances before replacing the existing database master. The flow diagram below shows the high-level process.
Each database promotion keeps a record of all executed actions and commands. You can view this record in the More Options area of your environment page.
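The Python sketch below shows what "a series of checks" with a recorded history can look like. It runs named steps in order, stops at the first failure, and logs every action, much like the record under More Options. The names `run_promotion_steps`, `PromotionLog` and the example step labels are hypothetical. This is not Engine Yard's implementation, and the real ordering of steps is the one shown in the flow diagram.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A named step: (description, zero-argument function returning True on success).
Step = Tuple[str, Callable[[], bool]]


@dataclass
class PromotionLog:
    """Ordered record of every action attempted during a promotion (illustrative)."""

    entries: List[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        self.entries.append(message)


def run_promotion_steps(steps: List[Step], log: PromotionLog) -> bool:
    """Run each step in order, stopping at the first failure.

    Returns True only if every step succeeded; every action is recorded in `log`.
    """
    for name, step in steps:
        log.record(f"running: {name}")
        if not step():
            log.record(f"aborted: {name} failed")
            return False
        log.record(f"passed: {name}")
    log.record("all checks passed")
    return True


if __name__ == "__main__":
    log = PromotionLog()
    ok = run_promotion_steps(
        [
            ("replica is healthy", lambda: True),
            ("replica is current with master", lambda: False),  # simulate too much lag
            ("lock master", lambda: True),  # never reached: the run aborts first
        ],
        log,
    )
    print(ok)  # False
    print("\n".join(log.entries))
```

Stopping at the first failure means the master is never touched when an earlier check fails.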
How Does Replica Promotion Work?
Replica promotion is a manual event. You trigger it from a link shown on the cloud dashboard next to each replica database. Promotion is not automated, mainly because of the damage a false positive could cause in whatever check would trigger an auto-failover. For example, many hard failures show up simply as an unresponsive database or server. However, a deploy of poorly indexed queries that overloads the database can produce exactly the same pattern. In that case an auto-failover would not remove the underlying problem. The same application conditions would persist against the new master, and the redundancy of having a replica would be gone, leaving you with a more serious problem than before.
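The false-positive problem can be illustrated with a timeout-based liveness probe, the kind of signal an auto-failover might rely on. In this Python sketch, the `probe` function, the 5-second timeout and the two simulated scenarios are assumptions for illustration only. A crashed server never answers, while an overloaded server answers too slowly.

```python
from typing import Optional

PROBE_TIMEOUT_SECONDS = 5.0  # hypothetical threshold for an auto-failover probe


def probe(response_time_seconds: Optional[float]) -> str:
    """Classify a simulated health probe.

    response_time_seconds is None when the server never answers (hard failure),
    otherwise it is how long the server took to answer.
    """
    if response_time_seconds is None or response_time_seconds > PROBE_TIMEOUT_SECONDS:
        return "unresponsive"
    return "healthy"


crashed = probe(None)      # the server is down
overloaded = probe(30.0)   # poorly indexed queries saturate the database
print(crashed, overloaded)  # unresponsive unresponsive
```

Both cases return `unresponsive`. An automated rule keyed on this probe would therefore fail over in the overload case too, sending the same slow queries to the former replica.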
In addition, database replication is asynchronous by nature, so promoting a replica can lose some transactional data. Any transactions the master committed but the replica had not yet applied would be missing. Also, when the master is unresponsive, there is no effective way to validate the health of its attached replicas.
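Why asynchronous replication can lose data on promotion: the master acknowledges a commit before the replica has applied it. This Python sketch models the master's committed transactions as a list and the replica's progress as a count of applied entries. It is a deliberately simplified model with hypothetical names, not how MySQL or PostgreSQL track replication positions internally.

```python
from typing import List


def transactions_lost_on_promotion(master_log: List[str], replica_applied: int) -> List[str]:
    """Return transactions committed on the master but not yet applied on the replica.

    master_log: transactions in commit order on the master.
    replica_applied: how many of those transactions the replica has applied so far.
    If the replica is promoted now, these transactions are absent from the new master.
    """
    return master_log[replica_applied:]


master_log = ["txn-1", "txn-2", "txn-3", "txn-4"]
# Asynchronous replication: the replica trails the master by two transactions.
print(transactions_lost_on_promotion(master_log, replica_applied=2))  # ['txn-3', 'txn-4']
```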
For these reasons, the safest current practice is to put a human decision into the process. With a proactive monitoring support plan, you can define a default action for our VNOC to take in these scenarios.
Diagram Legend
- Light blue actions -- Commands issued against the replica database.
- Light orange actions -- Commands issued against the master database.
Database Health
We consider an instance healthy when:
- The instance is present and we can communicate with it.
- The instance can be written to (that is, I/O succeeds). For the database master, we perform a write/delete test through the database process. For replicas, we test I/O on the volume.
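The write criterion above can be sketched in Python as two separate tests. In this sketch, SQLite from the standard library stands in for the real database so the example runs anywhere, and a temporary directory stands in for the replica's data volume. The table name `_health_check`, the probe file name and both function names are hypothetical, not what the platform actually uses.

```python
import os
import sqlite3
import tempfile


def master_write_delete_test(conn: sqlite3.Connection) -> bool:
    """Master check: write a row through the database process, then delete it."""
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS _health_check (id INTEGER PRIMARY KEY, note TEXT)")
        cur = conn.execute("INSERT INTO _health_check (note) VALUES ('probe')")
        conn.execute("DELETE FROM _health_check WHERE id = ?", (cur.lastrowid,))
        conn.commit()
        return True
    except sqlite3.Error:
        return False


def replica_volume_io_test(volume_path: str) -> bool:
    """Replica check: write a small file on the data volume, read it back, remove it."""
    probe_path = os.path.join(volume_path, ".io_probe")
    try:
        with open(probe_path, "w") as f:
            f.write("probe")
        with open(probe_path) as f:
            ok = f.read() == "probe"
        os.remove(probe_path)
        return ok
    except OSError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the master database
    print(master_write_delete_test(conn))  # True
    with tempfile.TemporaryDirectory() as volume:  # stand-in for the replica volume
        print(replica_volume_io_test(volume))  # True
```

Both tests clean up after themselves, so a passing check leaves no probe row or probe file behind.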
Replica Current with Master
A replica is considered current when it is in sync with its database master's events. If replication lag is greater than 60 seconds, the replica is not current and the promotion is aborted.
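This Python sketch expresses the 60-second rule as a predicate. A lag of exactly 60 seconds still counts as current, because only a lag greater than 60 seconds aborts the promotion. Treating an unknown lag (`None`) as not current is an assumption of this sketch, and `replica_is_current` is a hypothetical name.

```python
from typing import Optional

MAX_REPLICATION_LAG_SECONDS = 60  # threshold stated in this article


def replica_is_current(lag_seconds: Optional[float]) -> bool:
    """Return True if the replica is current enough to promote.

    A lag greater than 60 seconds means the replica is not current.
    None models an unknown lag (for example, replication not running) and is
    treated as not current; that handling is an assumption of this sketch.
    """
    if lag_seconds is None:
        return False
    return lag_seconds <= MAX_REPLICATION_LAG_SECONDS


for lag in (0, 60, 61, None):
    print(lag, "proceed" if replica_is_current(lag) else "abort promotion")
```

On MySQL, one possible source for the lag value is the `Seconds_Behind_Master` column of `SHOW SLAVE STATUS`, which is NULL when replication is not running. This article does not say how the platform measures lag.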
Database Locking
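Locking means that the database instance will not accept writes. We lock the database master as a preventive measure against data loss. If your application still references the old master by hostname, any writes it sends there after the promotion would not exist on the new master. Locking makes those writes fail instead of being silently lost.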
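The effect of locking can be illustrated with a minimal in-memory Python model. `SimulatedMaster` and `LockedError` are hypothetical names used only for this sketch. The article does not say which database mechanism the platform uses to lock the master.

```python
from typing import List


class LockedError(Exception):
    """Raised when a write reaches a locked (read-only) instance."""


class SimulatedMaster:
    """In-memory stand-in for a database master (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        self.locked = False

    def lock(self) -> None:
        """Stop accepting writes, as is done to the old master during promotion."""
        self.locked = True

    def write(self, row: str) -> None:
        if self.locked:
            raise LockedError(f"write rejected: {row!r}")
        self.rows.append(row)


old_master = SimulatedMaster()
old_master.write("order-1")  # accepted before promotion
old_master.lock()
try:
    # An application still pointing at the old hostname tries to write.
    old_master.write("order-2")
except LockedError as exc:
    print(exc)  # write rejected: 'order-2'
print(old_master.rows)  # ['order-1']
```

The rejected write surfaces as an error in the application. It does not land on an instance that no longer replicates to the new master.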
More information
This table provides other resources related to database promotion.
For more information about... | See... |
---|---|
How to promote your database replica | Promote your Database Replica |
Adding a database instance to an environment | Add a database replica (slave) to an existing environment |
If you have feedback or questions about this page, or if you need help, submit a ticket with Engine Yard Support.