Database Snapshot vs Database Backup

Overview

This article describes key use cases and differences between the two default database backup solutions implemented by the Engine Yard platform. By default, each backup type executes daily and 10 copies of each backup are stored.
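
With one backup per day and 10 copies kept, the oldest retained copy of each type is roughly ten days old. The Python sketch below shows a simple keep-newest-N rotation to make that window concrete. It assumes each backup is identified only by its creation timestamp, and `split_retained` is a hypothetical helper, not Engine Yard's actual retention code.

```python
from datetime import datetime, timedelta

# Assumption: mirrors the documented default of 10 retained copies per backup type.
RETAINED_COPIES = 10

def split_retained(backup_times, keep=RETAINED_COPIES):
    """Return (kept, expired): the `keep` newest timestamps, then the rest."""
    ordered = sorted(backup_times, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]

# Twelve daily backups: the two oldest fall outside the retention window.
start = datetime(2024, 1, 1)
daily = [start + timedelta(days=i) for i in range(12)]
kept, expired = split_retained(daily)
print(len(kept), len(expired))  # 10 2
print(min(kept).date())         # 2024-01-03
```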

Database Snapshot

A snapshot, or physical copy, is primarily used when:

  • restoring an environment from scratch
  • cloning or copying an environment into a new environment for testing or development purposes
  • adding another instance of a related role (for example, an app instance, a db_replica, or a utility instance)

Snapshots can also be used with a limited-access feature that lets you replace a database master when AWS retires the existing instance and no replica is available to take over that role. Starting an instance typically takes 10 to 20 minutes. A volume created from a snapshot is available almost immediately, but depending on its size it may show increased latency while its blocks lazy-load from AWS infrastructure in the background (this can also affect application and utility instances). The only times you actually select a snapshot yourself are when creating a utility instance and when restarting an entire environment that was previously stopped.

Snapshots cannot be downloaded as physical files, which limits what you can do with them during a restore operation. Although uncommon, physical copies can also be subject to filesystem corruption, since they mirror whatever is on disk. Snapshots are processed and stored as diffs (that is, only the data that changed since the previous snapshot), so a regular, recurring snapshot schedule helps keep the average snapshot runtime short.
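
The effect of diff-based storage can be illustrated with a simplified block-level model. The Python sketch below assumes a volume is a list of fixed-size blocks and that a snapshot stores a block only when its content hash differs from the previous snapshot. AWS handles the real change tracking internally, so treat this as a conceptual model rather than the actual mechanism; the helper names are hypothetical.

```python
import hashlib

def block_hashes(volume):
    """Hash each fixed-size block so blocks can be compared cheaply."""
    return [hashlib.sha256(block).hexdigest() for block in volume]

def incremental_snapshot(volume, previous_hashes):
    """Return ({block_index: data} for changed blocks, current hashes).

    With no previous snapshot (previous_hashes is None), every block is stored.
    """
    current = block_hashes(volume)
    if previous_hashes is None:
        return dict(enumerate(volume)), current
    changed = {
        i: volume[i]
        for i, h in enumerate(current)
        if i >= len(previous_hashes) or h != previous_hashes[i]
    }
    return changed, current

volume = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
first, hashes = incremental_snapshot(volume, None)
volume[2] = b"CCCC"  # one block changes before the next snapshot
second, hashes = incremental_snapshot(volume, hashes)
print(len(first), sorted(second))  # 4 [2]
```

The fewer blocks that change between runs, the less each snapshot has to process, which is why a regular schedule keeps runtimes short.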

Database Backup

A database backup, or logical copy, is made by dumping the database with standard database tools (mysqldump for MySQL, pg_dump for PostgreSQL). Logical backups are extremely flexible, and tools such as 'eyrestore' can help you quickly and easily access any existing backup of one environment (such as production) from a different environment (such as staging).
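
To show what a logical dump involves, the Python sketch below builds (but does not run) full-dump command lines for pg_dump and mysqldump. The database name, the output path, and the flag choices are assumptions for the example; Engine Yard's own backup scripts may use different options, and `dump_command` is a hypothetical helper that does not reproduce eyrestore.

```python
import shlex

def dump_command(engine, database, outfile):
    """Build an argv list for a full logical dump of `database` into `outfile`."""
    if engine == "postgresql":
        # -Fc: custom archive format, which pg_restore can read selectively.
        return ["pg_dump", "-Fc", "-f", outfile, database]
    if engine == "mysql":
        # --single-transaction: consistent InnoDB dump without locking tables.
        return ["mysqldump", "--single-transaction",
                f"--result-file={outfile}", database]
    raise ValueError(f"unsupported engine: {engine}")

print(shlex.join(dump_command("postgresql", "app_production", "app.dump")))
# pg_dump -Fc -f app.dump app_production
```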

Logical backups are full data copies that include the statements needed to recreate your dataset on a new volume. Depending on your database size, a logical copy can take significantly longer to restore than a snapshot, because all of the data must be reloaded.

Creating this type of backup has a performance cost, so for larger databases it is often worth running it against a replica. When using PostgreSQL, the backup automatically runs against the first (oldest) replica in the environment. When using MySQL, a custom Chef cookbook can be used to specify which replica the backup runs against.
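
The "first (oldest) replica" rule can be sketched as picking the replica with the earliest launch time. In the Python sketch below, the `Instance` type, the role strings, and the fallback to the master when no replica exists are all assumptions for illustration; the article does not describe how the platform behaves without a replica.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Instance:
    name: str
    role: str            # hypothetical role labels, e.g. "db_master" or "db_replica"
    launched_at: datetime

def backup_target(instances):
    """Return the oldest replica; fall back to the master (assumption) if none exist."""
    replicas = [i for i in instances if i.role == "db_replica"]
    if replicas:
        return min(replicas, key=lambda i: i.launched_at)
    return next(i for i in instances if i.role == "db_master")

env = [
    Instance("db1", "db_master", datetime(2023, 1, 1)),
    Instance("db2", "db_replica", datetime(2023, 3, 1)),
    Instance("db3", "db_replica", datetime(2023, 2, 1)),
]
print(backup_target(env).name)  # db3
```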

Logical database backups are very useful in test, staging, and local environments, or as a data source for a reporting and analytics service. They also allow you to restore a single table or a subset of tables.
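
For PostgreSQL, a subset restore can be sketched with pg_restore, which accepts one `-t` switch per table. The Python sketch below only builds the command line; the archive name, target database, and table names are hypothetical, and it assumes the dump is a custom-, directory-, or tar-format archive (a plain SQL dump is replayed with psql and cannot be filtered this way).

```python
import shlex

def restore_tables_command(archive, target_db, tables):
    """Build a pg_restore argv that restores only the listed tables."""
    if not tables:
        raise ValueError("at least one table is required")
    argv = ["pg_restore", "-d", target_db]
    for table in tables:
        argv += ["-t", table]  # each -t adds one table to the selection
    argv.append(archive)
    return argv

print(shlex.join(restore_tables_command("app.dump", "app_staging", ["users", "orders"])))
# pg_restore -d app_staging -t users -t orders app.dump
```

On MySQL, the equivalent selection is usually made at dump time, since mysqldump accepts table names after the database name.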

Testing

Engine Yard tests its backup tools extensively before releasing an update, and all Engine Yard customers use the toolset continually for backup and restore operations. However, this is not a replacement for understanding your own application's backup and recovery needs and for regularly testing your backup and restore strategy.

Content Author: Tyler Poland