About Alerts for Your Environment

Alert Types

  • Load Average
  • IO Wait
  • Swap Used
  • Free Space
  • Backup Alerts
  • Postgres Alerts

Load Average

Load average represents the average system load an instance experiences over a period of time.

Current Thresholds:

  • Warn: 4 x vCPU
  • Fail: 10 x vCPU

For example, an instance with 1 vCPU triggers a warning at a load average of 4.00, while an instance with 5 vCPUs triggers a warning at 20.00.
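The threshold arithmetic can be sketched in a few lines of Python. This is only an illustration of the multipliers listed above, not the monitoring code itself; the function names are hypothetical, and treating a load exactly at a threshold as crossing it is an assumption.

```python
WARN_FACTOR = 4   # Warn: 4 x vCPU
FAIL_FACTOR = 10  # Fail: 10 x vCPU


def load_thresholds(vcpus):
    """Return the (warn, fail) load average thresholds for a vCPU count."""
    return WARN_FACTOR * vcpus, FAIL_FACTOR * vcpus


def classify_load(load, vcpus):
    """Classify a load average as 'ok', 'warn', or 'fail'.

    Assumption: a value equal to a threshold counts as crossing it.
    """
    warn, fail = load_thresholds(vcpus)
    if load >= fail:
        return "fail"
    if load >= warn:
        return "warn"
    return "ok"
```

On a Unix host, the current 1-minute load average is available as os.getloadavg()[0] and the processor count as os.cpu_count(), so classify_load(os.getloadavg()[0], os.cpu_count()) gives a rough local check. Which averaging window the alert itself uses is not stated here.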

Note: A vCPU is the same as an ECU (an Amazon EC2 Compute Unit).

For general information about how load average is calculated, see Load (computing).

IO Wait

IO wait is the share of time the instance's CPU spends waiting for disk writes to complete before it can move on to other operations.

Current Thresholds:

  • Warn: 40% iowait
  • Fail: 80% iowait 
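To see how an iowait percentage can be derived, here is a minimal Python sketch that compares two samples of the aggregate "cpu" line from /proc/stat on Linux. It is an illustration under stated assumptions, not the monitoring implementation: the function names are hypothetical, and the sampling interval and the handling of values exactly at a threshold are assumptions.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters.

    Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    Guest counters are skipped because they are already included in user.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Return the iowait share (0-100) of CPU time between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def classify_iowait(percent):
    """Apply the thresholds above: warn at 40%, fail at 80%."""
    if percent >= 80:
        return "fail"
    if percent >= 40:
        return "warn"
    return "ok"
```

In practice the two samples would be read from the first line of /proc/stat a few seconds apart.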

Swap Used

The amount of hard disk swap space in use as virtual memory. High swap usage is an indication that an instance needs more memory.

Current Thresholds:

  • Warn: 128 MB Swap Used
  • Fail: 384 MB Swap Used 
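As a rough illustration, swap usage can be computed from the SwapTotal and SwapFree entries of /proc/meminfo on Linux and compared against the thresholds above. The function names are hypothetical, and the sketch assumes that 1 MB means 1024 kB and that a value exactly at a threshold counts as crossing it.

```python
def swap_used_mb(meminfo_text):
    """Return swap in use, in MB, from the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])  # values are reported in kB
    return (values["SwapTotal"] - values["SwapFree"]) / 1024.0


def classify_swap(used_mb):
    """Apply the thresholds above: warn at 128 MB, fail at 384 MB."""
    if used_mb >= 384:
        return "fail"
    if used_mb >= 128:
        return "warn"
    return "ok"
```

On an instance, passing the contents of /proc/meminfo to swap_used_mb gives the current figure.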

Free Space

Free space is monitored on these mount points: /, /data, /db, and /mnt.

You might not realize the instance is almost out of disk space until you get this alert. The thresholds are calculated based on the space allocated to the mount point.

Current Thresholds:

  • Warn: If the disk space for a particular mount point is 10 GB or less, then the warning threshold is 70% full. If the disk space is greater than 10 GB, then the warning threshold is 80% full.
  • Fail: 90% of disk space is full.
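The size-dependent warning threshold can be expressed as a short Python sketch. It only restates the rules above; the function names are hypothetical, and it assumes that 1 GB means 1024^3 bytes and that a value exactly at a threshold counts as crossing it.

```python
GB = 1024 ** 3  # assumption: binary gigabytes


def warn_percent(total_bytes):
    """Warning threshold: 70% for volumes of 10 GB or less, otherwise 80%."""
    return 70 if total_bytes <= 10 * GB else 80


def classify_disk(total_bytes, used_bytes):
    """Classify a mount point as 'ok', 'warn', or 'fail' by percent full."""
    percent_full = 100.0 * used_bytes / total_bytes
    if percent_full >= 90:
        return "fail"
    if percent_full >= warn_percent(total_bytes):
        return "warn"
    return "ok"
```

Note that the same 75% usage produces a warning on a 10 GB volume but not on a 20 GB one. For a local check, shutil.disk_usage("/mnt") returns the total and used byte counts that these functions expect.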

The best practice is to review the contents of the volume and confirm that the usage is appropriate for your use case. If you do need more space, Engine Yard Support might be able to resize your volume online, provided your instance is "current generation". Otherwise, you will need to replace the instance with one that includes a larger volume.

Backup Alerts

'Unable to backup [your database name]: already a backup in progress. 
Use --allow_concurrent to enable concurrent backup runs. Details at
/var/log/eybackup.log.'

This alert indicates that a backup is taking longer than the current interval between your backups. When that happens, backups can stack up behind each other and drive up load on the target host. On a replica, replication state is not evaluated during backups, because it is common for replication to lag or stall while a backup runs.

As a result, the backup tools now provide this warning and, by default, will not start a new backup if one is already running. Possible ways to address this include scheduling a larger interval between backup runs (using the dashboard or custom cookbooks) or upgrading to an instance that can process the backup job more quickly. This warning does not indicate that anything is critically wrong with your database, and it is safe to ignore. However, for visibility and awareness, it cannot be disabled.