Engine Yard implements its own monitoring and alerting stack that is crucial for maintaining the health and performance of your applications. This system uses a combination of open-source tools, custom configurations, and proprietary systems to monitor all environments and alert the users as required.
In this article, we will review the monitoring and alerting stack and its main components.
The monitoring and alerting stack
The monitoring and alerting stack in Engine Yard is built using collectd:
- collectd is an open-source system statistics collection daemon for Unix.
- It runs continuously, monitoring the system as configured in /etc/engineyard/collectd.conf.
- It logs its alerts to /var/log/collectd.log.
- It is configured by Chef, so any manual changes will be overwritten by the next Chef run.
- It reports to the backend.
In this context, AWSM is in charge of alert lifecycle management:
- Receives the collectd alerts and pushes them to Support dashboards and customer mailboxes.
- Keeps track of all alerts and their mapping to specific instances, environments, and accounts.
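To make this bookkeeping concrete, here is a minimal sketch of how alerts might be indexed by account, environment, and instance. The `AlertRecord` fields and the `AlertIndex` class are hypothetical names chosen for illustration; the AWSM data model itself is not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRecord:
    # All field names are assumptions made for this illustration.
    alert_id: int
    account: str
    environment: str
    instance: str
    severity: str  # assumed values such as "WARNING" or "FAILURE"
    message: str


class AlertIndex:
    """Hypothetical in-memory index of alerts by account, environment, and instance."""

    def __init__(self):
        self._by_scope = defaultdict(list)

    def add(self, alert):
        # Register the alert under each scope it belongs to.
        self._by_scope[("account", alert.account)].append(alert)
        self._by_scope[("environment", alert.environment)].append(alert)
        self._by_scope[("instance", alert.instance)].append(alert)

    def for_scope(self, kind, name):
        # Return a copy so callers cannot mutate the index.
        return list(self._by_scope.get((kind, name), []))
```

With an index of this shape, a call such as `index.for_scope("environment", "production")` would list every alert recorded for that environment.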
While collectd itself is installed through a Chef recipe, specific monitoring tasks are added to it by other recipes; for example, a recipe that adds a Puma server also ensures that monitoring for that server is added to the collectd configuration file.
Similarly, customers can extend collectd further with custom recipes. This can include things like:
- Monitoring different parameters.
- Changing the thresholds for the existing alarms.
- Sending emails and Slack messages.
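As one illustration of the Slack case, the sketch below parses a notification in the header-plus-body text format that collectd's exec plugin passes on standard input to a `NotificationExec` script, and builds a JSON payload of the kind a Slack incoming webhook accepts. Whether a given environment enables the exec plugin is an assumption, and both function names are hypothetical; the sketch deliberately stops short of sending anything.

```python
import json


def parse_notification(raw):
    """Split an exec-plugin notification into header fields and message text."""
    # The header and the message are separated by one blank line.
    header_text, _, body = raw.partition("\n\n")
    fields = {}
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not in "Name: value" form
            fields[name.strip()] = value.strip()
    return fields, body.strip()


def build_slack_payload(raw):
    """Return a JSON string for a Slack incoming webhook (hypothetical helper)."""
    fields, message = parse_notification(raw)
    severity = fields.get("Severity", "UNKNOWN")
    host = fields.get("Host", "unknown-host")
    plugin = fields.get("Plugin", "unknown-plugin")
    text = f"[{severity}] {host} / {plugin}: {message}"
    return json.dumps({"text": text})
```

A wrapper script deployed by a custom recipe would read standard input, call `build_slack_payload`, and POST the result to the webhook URL; that delivery step is omitted here.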
VNOC: Virtual Network Operations Center
Sometimes referred to as StormWatch (its internal name), VNOC is the Virtual Network Operations Center. In the past, this system included a dashboard for alerts that was monitored by a Support agent to report any issues; however, it is now in charge of raising and updating Zendesk tickets through the following mechanism:
- Periodically, VNOC queries the AWSM alerts API.
- It retrieves the open alerts.
- It keeps only the alerts that are errors, ignoring minor warnings.
- It filters out alerts from environments that are neither premium nor platinum, so only alerts from premium and platinum environments remain.
- It checks whether the alert ID maps to an existing, open ticket:
  - If such a ticket exists, it updates the ticket with the alert.
  - Otherwise, it creates a new ticket.
- It acknowledges the alert via the AWSM API.
Note that VNOC does not raise tickets for standard environments: their alerts are still retrieved on each run and filtered out again every time.
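The polling cycle described above can be sketched as follows. This is a simplified illustration, not the actual VNOC code: the `awsm` and `tickets` client objects, their method names, the alert fields, and the plan labels are all hypothetical stand-ins, since the AWSM and Zendesk calls involved are not documented here.

```python
from dataclasses import dataclass

# Assumed plan labels for the environments that VNOC raises tickets for.
TICKETED_PLANS = {"premium", "platinum"}


@dataclass
class OpenAlert:
    # Field names are illustrative assumptions, not the AWSM schema.
    alert_id: int
    severity: str  # assumed values: "WARNING" or "FAILURE"
    plan: str      # support plan of the environment that raised the alert
    message: str


def run_once(awsm, tickets):
    """Process one polling cycle and return the IDs of the acknowledged alerts."""
    handled = []
    for alert in awsm.open_alerts():
        if alert.severity != "FAILURE":
            continue  # ignore minor warnings
        if alert.plan not in TICKETED_PLANS:
            continue  # skipped without acknowledgement in this sketch
        ticket_id = tickets.find_open_ticket(alert.alert_id)
        if ticket_id is not None:
            tickets.update_ticket(ticket_id, alert)
        else:
            tickets.create_ticket(alert)
        awsm.acknowledge(alert.alert_id)
        handled.append(alert.alert_id)
    return handled
```

Because the sketch never acknowledges skipped alerts, they come back from `open_alerts()` on the next cycle and are filtered out again, which matches the behavior noted for standard environments.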
Understanding the monitoring and alerting stack in Engine Yard is crucial for managing your applications effectively. This system, built around collectd, provides robust monitoring and alerting capabilities that can be customized to meet your specific needs.