Monitoring

From NixNet
This article is part of a series of guides that describe NixNet's setup in excruciating detail. If you would like to follow them, please start at the Infrastructure page.
Caution: this guide is not finished yet; following it may leave you with an unusable machine. To be notified of updates, please create an account and add this page to your watchlist.

With physical servers and VMs scattered all over the world and across different providers, we need some way to monitor them and make sure they're behaving as they should. We also want to receive notifications when they aren't, along with a historical overview of what was happening around that time, so we can troubleshoot as necessary. To that end, a combination of Prometheus and Grafana fits well. Prometheus will act as the backend, collecting and aggregating metrics from applications, VMs, and physical hosts, while Grafana will take that data and present it in a useful manner through a highly configurable and extensible dashboard. Grafana's built-in, granular alerting system covers the notification requirement.
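To make the division of labour a little more concrete, here is a minimal Python sketch of how a client could ask Prometheus which scrape targets are currently down. It relies on the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`) and on the `up` metric that Prometheus records for every scrape target. The server address, the helper function names, and the sample response are assumptions made up for illustration; they are not part of NixNet's actual setup, and Grafana normally issues queries like this for you.

```python
import json
from urllib.parse import urlencode

# Hypothetical Prometheus address; substitute your own server.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def down_instances(response_body: str) -> list[str]:
    """Return the sorted 'instance' labels of every series in an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    return sorted(
        series["metric"].get("instance", "<unknown>")
        for series in payload["data"]["result"]
    )


# Made-up sample response, in the shape returned by /api/v1/query
# for the expression `up == 0` (targets whose last scrape failed).
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "up",
                    "instance": "vm1.example.org:9100",
                    "job": "node",
                },
                "value": [1700000000.0, "0"],
            }
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url(PROMETHEUS_URL, "up == 0"))
    print(down_instances(SAMPLE_RESPONSE))
```

In a real deployment the response body would come from an HTTP GET against the URL built above rather than from a hard-coded sample; the parsing step stays the same.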