One of the things I currently do is monitor my websites to ensure they are up. Using a simple script and template, I pull down each page and run a regex against it looking for a specific string. If I find a match, I assume the page is working. No match means the site must be down or returning an error.
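To make the idea concrete, here is a minimal sketch of that kind of check in Python. It is not the actual Cacti script; the function names, the timeout, and the choice to treat every network or HTTP error as "down" are assumptions for illustration.

```python
import re
import urllib.error
import urllib.request


def page_matches(body: str, pattern: str) -> bool:
    """Return True if the expected string (given as a regex) appears in the body."""
    return re.search(pattern, body) is not None


def site_is_up(url: str, pattern: str, timeout: float = 10.0) -> bool:
    """Fetch the page and report whether it contains the expected pattern.

    Assumption: any network error, timeout, HTTP error status, or
    malformed URL is treated as "down".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError, ValueError):
        return False
    return page_matches(body, pattern)
```

Calling something like `site_is_up("https://example.com/", r"Example Domain")` from each monitoring node would give the single up/down result per node that the rest of this post builds on.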
This is all well and good, but what I really want is something smarter. I monitor these sites from several datacenters spread across the country, so when a site goes down I receive an alert from every one of them, and each node then keeps emailing me. Sometimes just one “node” tells me a site is down, which tells me it's not a website issue but an ISP issue at that node's location. I would rather not get that false positive either.
So the new plugin I am working on is this: a multiple-node configuration where the Cacti servers talk to each other and share the results of each website test. If 60% or more of the nodes report a problem with a site, it is most likely a valid error, so an alert gets sent. This would also allow the list of monitored websites to propagate between nodes, so I only have to set a site up on one server to have them all monitor it. It would use a floating master concept, since exactly one server has to send the email (I don't want multiple copies). If the master cannot be contacted by the other members, an election is forced and the alert is sent out by the new master. Each node would be classified by a “location”, such as inside or outside the network. You could then set rules to alert if 100% of external nodes cannot reach the site but 100% of internal nodes can. This would let us know of possible firewall or ISP issues even though the site is technically up. The alert issued would include the overall result, the status from each node, and possibly some other details.
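As a rough sketch of how the voting and location rules above might fit together, here is a small Python example. The plugin itself is not written yet, so everything here is hypothetical: the `NodeResult` fields, the `"internal"`/`"external"` labels, the return strings, and especially `elect_master`, which simply picks the lowest reachable node name because the election method is not decided in this post.

```python
from dataclasses import dataclass


@dataclass
class NodeResult:
    node: str       # hypothetical node name, e.g. "dc-east"
    location: str   # "internal" or "external" (assumed labels)
    site_up: bool   # result of that node's own page check


def failure_ratio(results, location=None):
    """Fraction of nodes (optionally at one location) reporting the site down.

    Returns None when no node matches, so "no data" is not confused with 0%.
    """
    selected = [r for r in results if location is None or r.location == location]
    if not selected:
        return None
    failed = sum(1 for r in selected if not r.site_up)
    return failed / len(selected)


def classify(results, threshold=0.6):
    """Combine per-node results into one verdict."""
    overall = failure_ratio(results)
    if overall is None:
        return "no data"
    external = failure_ratio(results, "external")
    internal = failure_ratio(results, "internal")
    # Location rule: every external node fails while every internal node succeeds.
    if external == 1.0 and internal == 0.0:
        return "firewall or ISP issue"
    # Quorum rule: 60% or more of all nodes report a problem.
    if overall >= threshold:
        return "site down"
    return "ok"


def elect_master(reachable_nodes):
    """Assumed election rule: the lowest node name among reachable nodes wins."""
    return min(reachable_nodes) if reachable_nodes else None
```

The location rule is checked before the quorum rule on purpose: with four external nodes failing and one internal node succeeding, 80% of nodes report a problem, but the more useful alert is the firewall or ISP one. A single failing node out of five stays below the threshold and produces no alert, which is the false positive I want to suppress.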
We were originally using SiteSeer to do something like this, but the cost is outrageous for how simple a concept it is. Even after paying for 5 Cacti VMs in separate datacenters around the country, we will be spending 1/20 of the cost to monitor our sites, and we can scale even higher without paying more. We also get the extra benefit of being able to run our own node internally to diagnose firewall issues, which can't be done with SiteSeer.