So the new website monitoring plugin has a name: Webseer. Granted, this certainly isn’t set in stone and may change on a whim once it is done (Google shows me that Webseer is currently also the name of an image search engine). I just thought it was catchy and had a nice ring to it.
The plugin is definitely taking shape quickly. It has a configuration interface, it can check your sites for you, and it can alert you when they go down and when they come back up. I’m already using it in our production environment and am pleased with the results. Below is a small screenshot of the interface. It currently shows only a bit of information, but the plugin records much more.
Thanks to the use of cURL, we have a plethora of stats: the time taken for the DNS query, the time to create a connection, the time to transfer, how many redirects were followed, and the connection speeds. And that is just the tip of the iceberg. Once a website goes down, depending on the error, this information is useful in discovering the true issue at hand. We can also now distinguish ~30 different types of connection-related issues. These range from not being able to create a socket or resolve DNS to errors in the SSL handshake. If it’s not a connection issue, the plugin also returns real error messages for every HTTP code currently out there. All of this information is included in the alert, so it’s right there at your fingertips. The alert is text based, so it looks decent on my Blackberry too. Since all this data is also stored in the MySQL database, I am sure that plenty of pretty graphs will pop up to show connection times and speeds as well.
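To give a feel for what such a text alert could look like, here is a rough Python sketch. It is not the plugin's actual code; the stat names (`dns_time`, `connect_time`, `transfer_time`, `redirects`) and the function names are made up for illustration. The only real API used is Python's standard `http.HTTPStatus`, which maps a status code to its standard reason phrase.

```python
from http import HTTPStatus

# Hypothetical stat names for illustration; the plugin's real field names
# are not given in this post.
TIMING_FIELDS = [
    ("DNS lookup", "dns_time"),
    ("Connect", "connect_time"),
    ("Transfer", "transfer_time"),
]


def describe_status(code):
    """Return 'code phrase' for a known HTTP status, or a fallback."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not know.
        return f"{code} Unknown status"


def format_alert(url, http_code, stats):
    """Build a plain-text alert body from a dict of collected stats."""
    lines = [f"Site: {url}", f"HTTP: {describe_status(http_code)}"]
    for label, key in TIMING_FIELDS:
        if key in stats:
            lines.append(f"{label}: {stats[key]:.3f}s")
    if "redirects" in stats:
        lines.append(f"Redirects: {stats['redirects']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_alert(
        "http://example.com",
        503,
        {"dns_time": 0.012, "connect_time": 0.05, "redirects": 2},
    ))
```

Keeping the alert as plain lines of `label: value` is what makes it readable on a small screen; any stat that was not collected is simply left out.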
As for what is left to be done, the absolute biggest thing that needs to be finished is the multi-node support. Over the weekend, the ISP hosting the VM was having issues, which caused a good number of alerts to be sent out. To combat this, I will be running at least 3 VMs that communicate with each other and only alert if more than 50% of them say a site is down. This work will take a good amount of time to complete and perfect, but it will be a much better product for it. I also want to move each site check to a separate process; that way no one site is going to slow down the overall polling of the sites. The current default is that if a site appears to be down, we triple-check it just to be sure. It then has to be down for X number of pollings before we alert. The default timeout is 4 seconds. 3 checks times a 4-second timeout means a site can take up to 12 seconds to fail. If you have multiple sites failing, this could cause you to go over the 1-minute check interval. So this will be my first priority to fix after the multi-node support.
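The two rules in that paragraph are easy to pin down in a few lines of Python. This is only a sketch under stated assumptions, not the plugin's implementation: the function names are hypothetical, each node's report is reduced to a simple True ("down") or False ("up"), and the failing sites are assumed to be checked one after another in a single process, as described above.

```python
def should_alert(down_votes):
    """Alert only if strictly more than 50% of nodes report the site down.

    down_votes is a list of booleans, one per monitoring node
    (True means that node saw the site as down).
    """
    if not down_votes:
        return False
    return sum(down_votes) * 2 > len(down_votes)


def worst_case_seconds(failing_sites, checks=3, timeout=4):
    """Worst-case time spent on failing sites when polled sequentially.

    Defaults follow the post: 3 checks per site, 4-second timeout each.
    """
    return failing_sites * checks * timeout


if __name__ == "__main__":
    # Three nodes: one node with a flaky ISP is outvoted by the other two.
    print(should_alert([True, False, False]))   # one of three says down
    print(should_alert([True, True, False]))    # two of three say down
    # How many failing sites fit in a 60-second polling interval?
    for n in (1, 5, 6):
        print(n, worst_case_seconds(n))
```

With the defaults, one failing site costs up to 12 seconds, five cost up to 60, and a sixth pushes a sequential poller past the 1-minute interval, which is why moving each check into its own process matters. Note that with an even number of nodes, an exact 50/50 split does not alert under the "more than 50%" rule.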
There are also some other minor things that will need to be done. The first of these is options. I love options and think everything should be configurable. I have several in there now, but there are things like the checking itself that could be configurable too. Currently everything is polled every time the poller is run (every minute in my case). But maybe I have some websites that I only want polled every 5 minutes? This is easily doable; it just takes time.
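One simple way a per-site interval could work is for the poller to keep running every minute but skip any site that is not yet due. The Python sketch below only illustrates that idea; the field names (`url`, `interval`, `last_checked`) and the function name are hypothetical, and times are plain seconds so the example stays self-contained.

```python
def due_sites(sites, now):
    """Return the URLs of sites whose polling interval has elapsed.

    Each site is a dict with hypothetical keys:
      url          - the site to check
      interval     - seconds between checks
      last_checked - time of the last check in seconds, or None if never
    """
    due = []
    for site in sites:
        last = site["last_checked"]
        if last is None or now - last >= site["interval"]:
            due.append(site["url"])
    return due


if __name__ == "__main__":
    sites = [
        {"url": "http://a.example", "interval": 60, "last_checked": 0},
        {"url": "http://b.example", "interval": 300, "last_checked": 0},
        {"url": "http://c.example", "interval": 300, "last_checked": None},
    ]
    # One minute after the last run: the 5-minute site is skipped.
    print(due_sites(sites, now=60))
    # Five minutes after the last run: everything is due.
    print(due_sites(sites, now=300))
```

The poller would then update `last_checked` for every site it actually checked, so the one-minute schedule stays the same and only the selection of sites changes.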
Once it’s all said and done, I will see about getting a few beta testers in on the action and see how it goes.