The future of Thold
I have alluded in several places to the future features that will be available in thold, but the information is spread all over the place. So I will take the time now to list the planned changes, with a brief description of each. Some of these are slated for the next major release, some slightly further down the road. Please note that “sponsored” feature requests make it into production a lot quicker than everything else. Most of the features added in v0.4 were actually sponsored by a third party.
Escalation – This feature has been requested multiple times and is fairly simple to implement. It lets thold alert different people at different times, so that an issue can be escalated up the chain of command. For instance, you can have your technicians alerted to a problem after 10 minutes. If the problem still exists after an hour, alert the senior technician. If the problem still exists after 2 hours, alert the supervisor, and so on. This is slated for the next major release (0.5).
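To make the escalation idea concrete, here is a minimal sketch in Python. It is not thold code (the feature is not released yet), and every name in it (`EscalationStep`, `CHAIN`, `recipients_due`) is hypothetical. It only shows the logic of the example above: each step has a delay, and anyone whose delay has passed and who has not yet been notified gets an alert.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class EscalationStep:
    """One rung of a hypothetical escalation chain."""
    after_minutes: int  # how long the threshold must be breached
    recipient: str      # who is alerted at this step


# The chain from the example above; names and structure are assumptions.
CHAIN = [
    EscalationStep(10, "technicians"),
    EscalationStep(60, "senior technician"),
    EscalationStep(120, "supervisor"),
]


def recipients_due(chain: List[EscalationStep],
                   minutes_breached: int,
                   already_notified: FrozenSet[str] = frozenset()) -> List[str]:
    """Return recipients whose delay has elapsed and who were not yet alerted."""
    ordered = sorted(chain, key=lambda step: step.after_minutes)
    return [
        step.recipient
        for step in ordered
        if step.after_minutes <= minutes_breached
        and step.recipient not in already_notified
    ]


if __name__ == "__main__":
    # After 130 minutes, with the first two steps already alerted,
    # only the supervisor is still due.
    print(recipients_due(CHAIN, 130, frozenset({"technicians", "senior technician"})))
```

Tracking who has already been notified keeps each step from firing again on every polling cycle while the problem persists.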
Alternate Alerts – This goes well with the previous request: it allows alerts other than emails. These can be anything from SMS messages to running scripts, SNMP writes, SNMP traps, etc. The alert types will be pluggable, so that other plugins can add their own. When combined with the previous request, you can have thold restart the service after 5 minutes, send an email after 10 minutes, send an SMS text after 30 minutes, etc. This is slated for the next major release (0.5). SNMP writes, SMS, etc. will be handled by separate plugins to be released later.
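One way to picture pluggable alert types is a small registry that other code can add handlers to. The sketch below is Python and purely illustrative: `ALERT_HANDLERS`, `register_alert`, `SCHEDULE` and `fire_due_alerts` are made-up names, not thold's API, and the handlers return strings instead of actually restarting anything or sending messages.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: alert type name -> handler function.
ALERT_HANDLERS: Dict[str, Callable[[str], str]] = {}


def register_alert(name: str):
    """Decorator a plugin could use to add its own alert type."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        ALERT_HANDLERS[name] = func
        return func
    return decorator


@register_alert("restart_service")
def restart_service(message: str) -> str:
    # A real handler would run a script; this one only reports the action.
    return f"[script] restart requested: {message}"


@register_alert("email")
def send_email(message: str) -> str:
    return f"[email] {message}"


@register_alert("sms")
def send_sms(message: str) -> str:
    return f"[sms] {message}"


# Delay in minutes paired with an alert type, as in the example above.
SCHEDULE: List[Tuple[int, str]] = [
    (5, "restart_service"),
    (10, "email"),
    (30, "sms"),
]


def fire_due_alerts(schedule: List[Tuple[int, str]],
                    minutes_breached: int,
                    message: str) -> List[str]:
    """Run every handler whose delay has elapsed, in schedule order."""
    return [
        ALERT_HANDLERS[name](message)
        for after_minutes, name in schedule
        if after_minutes <= minutes_breached
    ]


if __name__ == "__main__":
    for line in fire_due_alerts(SCHEDULE, 12, "web service down"):
        print(line)
```

With a registry like this, a separate SNMP or SMS plugin would only need to register its handler under a new name, and the schedule could refer to it without any change to the core.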
Maintenance Periods – One of the future goals is to trim down the total number of alerts you receive from thold. Too many alerts tend to annoy people and lead to ignored alerts. A lot of the time these alerts are valid, but they concern a known issue that another member of your team is already working on, or an automated process that runs at a specific time. To help combat these types of false positives, I will be implementing maintenance periods. You will be able to select specific hosts and/or time periods in which alerts will not be sent. If you have a process that restarts your web service every Friday at 3am, you can set a recurring schedule to eliminate alerts from the “Web Service Status” threshold template during that time. If you are planning on working on a server on Tuesday at 7pm and should be done in 2 hours, you can set a one-time period, which will expire so that alerting begins again once the time period is over. This is slated for the next major release (0.5).
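The two kinds of maintenance period described above (recurring and one-time) can be sketched roughly as follows in Python. This is only an illustration under some assumptions: the class and function names are invented, the 30 minute length of the Friday window is my own guess since only the start time is given above, the dates are arbitrary, and the sketch ignores the per-host and per-template selection entirely.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class RecurringWindow:
    """Hypothetical weekly window, e.g. every Friday at 3am."""
    weekday: int          # Monday=0 ... Sunday=6, as in datetime.weekday()
    start: time
    duration: timedelta

    def covers(self, when: datetime) -> bool:
        # Find the most recent occurrence that started at or before `when`.
        days_back = (when.weekday() - self.weekday) % 7
        start_dt = datetime.combine(when.date() - timedelta(days=days_back), self.start)
        if start_dt > when:
            start_dt -= timedelta(days=7)
        return start_dt <= when < start_dt + self.duration


@dataclass(frozen=True)
class OneTimeWindow:
    """Hypothetical single window that expires on its own."""
    start: datetime
    duration: timedelta

    def covers(self, when: datetime) -> bool:
        return self.start <= when < self.start + self.duration


def alerts_suppressed(windows, when: datetime) -> bool:
    """True if any maintenance window covers the given moment."""
    return any(window.covers(when) for window in windows)


WINDOWS = [
    # Every Friday at 3am; the 30 minute length is an assumption.
    RecurringWindow(weekday=4, start=time(3, 0), duration=timedelta(minutes=30)),
    # One Tuesday at 7pm for 2 hours (2024-01-09 is a Tuesday).
    OneTimeWindow(start=datetime(2024, 1, 9, 19, 0), duration=timedelta(hours=2)),
]

if __name__ == "__main__":
    print(alerts_suppressed(WINDOWS, datetime(2024, 1, 5, 3, 15)))   # a Friday, inside the window
    print(alerts_suppressed(WINDOWS, datetime(2024, 1, 16, 19, 30)))  # a later Tuesday, window expired
```

The one-time window needs no cleanup step: once its end time has passed it simply stops matching, which is the "expire and begin alerting again" behaviour described above.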
Minor Additions – There will be lots of minor additions in the future too. These include customizable email messages per threshold (even per escalation), more logging and flexibility in the auto-creation (regex matching), possibly a renaming of the plugin to “Alerts”, and technician on-call rotations. There are really too many things to list.
If you have any suggestions, please feel free to list them here. There is no ETA on the next major release, but I can say that the escalation code and alternate alerts are 60% completed already.