Netcon

 
At a Glance
 
Netcon is an operational network and machine monitoring tool.

Netcon's primary goal is to NEVER CRY WOLF, through the use of incident error grouping, incident notification control, trend analysis, and hierarchical suppression.

Netcon provides control by allowing configuration of all monitoring and error trigger conditions through an HTML UI.

Netcon visualizes your data through tables, graphs and charts of system activity.

Netcon is extendable, allowing you to write your own data-collection monitors, analysis plug-ins, and more.

Netcon is distributed, allowing you to place lightweight monitoring agents on any machine (or all machines) in your network.
 

    

What is Netcon?

Netcon is an operational machine and service monitoring tool. It allows you to monitor machine parameters such as CPU and disk usage, services such as HTTP and MySQL, and your own custom applications. When the reported data for any of these meets a set of predetermined trigger conditions, the people responsible for those services can be notified.

What is Netcon about?

I wrote Netcon because I was tired of getting deluged with middle-of-the-night pages for non-errors.

Netcon's primary goal is to minimize the number of notifications sent during a failure and never cry wolf. Netcon does several things to achieve this goal, including:

  • automatically grouping errors that occur together into incidents
  • separating the creation of errors and incidents from the notification process
  • allowing errors to trigger on time-to-target trend analysis, in addition to simple value thresholds

The result is simple. Instead of receiving an individual notification for each service problem, which can mean tens or hundreds of notifications on a medium to large system, the Netcon user receives strictly periodic updates on the system state during an incident. For example, you can have Netcon send one page every five minutes while an incident is active, each one telling you how many service failures are still pending on that incident, followed by a final page indicating that the incident is resolved and cleared.
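To make the notification behavior concrete, here is a minimal Python sketch of the idea. It is not Netcon's actual code or API: the `Incident` class, its method names, and the message strings are all hypothetical, chosen only to illustrate grouping errors into one incident and paging at a fixed interval.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Incident:
    """Hypothetical incident that groups errors and rate-limits pages."""
    notify_interval: float = 300.0          # seconds between pages (five minutes)
    pending: Set[str] = field(default_factory=set)
    last_notified: Optional[float] = None
    active: bool = False

    def report_error(self, service: str) -> None:
        # Recording an error never sends a page by itself.
        self.pending.add(service)
        self.active = True

    def clear_error(self, service: str) -> None:
        self.pending.discard(service)

    def tick(self, now: float) -> Optional[str]:
        """Return the page to send at time `now`, or None if nothing is due."""
        if not self.active:
            return None
        if not self.pending:
            # All grouped errors have cleared: send one final page.
            self.active = False
            self.last_notified = None
            return "incident resolved and cleared"
        if self.last_notified is None or now - self.last_notified >= self.notify_interval:
            self.last_notified = now
            return f"incident active: {len(self.pending)} service failure(s) pending"
        return None


incident = Incident()
for service in ("http", "mysql", "disk"):
    incident.report_error(service)

print(incident.tick(0))      # first page for the whole incident
print(incident.tick(60))     # None: still inside the five-minute window
incident.clear_error("http")
print(incident.tick(300))    # next periodic page, two failures pending
```

Three simultaneous failures produce one page per interval rather than three pages at once, because error creation and notification are handled separately.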

Because triggers can be based on time-to-target trend analysis, it is easy to create triggers that alert you when there will soon be a problem, instead of setting lots of value-threshold triggers that alert you when there may or may not be a problem. For example, instead of receiving an urgent notification when a disk is down to 15% free space, just so you can check that the consumption rate isn't too high, you can ask to be notified when the disk is predicted to run out of space in less than 7 days. This also simplifies trigger configuration, because you no longer have to tune the trigger levels to the size of the installed disk.
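As an illustration of what a time-to-target trigger might compute, the Python sketch below fits a straight line to recent samples and extrapolates to the target value. The page does not describe how Netcon performs its trend analysis, so the least-squares fit, the function names `time_to_target` and `should_trigger`, and the sample data are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

SECONDS_PER_DAY = 86400.0


def time_to_target(samples: Sequence[Tuple[float, float]],
                   target: float = 0.0) -> Optional[float]:
    """Estimate seconds from the last sample until the value reaches `target`.

    `samples` is a sequence of (timestamp_seconds, value) pairs.
    Returns None when there is no trend or the trend moves away from the target.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Least-squares slope: change in value per second.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope == 0:
        return None
    # Extrapolate from the most recent observed value using the fitted slope.
    last_value = samples[-1][1]
    remaining = (target - last_value) / slope
    return remaining if remaining >= 0 else None


def should_trigger(samples: Sequence[Tuple[float, float]],
                   horizon_days: float = 7.0) -> bool:
    """True when the target is predicted to be reached within the horizon."""
    remaining = time_to_target(samples)
    return remaining is not None and remaining < horizon_days * SECONDS_PER_DAY


# Free space in GB, sampled once a day (made-up numbers).
fast_drain = [(day * SECONDS_PER_DAY, free) for day, free in enumerate([100, 80, 60, 40])]
slow_drain = [(day * SECONDS_PER_DAY, free) for day, free in enumerate([15.0, 14.9, 14.8, 14.7])]

print(should_trigger(fast_drain))   # losing 20 GB/day with 40 GB left
print(should_trigger(slow_drain))   # losing 0.1 GB/day with 14.7 GB left
```

The same rule works for both disks without knowing their size: the first has more free space but triggers because it is draining quickly, while the second is nearly full by a percentage threshold yet has months left at its current rate.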

 
Copyright © 2003 - David Jeske