Praveen Puri: Monitoring Independent Key Indicators

Monday, December 16, 2024

When my support team was monitoring complex financial applications that spanned multiple servers, processes, and software teams, we put dual monitoring in place. First, we had a page that was regularly updated with the results of key database queries (such as the number of transactions currently in transitional states). If those counts kept building up, we could tell that certain subsystems were down.
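
The "counts building up" check can be sketched in a few lines of Python. This is a minimal illustration, not the setup we actually ran: it assumes some scheduler calls `record()` with the latest result of a count query, and the class name `BacklogMonitor` and its `window` parameter are hypothetical. It treats a count that has risen several polls in a row as a sign that a downstream subsystem has stopped draining it.

```python
from collections import deque


class BacklogMonitor:
    """Flags a possible subsystem outage when a polled count keeps growing.

    Hypothetical helper: each call to record() takes the latest result of a
    count query (for example, transactions stuck in a transitional state).
    An alert is raised once the count has risen `window` times in a row.
    """

    def __init__(self, name, window=3):
        self.name = name
        self.window = window
        # Keep window + 1 samples so we can check `window` consecutive increases.
        self.samples = deque(maxlen=window + 1)

    def record(self, count):
        """Store the latest count; return True if the backlog is building."""
        self.samples.append(count)
        if len(self.samples) < self.window + 1:
            return False  # Not enough history yet to judge a trend.
        values = list(self.samples)
        return all(later > earlier for earlier, later in zip(values, values[1:]))


# Example: four polls with a steadily rising count trigger the alert.
monitor = BacklogMonitor("settlement", window=3)
for count in [12, 15, 19, 24]:
    building = monitor.record(count)
print(monitor.name, "backlog building:", building)
```

Requiring several consecutive increases, rather than a single high reading, is one way to avoid alerting on a normal burst of traffic; a fixed threshold per query would be an equally reasonable choice.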

At the same time, we had a second page, updated by software that monitored multiple application logs for certain key phrases. Many times, when an outage or other problem was occurring, at least one of the two monitoring methods notified us, and often both did.
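
Log-phrase monitoring of this kind can be sketched as a simple scan. This is an illustrative Python sketch under stated assumptions: the phrases in `KEY_PHRASES` are invented examples (the post does not say which messages were watched), and `scan_log` is a hypothetical helper that does a case-insensitive substring match on each line.

```python
from typing import Iterable, List, Tuple

# Hypothetical key phrases; the post does not list the actual messages watched.
KEY_PHRASES = ["connection refused", "retry limit exceeded"]


def scan_log(lines: Iterable[str], phrases: List[str]) -> List[Tuple[int, str]]:
    """Return (line number, phrase) for each key phrase found in the log lines.

    Matching is a simple case-insensitive substring check.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        for phrase in phrases:
            if phrase.lower() in text:
                hits.append((number, phrase))
    return hits


sample_log = [
    "2024-12-16 09:00:01 INFO batch started",
    "2024-12-16 09:00:05 ERROR Connection refused by settlement-db",
    "2024-12-16 09:00:07 WARN Retry limit exceeded for job 42",
]
print(scan_log(sample_log, KEY_PHRASES))
# prints [(2, 'connection refused'), (3, 'retry limit exceeded')]
```

Because a file object iterates line by line, the same function could be pointed at a real log with `with open(path) as log: scan_log(log, KEY_PHRASES)`, where `path` is whatever log file you care about. Note that this check shares nothing with the database-count check, which is what keeps the two indicators independent.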

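The general lesson (even beyond software) is that it's good to monitor multiple key indicators that are generated independently. Because each indicator comes from a different source, a problem that one of them misses can still show up in the other, which increases your chances of catching problems early.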

In fact, we found that some messages, when they appeared in a log file, were like Spider-Man's "Spidey sense!" They indicated that a problem was about to occur within the next 5-10 minutes, which gave us time to take preventive measures.
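
One way to act on such "Spidey sense" messages is to tag them with the lead time you have learned to expect. The Python sketch below is hypothetical throughout except for the 5-10 minute window: the precursor messages in `PRECURSORS` and the `early_warning` helper are invented for illustration. Given a log line and its timestamp, it returns the matching precursor and the time range in which the problem is expected.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Hypothetical precursor messages mapped to a lead time in minutes (low, high).
# The 5-10 minute window follows the post; the messages themselves are invented.
PRECURSORS: Dict[str, Tuple[int, int]] = {
    "thread pool nearly exhausted": (5, 10),
    "queue depth above high-water mark": (5, 10),
}


def early_warning(
    logged_at: datetime, line: str, precursors: Dict[str, Tuple[int, int]] = PRECURSORS
) -> Optional[Tuple[str, datetime, datetime]]:
    """If the line contains a precursor, return it with the expected problem window."""
    text = line.lower()
    for phrase, (low, high) in precursors.items():
        if phrase in text:
            return (
                phrase,
                logged_at + timedelta(minutes=low),
                logged_at + timedelta(minutes=high),
            )
    return None  # No precursor found in this line.


warning = early_warning(
    datetime(2024, 12, 16, 9, 0), "WARN Thread pool nearly exhausted (95/100)"
)
print(warning)
```

Keeping precursors separate from ordinary error phrases lets the alert say "act now, you have a few minutes" instead of "something already broke", which is what makes preventive measures possible.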