Monitoring the pulse of IT

Next generation distributed monitoring, the Opsview way



One of Opsview’s great features is distributed monitoring, which we’ve had for over 5 years now. From the web user interface, you can assign hosts to a slave system and Opsview will take care of all the configuration work for you: from the slave configuration files, to the slave results sent to the master, to the master configuration with freshness checking.

We do all the system integration work, so you don’t have to.

However, there are some limitations in our chosen technologies. We use NSCA, which is the most common method in the Nagios world, and while we’ve made improvements to it that have gone back upstream, there are some baked-in limitations:

  • Only the first 511 bytes of plugin output were returned to the master, limiting the usefulness of the information you could display
  • Only the first line of output was returned, meaning you had to cram everything onto one line
  • NSCA communication used fixed size packets which were inefficient
  • While results were sent, Nagios would wait for completion, introducing a bottleneck
  • If there was a communication problem with the master, results were dropped
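To make the first three limitations concrete, here is a minimal sketch in Python. It is an illustrative model only, not NSCA's or NRD's actual wire format: the 512-byte field size is an assumption chosen to match the 511-byte figure above (one byte reserved for a terminator), and both function names are hypothetical.

```python
# Illustrative model only: this is NOT the real NSCA or NRD wire format.
# Assumption: a fixed 512-byte output field with the last byte reserved
# for a NUL terminator, which matches the 511-byte limit described above.
OUTPUT_FIELD_SIZE = 512


def nsca_style_output(plugin_output: str) -> bytes:
    """Model a fixed-size field: first line only, truncated, NUL-padded."""
    first_line = plugin_output.split("\n", 1)[0]
    payload = first_line.encode("utf-8")[: OUTPUT_FIELD_SIZE - 1]
    # Short results still occupy the full field, which wastes bandwidth.
    return payload.ljust(OUTPUT_FIELD_SIZE, b"\0")


def variable_length_output(plugin_output: str) -> bytes:
    """Hypothetical alternative: a 4-byte length prefix, then the full output."""
    payload = plugin_output.encode("utf-8")
    return len(payload).to_bytes(4, "big") + payload


if __name__ == "__main__":
    output = "DISK WARNING\n/var 91% used\n/home 40% used"
    fixed = nsca_style_output(output)
    print(len(fixed), fixed.rstrip(b"\0"))
    print(len(variable_length_output(output)))
```

In this model a short result still costs a full fixed-size field and loses every line after the first, while the length-prefixed form carries the whole output in only as many bytes as it needs.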

Sometimes to move forward, you have to leave the past behind.

So we did that - we ripped out NSCA from Opsview’s slave communications and we’ve addressed every one of these limitations - and added a few nice extras too!

We’ve chosen NRD (Nagios Result Distributor) as our core technology. This is a library, written by one of our partners, CAPSiDE. There are many reasons we chose this, but the top four are:

  • It is based on Perl, which is our language of choice
  • It has taken the test suite we developed for NSCA and enhanced it, demonstrating a mature approach to code development
  • The client and server code is a thin shim over the libraries, which means you can easily create your own clients
  • We have a good relationship with CAPSiDE and they have given us access to their code repository

We’ve spent some time understanding the core NRD code, enhancing it, fixing some issues and adding in some great new features. CAPSiDE have also released it on CPAN for wider consumption.

So Opsview’s new process for sending results from a slave is:

A couple of other amazing features we’ve squeezed in:

  • A known Nagios limitation is the named pipe used to submit results. We’ve overcome this by writing directly to the checkresults spool directory - this saves a Nagios processing cycle on the Opsview master
  • We’ve implemented transactions in the results, so if the client fails to communicate with the server, it backs off and retries 5 seconds later. This guarantees you do not get duplicated results
  • The nrd daemon on the master will dynamically add more servers as workload increases, thanks to the features of Net::Server
  • As all communication between master and slaves is over a tunnelled SSH session, we’ve updated our Opsview check scripts to restart these tunnels if the slave is exhibiting communication errors
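The transactional retry described above can be sketched roughly as follows, again in Python. This is not NRD's code or API: the `ResultServer` class, the `send_with_retry` function, the transaction identifier scheme and the attempt limit are all assumptions made for illustration. Only the 5-second retry delay comes from the description above.

```python
import time
import uuid


class ResultServer:
    """Hypothetical stand-in for the master side: applies each batch once."""

    def __init__(self):
        self.committed = {}  # transaction id -> list of results

    def commit(self, txn_id, results):
        if txn_id in self.committed:
            return False  # this batch was already applied, so ignore the resend
        self.committed[txn_id] = list(results)
        return True


def send_with_retry(send, results, retry_delay=5.0, max_attempts=5,
                    sleep=time.sleep):
    """Send one batch, retrying under the SAME transaction id on failure.

    `send` is any callable taking (txn_id, results) that raises
    ConnectionError when communication fails. `max_attempts` is an
    assumption; the source only mentions the 5-second back-off.
    Returns the number of attempts that were needed.
    """
    txn_id = uuid.uuid4().hex
    for attempt in range(1, max_attempts + 1):
        try:
            send(txn_id, results)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(retry_delay)  # back off before resending the batch
```

Because the batch keeps its identifier across retries, a server that remembers committed identifiers can discard a resend whose first attempt did arrive but whose acknowledgement was lost. That is one way to reconcile retrying with the no-duplicates guarantee.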

With all these extra capabilities, you would think there is a cost in performance. But in fact, our testing shows that performance has improved!

(Based on sending 2016 results in a single transaction over an SSH tunnel from a slave to a master. Times measured on the client.)

This shows that we are getting an average 62% improvement in all aspects of slave communication back to the master!

We are thrilled to have added this major new functionality to Opsview, taking distributed monitoring another huge step ahead of our competitors.

But the best thing is: this is available immediately with our Opsview Community 3.11 release. Install the VM, add a slave and you will get this new architecture set up as part of the process. And if you are an existing Opsview user, you get a silky smooth switch-over. We’ve done a lot of testing to ensure that, as part of the upgrade, Opsview will automatically switch any slaves to this architecture and start sending results the new NRD way.

Tags: distributed monitoring, nagios, virtualisation
