Nagios - A Free, Open Source IP Network Monitoring Tool

 

Please note this guide is going to assume you have a basic understanding of virtual machines and Linux. If you’re unfamiliar, I will be linking to guides for everything, and it may be worth reading through the basic setup guides for installing Linux inside of VirtualBox before you start.

Also, please note that most of the ideas in this guide come from Jeremy Thelen; I just had the time to type it up and make the VM.

What is Nagios?

Nagios is an open source network monitoring tool that runs on a Linux server. That server can be either a physical machine, if you have one, or a VM, and Nagios is accessed through a web GUI. It can keep track of the status of most devices on your network, organize them by type, and keep logs so that you can know when something went down or came back. In the case of Clear-Com gear, for the most part I’m just going to demonstrate pings as the one module to use, but there are dozens of supported actions, including port availability checks for HTTP and SSH, SNMP monitoring, and more. For switches, for example, it can give you all kinds of stats for usage, link status, etc.


nagios1.JPG

Using this app, you can get a snapshot of the network status of your system from any PC on the network, because it’s a web GUI. As you can see here, I have my Nagios instance pinging my Eclipse Frame’s CPU NIC, each network card in my frame, both of my LQs, and my Netgear switch. Devices can be organized into groups for easy sorting, and it’s giving me close to real-time updates about devices. When you are done, your dashboard should include your whole system, whether it be an Eclipse with IPAs, LQs, IPTs, etc. If it has a NIC, we can add it.

 

This can help you troubleshoot a particular device dropping off the network, find out how long a device has been up, or have confidence in a remote device staying connected over a long period of time. Network monitoring software is a common tool used by IT professionals to see their system at a glance. There are many, arguably easier-to-use, options that aren’t free and open source. Nagios itself has a paid version called Nagios XI, but for our purposes, Nagios Core is simple, free, and gives us an easy way to track the network status of the entire system over time. But fair warning: if you’re not used to dealing with Linux and looking at JSON-like configuration files, this could be a challenge.

In this demo, I’m going to show you how to build this config, with just some simple hosts organized by category and with ping, HTTP, and SNMP as the only modules I’m using.

Setup

 

If you’d like to skip setting it up from scratch, you can download my Ubuntu image here, which has Nagios already installed and set to a DHCP address. You might be lacking some context on how to configure the application for your specific system, though, especially if you have not used Linux or virtualization software before. [The password to the one user login is admin]

https://clearcom.ftpstream.com/download/OFCqla5fPzeF7CBnGbXQ/misc files/Greg's VM's/nagios.ova

 

Using a VM is probably the best way to set this up on an existing system. I use VirtualBox and prefer Ubuntu for the Linux distro, since it tends to be the easiest to use, but most of the common distros will work.

https://ubuntu.com/tutorials/how-to-run-ubuntu-desktop-on-a-virtual-machine-using-virtualbox#1-overview

A guide to installing an Ubuntu VM inside of VirtualBox can be found here. Both are free and open source.
Personally, I find that giving your VM a dedicated NIC, as opposed to using a bridged-mode adapter, tends to make it easier to run these web-GUI-based services. That way you can give your VM a static IP so it’s not moving around on DHCP every time you spin it down and back up. You do that inside of VirtualBox after installing the Ubuntu image, under settings for the VM. Most of the time, I will use a USB NIC, give it a static IP inside of Windows, and then attach it to the VM. That way, I know where all my VM traffic is coming from, and what its IP is going to be.

Once you’ve completed the Ubuntu install inside of VirtualBox, follow the Nagios Quickstart guide for Ubuntu:

https://support.nagios.com/kb/article/nagios-core-installing-nagios-core-from-source-96.html#Ubuntu

Just open up terminal and copy and paste each line in sequence from this quickstart guide. Make sure to take the correct line for the version of Ubuntu you’re using, probably 20.xx if you followed the guide above, and make sure your VM is connected to the internet before you start. Once you’ve run everything and have started the nagios service inside of Linux, you should be able to see the Nagios web GUI at the IP address of your VM with /nagios appended, e.g. 192.168.1.1/nagios, and an “It works!” Apache test page at 192.168.1.1. If you don’t know the IP of your VM, the ‘ip addr’ command in terminal can tell you.

Configuration

Nagios Core Manual - Table of Contents

It’s worth going through the basics of the manual before continuing!

 

By default, all that you will see in the GUI on a clean install of Nagios is a localhost pointing at the Linux VM itself. In order to configure it to your system, we need to get our hands dirty in the CFGs that control what gets displayed in the web GUI. You can find these at /usr/local/nagios/etc. There you’ll find nagios.cfg, the primary config file, and the objects folder, where the hosts themselves get defined.

For a longer explanation, see here: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/objectdefinitions.html

 

For this demo, and because all I want is a dashboard that sends pings and not much of anything else, I’m going to be using the localhost.cfg file inside of the objects folder, and just put my entire system in that. Why did I choose the localhost.cfg file? Only because, by default, that’s the one host CFG that isn’t commented out in the nagios.cfg file, the primary config. There are several .cfgs in that folder which can be used to further subdivide the system, so that you can have specific services run for specific devices. If all you’re looking for is some categories of hosts and just a couple of services, like ping, check http, etc., that will get you all the way there. This is using a fraction of a fraction of the functionality of Nagios, and I’d highly recommend reading through the documentation linked above to see more.

The File Permissions Issue

 

If you’ve never dealt with Linux, you might not be familiar with groups and permissions issues. By default, if you made a user when you installed Ubuntu and then followed the guide, you will not have the ability to edit these .cfg files.



Here’s a decent explanation of the chown command and what to do with it. I ‘777’d my objects folder and my nagios.cfg file (chmod 777, making it so any user can make changes), so that I could make changes easily. Please be careful if this is on an unsecured network, and take reasonable security precautions when messing around with file permissions.

 

Now that we have that out of the way….



Below is my localhost.cfg file for this demo:

###############################################################################
#
# HOST DEFINITION
#
###############################################################################

# Define a host for the local machine

define host {

    use                     linux-server            ; Name of host template to use
                                                    ; This host definition will inherit all variables that are defined
                                                    ; in (or inherited by) the linux-server host template definition.
    host_name               ecl
    alias                   Eclipse CPU
    address                 10.50.1.11
}

define host {

    use                     linux-server            ; Name of host template to use
                                                    ; This host definition will inherit all variables that are defined
                                                    ; in (or inherited by) the linux-server host template definition.
    host_name               eque
    alias                   ECLIPSE E-QUE
    address                 10.50.1.21
}

define host {

    use                     linux-server            ; Name of host template to use
                                                    ; This host definition will inherit all variables that are defined
                                                    ; in (or inherited by) the linux-server host template definition.
    host_name               ivc
    alias                   Eclipse IVC
    address                 10.50.1.22
}

define host {

    use                     linux-server            ; Name of host template to use
                                                    ; This host definition will inherit all variables that are defined
                                                    ; in (or inherited by) the linux-server host template definition.
    host_name               ipa
    alias                   Eclipse IPA
    address                 10.50.1.23
}

define host {

    use                     linux-server            ; Name of host template to use
                                                    ; This host definition will inherit all variables that are defined
                                                    ; in (or inherited by) the linux-server host template definition.
    host_name               lq1
    alias                   LQ 1 - SIP
    address                 10.50.1.31
}

define host {

    use                     linux-server            ; Name of host template to use
                                                    ; This host definition will inherit all variables that are defined
                                                    ; in (or inherited by) the linux-server host template definition.
    host_name               lq2
    alias                   LQ2 - Gen-IC
    address                 10.50.1.32
}

###############################################################################
#
# HOST GROUP DEFINITION
#
###############################################################################

# Define an optional hostgroup for Linux machines

define hostgroup {

    hostgroup_name          cc-fr                   ; The name of the hostgroup
    alias                   CLEAR COM FRAME         ; Long name of the group
    members                 ipa,ecl,eque,ivc,       ; Comma separated list of hosts that belong to this group
}

define hostgroup {

    hostgroup_name          cc-lq                   ; The name of the hostgroup
    alias                   CLEAR COM LQs           ; Long name of the group
    members                 lq1,lq2,                ; Comma separated list of hosts that belong to this group
}

###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Define a service to "ping" the local machine

define service {

    use                     local-service           ; Name of service template to use
    host_name               ipa,ecl,eque,ivc,lq1,lq2
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
}

# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service {

    use                     local-service           ; Name of service template to use
    host_name               ipa,lq1,
    service_description     HTTP
    check_command           check_http
    notifications_enabled   0
}

 

This code is pretty human-readable, so I won’t go too deep, but what I have done here is define hosts with their IPs, in this case my Eclipse cards and my LQs. This could be expanded just by copying and pasting the syntax and changing the name and IP. The Host Group definition defines categories that will display in the web GUI as you see above. The Service Definition defines which checks we want to run against our hosts, and which hosts in this file get which check. The standard use for this .cfg file is for Linux servers, so if you installed this from scratch, you’ll have way more services in here than just check_http and ping. I just got rid of all the ones I didn’t need or want. These services reference plugins in the /usr/local/nagios/libexec folder. The quickstart guide has you install the basic 65 plugins, but there are many, many more available if you are looking to do something in particular. I’m using the check_ping and check_http services. The difference between the two is that check_ping tests whether the host answers pings, while check_http tests whether it can reach the device’s web server on port 80.
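As a minimal sketch of the copy-and-paste expansion described above, here is what adding one more device could look like. The host name lq3, its alias, and the address 10.50.1.33 are hypothetical placeholders, not part of the demo system. The hostgroup and service blocks are meant as edited versions of the existing cc-lq and PING definitions, not extra copies to paste alongside them.

```cfg
# Hypothetical third LQ; name, alias, and address are placeholders.
define host {
    use                     linux-server            ; Same template as the other hosts in this file
    host_name               lq3
    alias                   LQ 3
    address                 10.50.1.33
}

# Existing cc-lq group, edited so the new host shows up under that category.
define hostgroup {
    hostgroup_name          cc-lq
    alias                   CLEAR COM LQs
    members                 lq1,lq2,lq3
}

# Existing PING service, edited so the new host actually gets checked.
define service {
    use                     local-service
    host_name               ipa,ecl,eque,ivc,lq1,lq2,lq3
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
}
```

A host that is defined but not listed on any service line will typically be rejected or flagged when Nagios loads the config, so the host, group, and service edits belong together.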

 

Every plugin, including something simple like check_ping, has settings that can be changed by adding arguments to the check command.



You can see above that the only configuration for the plugin here is which values cause ‘warning’ and ‘critical’ messages in the web GUI, but there are many others that could be changed if I wanted to.

For this demo, all I’ve done is enter the IPs of all of my devices, name them as hosts, and add them to host groups as you can see. If you take my Ubuntu image above, this should be all you need to change. Make sure to restart the Nagios service after you make changes to any .cfg so that it picks them up. You can do that from the web GUI or with this line in terminal:
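To make the threshold syntax a little more concrete, here is a sketch of how the PING service line maps onto the plugin call. The service block is taken from the localhost.cfg above. The command block is what the sample commands.cfg shipped with Nagios Core normally contains, and is an assumption about your install, so check your own copy.

```cfg
# Arguments after the command name are separated by "!".
# First argument  (100.0,20%): WARNING when average round trip reaches 100 ms or packet loss reaches 20%.
# Second argument (500.0,60%): CRITICAL when average round trip reaches 500 ms or packet loss reaches 60%.
define service {
    use                     local-service
    host_name               ipa,ecl,eque,ivc,lq1,lq2
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
}

# Assumed sample definition from objects/commands.cfg:
# $ARG1$ receives the warning pair, $ARG2$ the critical pair, and -p 5 sends five packets per check.
define command {
    command_name            check_ping
    command_line            $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
```

Tightening or loosening the alert levels is then just a matter of changing the two pairs on the check_command line.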

 

sudo systemctl restart nagios.service

 

My choice to use the localhost.cfg is more or less arbitrary; it just happens that when Nagios is installed, most of the host cfgs are commented out in the nagios.cfg file, the “main” configuration (meaning they won’t be loaded). I have allowed the localhost and switch cfgs into my setup, and you can see that here, in a portion of the nagios.cfg file:


# OBJECT CONFIGURATION FILE(S)
# These are the object configuration files in which you define hosts,
# host groups, contacts, contact groups, services, etc.
# You can split your object definitions across several config files
# if you wish (as shown below), or keep them all in a single config file.

# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Definitions for monitoring a Windows machine
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg

# Definitions for monitoring a router/switch
cfg_file=/usr/local/nagios/etc/objects/switch.cfg

# Definitions for monitoring a network printer
#cfg_file=/usr/local/nagios/etc/objects/printer.cfg

As you can see, anything that is commented out with a # will not be loaded. The only host .cfgs in the live config are switch.cfg and localhost.cfg; the commands, contacts, timeperiods, and templates files are support files that stay enabled.
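Since the choice of file is arbitrary, an alternative is to keep your own devices in a separate file and enable it the same way. This is only a sketch and not the setup used in this demo. The file name clearcom.cfg and the directory name in the commented line are hypothetical.

```cfg
# Hypothetical file: create it in the objects folder and put your host,
# hostgroup, and service definitions there instead of in localhost.cfg.
cfg_file=/usr/local/nagios/etc/objects/clearcom.cfg

# Alternatively, nagios.cfg can load every .cfg file found in a directory
# (directory name is a placeholder; uncomment to use).
#cfg_dir=/usr/local/nagios/etc/clearcom
```

Either way, the file has to be readable by the nagios user, and the service needs a restart before the new hosts appear.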

Switch Monitoring via SNMP


So far, I have shown the localhost.cfg file, and how I made the host groups ‘cc-fr’ and ‘cc-lq’. The last group you can see in the screenshot above of my finished demo is the switch category, which comes from that switch.cfg file.

 

 

As you can see, configuration of the switch monitoring side is very similar to the cfg before; the only difference is that we’ll be capturing SNMP data from the switch. Just like before, you’ll need to enter the IP of the switch, and define what services run based on the hostname defined at the top. The “check snmp” plugin is where things get interesting. Everything that speaks SNMP exposes a tree of values that we’ll need to read from. Each variable in that tree is defined in a MIB (Management Information Base) and is addressed by a numeric path called an OID. The easiest way to find, in the case above, what the OID for link status is, would be to use a MIB browser. Any will work, but here is the one I used:

nagios2.JPG



In order for this to work, SNMP needs to be enabled at the switch. In my case, on a Netgear M4250, I had to enable reading SNMP values from ‘0.0.0.0’, which allows any host on the switch to get SNMP messages.


All I’ve set up in this config is link status for every interface on my switch, and uptime. Link status (ifOperStatus, as the standard interface MIB calls it) is an SNMP object whose two common values are 1 for up and 2 for down. As you’ve probably already guessed, that long string of numbers next to check_command in the switch.cfg is the OID, which says where in the SNMP tree that object is (one for each interface). In my case, using a Netgear M4250, 1.3.6.1.2.1.2.2.1.8 is the OID for getting to the link status “folder”, and then each link is represented by the last integer. The “-r 1” after that OID tells Nagios that the value 1, which we know means up, is the “good” value, and any other value returned will show as “critical” in Nagios.
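As a rough sketch of what link status checks in a switch.cfg can look like, modeled on the sample switch.cfg that ships with Nagios Core: the host name switch-a, the alias, the address 10.50.1.2, and the community string “public” are placeholders, not values from the demo system. The generic-switch and generic-service templates and the check_snmp command are assumed to be the ones from the sample templates.cfg and commands.cfg. It also assumes interface index 1 and 2 correspond to ports 1 and 2, which is worth confirming in the MIB browser.

```cfg
define host {
    use                     generic-switch          ; Template from the sample templates.cfg
    host_name               switch-a                ; Placeholder name
    alias                   Netgear M4250           ; Placeholder alias
    address                 10.50.1.2               ; Placeholder address
    hostgroups              switches
}

define hostgroup {
    hostgroup_name          switches
    alias                   Network Switches
}

# ifOperStatus for interface 1: OK only when the returned value matches 1 (up).
define service {
    use                     generic-service
    host_name               switch-a
    service_description     Port 1 Link Status
    check_command           check_snmp!-C public -o 1.3.6.1.2.1.2.2.1.8.1 -r 1
}

# Same check for interface 2; only the last integer of the OID changes.
define service {
    use                     generic-service
    host_name               switch-a
    service_description     Port 2 Link Status
    check_command           check_snmp!-C public -o 1.3.6.1.2.1.2.2.1.8.2 -r 1
}
```

One service block per interface is repetitive, but it gives each port its own line and its own alert history in the web GUI.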


You can get more complicated with SNMP values; to see more, read here:

 

There is one service in there you may have noticed doesn’t have a -r 1 value after the OID, and that is for Uptime. That’s because there is no “bad value” for it where it would need to show as “critical”; I just want to know how long the switch has been up. I would encourage you to use the MIB browser to look at what information your switch can provide via SNMP. Anything in that tree can be put into your Nagios config, and just like in the configuration for the other hosts above, you can track multiple switches just by adding another host definition, and then adding the second hostname on every service line.
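A sketch of the uptime check, extended to two switches as described above. The host names switch-a and switch-b, their addresses, and the community string “public” are hypothetical. 1.3.6.1.2.1.1.3.0 is the standard sysUpTime OID, and the templates are assumed to come from the sample templates.cfg.

```cfg
define host {
    use                     generic-switch
    host_name               switch-a                ; Placeholder name
    alias                   Switch A
    address                 10.50.1.2               ; Placeholder address
}

define host {
    use                     generic-switch
    host_name               switch-b                ; Placeholder name
    alias                   Switch B
    address                 10.50.1.3               ; Placeholder address
}

# No -r pattern: the value is simply reported, with nothing to flag as critical.
# Listing both host names runs the same check against each switch.
define service {
    use                     generic-service
    host_name               switch-a,switch-b
    service_description     Uptime
    check_command           check_snmp!-C public -o 1.3.6.1.2.1.1.3.0
}
```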

Using the GUI

 

Once you’ve gotten it set up, using the GUI is pretty easy, and there are YouTube tutorials for almost every page.


https://www.youtube.com/watch?v=jUDrjgEDb2A&t=1s



My go-to page for looking at the system at a glance is the host groups page, screenshotted above. This categorizes everything based on host groups and gives you a summary of alerts on each item you have.

I would also point you to the host alert history page, which you can get to by clicking on a host and then “View Alert History for this Host”. You can see logs for as long as your Nagios VM is online, and sort by type. This can be a powerful tool in your pocket for troubleshooting intermittent network issues.



In conclusion, this is a very complex application which, particularly if you’re not familiar with Linux, can be intimidating. But take it from me, and I’m not a Linux guy at all: once you figure out how to configure the system, it will be an extremely powerful tool to troubleshoot and monitor your network, for Clear-Com gear and anything else.

 

 

 

 



 



