Performing actions on host-up-down using nagios
To perform actions when a host goes up or down using Nagios, use the following steps:
- yum -y install nagios* --skip-broken
- edit /etc/nagios/objects/contacts.cfg and set the correct email address
- edit /etc/nagios/objects/commands.cfg and change the check-host-alive command definition to:
- define command{
- command_name check-host-alive
- # command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
- command_line $USER1$/check_host_alive.sh $HOSTADDRESS$
- }
- edit /etc/nagios/objects/templates.cfg and, after the linux-server definition, add the following template:
- define host{
- name isp ; The name of this host template
- use linux-server ; This template inherits other values from the linux-server template
- check_period 24x7 ; By default, Linux hosts are checked round the clock
- check_interval 1 ; Actively check the host every minute
- retry_interval 1 ; Schedule host check retries at 1 minute intervals
- max_check_attempts 3 ; Check each host 3 times (max)
- check_command check-host-alive ; Default command to check Linux hosts
- notification_period 24x7 ; Send notifications 24x7
- notification_interval 10 ; Resend notifications every 10 minutes
- notification_options d,u,r,f,s ; Only send notifications for specific host states
- contact_groups admins ; Notifications get sent to the admins by default
- register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
- }
- edit /etc/nagios/nagios.cfg and:
- comment out cfg_file=/etc/nagios/objects/localhost.cfg
- uncomment cfg_dir=/etc/nagios/servers
- create a host and service definition for monitoring by creating the file /etc/nagios/servers/test.cfg with the following contents:
- define host{
- use isp ; Name of template
- host_name 192.168.122.102 ; Short name of server, will be used in nagios configuration
- alias 192.168.122.102 ; Long name of server, will be used in reporting
- address 192.168.122.102 ; FQDN or IP address of host, will be used for checks
- }
- define service{
- use generic-service ; Name of service template to use
- host_name 192.168.122.102
- service_description PING
- check_command check_ping!100.0,20%!500.0,60%
- notifications_enabled 1
- }
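In a check_command value, Nagios separates the command name from its arguments with "!", and the arguments are made available to the command definition as $ARG1$, $ARG2$, and so on. The following is only an illustration of that split, written as a small shell snippet; it assumes the stock check_ping definition in commands.cfg, which passes $ARG1$ to -w and $ARG2$ to -c.

```shell
#!/bin/bash
# Illustration only: mimic how Nagios splits a check_command value on "!".
CHECK='check_ping!100.0,20%!500.0,60%'

NAME=${CHECK%%!*}     # text before the first "!"      -> command_name
REST=${CHECK#*!}      # everything after the first "!"
ARG1=${REST%%!*}      # first argument  -> $ARG1$ (warning thresholds)
ARG2=${REST#*!}       # second argument -> $ARG2$ (critical thresholds)

echo "command=$NAME warning=$ARG1 critical=$ARG2"
```

Under that assumption, the PING service above warns at a round-trip average of 100 ms or 20% packet loss, and goes critical at 500 ms or 60% packet loss.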
- create /usr/lib64/nagios/plugins/check_host_alive.sh with the following contents:
- #!/bin/bash
- OUTPUT2=$(ping -w 5 $1 -q)
- STATUS2=$?
- #Do not use internal plugin. It hangs in few cases. Ping used above is more reliable
- #OUTPUT=$(/usr/lib64/nagios/plugins/check_ping -H $1 -w 3000.0,80% -c 5000.0,100% -p 5 -t 5)
- #STATUS=$?
- if [[ "$1" == "14.139.5.5" && "$STATUS2" != "0" ]]; then
- GATEWAY=$(/sbin/ip route | awk '/default/ { print $3 }')
- echo $(date) "NKN is down, Gateway is $GATEWAY" >> /var/log/nagios/nkn-link-status.txt
- /usr/bin/ssh root@proxy1.sbarjatiya.com "echo $(date) NKN is down, should shift to BEAM if required >> /root/nkn-link-status.txt" >> /tmp/nagios.txt 2>&1
- #Shift from NKN to BEAM if current gateway is not beam gateway
- if [[ "$GATEWAY" != "183.82.96.1" ]]; then
- sudo /sbin/route -v del default gw 10.4.8.2
- sudo /sbin/route -v add default gw 183.82.96.1
- #Not changing DNS from 10.4.20.204 to BEAM in hope that the same DNS might work
- fi
- elif [[ "$1" == "14.139.5.5" && "$STATUS2" == "0" ]]; then
- GATEWAY=$(/sbin/ip route | awk '/default/ { print $3 }')
- echo $(date) "NKN is up, Gateway is $GATEWAY" >> /var/log/nagios/nkn-link-status.txt
- /usr/bin/ssh root@proxy1.sbarjatiya.com "echo $(date) NKN is up, should shift to NKN if required. >> /root/nkn-link-status.txt" >> /tmp/nagios.txt 2>&1
- #Shift from BEAM to NKN, if current gateway is not NKN gateway
- if [[ "$GATEWAY" != "10.4.8.2" ]]; then
- sudo /sbin/route -v del default gw 183.82.96.1 >> /tmp/nagios.txt 2>&1
- sudo /sbin/route -v add default gw 10.4.8.2 >> /tmp/nagios.txt 2>&1
- #Not changing DNS back to 10.4.20.204, as the DNS are not changed while migrating to BEAM
- fi
- elif [[ "$1" == "183.82.96.1" && "$STATUS2" == "0" ]]; then
- echo $(date) "BEAM is up" >> /var/log/nagios/beam-link-status.txt
- /usr/bin/ssh root@proxy1.sbarjatiya.com "echo $(date) BEAM is up, we can use it if NKN is down. >> /root/beam-link-status.txt" >> /tmp/nagios.txt 2>&1
- elif [[ "$1" == "183.82.96.1" && "$STATUS2" != "0" ]]; then
- echo $(date) "BEAM is down" >> /var/log/nagios/beam-link-status.txt
- /usr/bin/ssh root@proxy1.sbarjatiya.com "echo $(date) BEAM is down >> /root/beam-link-status.txt" >> /tmp/nagios.txt 2>&1
- else
- echo "Else case" >> /tmp/nagios.txt
- echo "1 is $1" >> /tmp/nagios.txt
- echo "STATUS is $STATUS2" >> /tmp/nagios.txt
- fi
- echo $OUTPUT2
- if [[ "$STATUS2" == "0" ]]; then
- exit 0
- else
- exit 2
- fi
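The gateway-switching decision in the script above can be tried out without touching the routing table. The following is a minimal dry-run sketch, not part of the original plugin: plan_routes is a hypothetical helper that takes the checked address, the ping exit status, and the current default gateway, and only prints the route commands the script would run. The addresses are the ones used in the script.

```shell
#!/bin/bash
# Dry-run sketch of the failover decision; prints commands instead of running them.
NKN_PROBE="14.139.5.5"     # address pinged to judge the NKN link (from the script)
NKN_GW="10.4.8.2"          # NKN default gateway (from the script)
BEAM_GW="183.82.96.1"      # BEAM default gateway (from the script)

# plan_routes TARGET PING_STATUS CURRENT_GATEWAY   (hypothetical helper)
plan_routes() {
    target=$1
    status=$2
    gateway=$3
    # Only a check of the NKN probe address triggers a gateway switch.
    [ "$target" = "$NKN_PROBE" ] || return 0
    if [ "$status" != "0" ] && [ "$gateway" != "$BEAM_GW" ]; then
        # NKN is down and we are not yet on BEAM: move to BEAM.
        echo "route del default gw $NKN_GW"
        echo "route add default gw $BEAM_GW"
    elif [ "$status" = "0" ] && [ "$gateway" != "$NKN_GW" ]; then
        # NKN is up and we are not on NKN: move back to NKN.
        echo "route del default gw $BEAM_GW"
        echo "route add default gw $NKN_GW"
    fi
}

plan_routes 14.139.5.5 1 10.4.8.2      # NKN down while on NKN: prints the switch to BEAM
plan_routes 14.139.5.5 0 183.82.96.1   # NKN up while on BEAM: prints the switch back to NKN
plan_routes 14.139.5.5 0 10.4.8.2      # NKN up and already on NKN: prints nothing
```

Checks of the BEAM gateway itself never change routes; in the script they only write log entries.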
- chmod +x /usr/lib64/nagios/plugins/check_host_alive.sh
- usermod -s /bin/bash nagios
- su - nagios
- ssh-keygen
- ssh-copy-id root@192.168.122.103 #root@proxy1.sbarjatiya.com
- ssh root@192.168.122.103
- exit
- visudo
- Give the nagios user full root access, or at least permission to run /sbin/route
- Disable the Defaults requiretty option by commenting it out
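As one possible way to express the narrower option, the sketch below drafts two sudoers lines in a temporary file and syntax-checks them before anything is pasted into visudo. These lines are assumptions, not the original author's configuration: NOPASSWD is assumed because the plugin runs sudo non-interactively, and the per-user "Defaults:nagios !requiretty" is an alternative to commenting out the global requiretty line.

```shell
#!/bin/bash
# Draft the assumed sudoers entries in a scratch file (nothing under /etc is modified).
DRAFT=$(mktemp)
cat > "$DRAFT" <<'EOF'
# Let sudo run for the nagios user without a terminal
Defaults:nagios !requiretty
# Allow the nagios user to run only /sbin/route as root, without a password
nagios ALL=(root) NOPASSWD: /sbin/route
EOF

# Syntax-check the draft if visudo is available on this machine.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$DRAFT"
fi
```

If the check passes, the two lines can be added through visudo as described above.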
- nagios -v /etc/nagios/nagios.cfg
- chkconfig nagios on
- service nagios start
- htpasswd -c /etc/nagios/passwd nagios
- chown -R nagios:apache /etc/nagios
- chmod -R 755 /etc/nagios
- chmod -R 777 /var/log/nagios
- chown -R nagios:nagios /var/log/nagios
- chmod 700 /var/log/nagios/.ssh/
- chmod 600 /var/log/nagios/.ssh/*
- chown -R nagios:apache /var/spool/nagios/
- chmod -R 775 /var/spool/nagios/
- Look at /tmp/nagios.txt and verify that lines are being appended
- On 192.168.122.103, look at /root/status.txt and verify that it matches the status of the 192.168.122.102 host
- Verify emails are being received at admin email ID.