Difference between revisions of "Check cluster status via systemd service"

From Notes_Wiki

Revision as of 04:45, 8 September 2023

Home > Suse > SAP setup and maintenance > Check cluster status via systemd service

We can check cluster status periodically via a systemd service and timer using the steps below. Note: not tested in production.

  1. Refer to CentOS 8.x systemd or systemctl for help on using systemd or creating new systemd services
  2. Set up outgoing email via postfix on the system, so that email can be sent using the mail command, as described in CentOS 8.x postfix send email through relay or smarthost with smtp authentication
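  3. Create a systemd service unit '/etc/systemd/system/cluster_status_check.service' with:
    [Unit]
    Description=Check cluster status and send email if not healthy
    
    [Service]
    Type=oneshot
    Environment="EMAIL_ADDRESS=your_email@example.com"
    ExecStart=/sbin/cluster_status_check_script.sh
    # ExecStartPost= is skipped when ExecStart= fails, and Environment= does not run $(...),
    # so the alert is sent from ExecStopPost=, where systemd sets $SERVICE_RESULT
    ExecStopPost=/bin/sh -c 'if [ "$SERVICE_RESULT" != "success" ]; then echo "Cluster status check failed on $(hostname) ($(hostname -I | cut -d" " -f1))." | mail -s "Cluster Alert" "$EMAIL_ADDRESS"; fi'
    
    [Install]
    WantedBy=multi-user.target
    A [Timer] section is not valid inside a .service file, so create a separate timer unit '/etc/systemd/system/cluster_status_check.timer' with:
    [Unit]
    Description=Run the cluster status check every hour
    
    [Timer]
    OnUnitActiveSec=1h
    Unit=cluster_status_check.service
    
    [Install]
    WantedBy=timers.target
    In the service unit replace the EMAIL_ADDRESS value appropriately. Restart=on-failure is left out because some systemd versions refuse to load a Type=oneshot service that sets Restart=, and the timer re-runs the check anyway. OnUnitActiveSec counts from the last activation of the service, so the first run comes from starting the service in step 5 (or at boot). Activate the timer with 'systemctl enable --now cluster_status_check.timer' after the daemon-reload in step 5.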
  4. As per the ExecStart path given in the systemd service, create '/sbin/cluster_status_check_script.sh' with:
    #!/bin/bash
    
    # Run crm status and store output in a variable
    crm_output=$(crm status)
    
    # Check for any errors or warnings (ignoring case)
    shopt -s nocasematch
    if [[ $crm_output =~ (error|warning) ]]; then
      echo "Error or warning found in cluster status"
      echo "$crm_output"
      exit 1
    fi
    shopt -u nocasematch
    
    # Check if all nodes are online by counting the names inside "Online: [ ... ]"
    num_nodes=$(crm_node -l | wc -l)
    num_online_nodes=$(crm_mon -1 | grep -E "(^|[[:space:]])Online:" | sed 's/.*\[\(.*\)\].*/\1/' | wc -w)
    
    if [[ $num_nodes -ne $num_online_nodes ]]; then
      echo "Not all nodes are online"
      echo "$crm_output"
      exit 1
    fi
    
    # Check that resources are configured, using the "N resource... configured" summary line
    # (the wording varies between Pacemaker versions; verify against your crm_mon output)
    num_resources=$(crm_mon -1 | grep -Eo "[0-9]+ resource[a-z ]* configured" | head -n1 | awk '{print $1}')
    
    if [[ ${num_resources:-0} -eq 0 ]]; then
      echo "No resources found in the cluster"
      echo "$crm_output"
      exit 1
    fi
    
    # Check if all resources are started properly (-r also lists inactive resources)
    if crm_mon -1 -r | grep -Eq "Stopped|FAILED"; then
      echo "Not all resources are started properly"
      echo "$crm_output"
      exit 1
    fi
    
    echo "Cluster status is OK"
    
    exit 0
  5. Set execute permissions on the script, then reload systemd and start and enable the service via:
    chmod +x /sbin/cluster_status_check_script.sh
    systemctl daemon-reload
    systemctl start cluster_status_check.service
    systemctl enable cluster_status_check.service
  6. If feasible, stop a resource and verify that the alert email is received. Consider adding a virtual IP resource for this test and removing it afterwards.
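
Because the checks in step 4 depend on matching crm_mon text, it can help to dry-run the patterns against saved output before stopping a real resource. The following is a minimal sketch, not the script itself: the helper names count_online_nodes and has_inactive_resources are hypothetical, and the sample text is illustrative rather than captured from a real cluster, so the layout on your Pacemaker version may differ.

```shell
#!/bin/bash
# Sketch: exercise text-matching checks against saved crm_mon-style output.
# The sample below is illustrative, not captured from a real cluster.

# Count the node names listed inside "Online: [ ... ]" lines read from stdin.
count_online_nodes() {
  grep -E '(^|[[:space:]])Online:' | sed 's/.*\[\(.*\)\].*/\1/' | wc -w | tr -d '[:space:]'
}

# Return success (0) if any line on stdin reports a stopped or failed resource.
has_inactive_resources() {
  grep -Eq 'Stopped|FAILED'
}

sample_output='Node List:
  * Online: [ node1 node2 ]

Full List of Resources:
  * rsc_vip (ocf::heartbeat:IPaddr2): Started node1
  * rsc_test_ip (ocf::heartbeat:IPaddr2): Stopped'

online=$(printf '%s\n' "$sample_output" | count_online_nodes)
echo "online nodes: $online"

if printf '%s\n' "$sample_output" | has_inactive_resources; then
  echo "inactive resource detected"
fi
```

To try it with real data, replace sample_output with the contents of a file saved via 'crm_mon -1 -r > /tmp/crm_mon.txt' (the path is only an example) and compare the reported count with the number of nodes you expect.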

