Wednesday, 1 June 2016

Install and configure GANGLIA ON CENTOS/RHEL

This article walks through installing and configuring Ganglia 3.6.0 on CentOS/RHEL 6.3.
1) About
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.
Ganglia has the following three main components:
1) Ganglia Monitoring Daemon (gmond) 
It is a lightweight service that is installed on every machine you’d like to monitor.
Gmond has four main responsibilities:
  • Monitor changes in host state.
  • Announce relevant changes.
  • Listen to the state of all other ganglia nodes via a unicast or multicast channel.
  • Answer requests for an XML description of the cluster state.
Each gmond transmits information in two different ways:
  • Unicasting or Multicasting host state in external data representation (XDR) format using UDP messages.
  • Sending XML over a TCP connection.
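As a rough illustration of the XML that gmond serves over TCP, the sketch below pulls host names out of such a dump. It assumes each host appears as a `<HOST NAME="...">` element and that the dump arrives on standard input; the function name `list_hosts` is made up for this example.

```shell
# list_hosts: read a gmond XML dump on stdin and print each HOST NAME, one per line.
# Assumes host entries look like <HOST NAME="web001" IP="..."> in the XML.
list_hosts() {
  grep -o '<HOST NAME="[^"]*"' | sed -e 's/^<HOST NAME="//' -e 's/"$//'
}
```

With a gmond listening on its TCP port (8649 in this article) and a tool such as nc installed, it could be used as `nc localhost 8649 | list_hosts`.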
2) Ganglia Meta Daemon (gmetad)
The ganglia meta daemon (gmetad) is a service that collects data from other gmetad and gmond sources and stores their state to disk in indexed round-robin databases. Gmetad provides a simple query mechanism for collecting specific information about groups of machines.
3) Ganglia PHP Web Front-end 
The Ganglia web front-end provides a view of the gathered information via real-time dynamic web pages. Most importantly, it displays Ganglia data in a meaningful way for system administrators and computer users using PHP.
Other components are:
4) Gmetric
The ganglia metric tool is a commandline application that you can use to inject custom made metrics about hosts that are being monitored by ganglia. It has the ability to spoof messages as coming from a different host in case you want to capture and report metrics from a device where you don’t have gmond running (like a network or other embedded device).
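A minimal sketch of injecting a custom metric with gmetric is shown below. The wrapper name `report_metric`, the `GMETRIC` and `GMOND_CONF` variables, and the metric used in the usage line are all illustrative assumptions; the config path is the one used later in this article.

```shell
# Path to the gmetric binary and to the gmond config it reads its send channel from.
# Both variables are assumptions for this sketch and can be overridden.
GMETRIC="${GMETRIC:-gmetric}"
GMOND_CONF="${GMOND_CONF:-/etc/ganglia/gmond.conf}"

# report_metric NAME VALUE UNITS: publish one unsigned integer metric.
report_metric() {
  "$GMETRIC" --conf "$GMOND_CONF" --name "$1" --value "$2" --type uint32 --units "$3"
}
```

For example, `report_metric logged_in_users "$(who | wc -l)" users` would publish the number of logged-in sessions. To report on behalf of a device that has no gmond, gmetric also accepts a spoof option; check `gmetric --help` on your build for its exact form.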
5) Gstat
The ganglia stat tool is a commandline application that you can use to query a gmond for metric information directly.
1) Prerequisite
  • The following dependency packages must be installed before installing Ganglia:
       yum -y install apr-devel apr-util check-devel cairo-devel pango-devel libxml2-devel rpm-build glib2-devel \
       dbus-devel freetype-devel fontconfig-devel gcc-c++ expat-devel python-devel libXrender-devel
  • The other dependencies, rrdtool and confuse, are built from source in the installation steps.
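Before starting the source builds, it can help to confirm that the basic tools used in this article are on the PATH. The helper below is a small sketch; the name `missing_cmds` and the tool list in the usage line are assumptions, not part of Ganglia.

```shell
# missing_cmds CMD...: print every command from the argument list that is not on the PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

Running `missing_cmds gcc g++ make wget tar` prints nothing when all of the tools are available.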
2) Installation
2.1) Create the user ganglia, which will run the Ganglia services:
   useradd ganglia
   passwd ganglia


2.2) Create the directory where the Ganglia sources will be downloaded (it may already exist):
   mkdir -p /usr/local/src
   cd /usr/local/src


2.3) First, download, unpack, compile, and install confuse:
   wget http://savannah.nongnu.org/download/confuse/confuse-2.7.tar.gz
   tar -xzvf confuse-2.7.tar.gz
   cd confuse-2.7
   ./configure
   make
   make install

2.4) Next, download, unpack, compile, and install rrdtool:
   cd /usr/local/src
   wget http://oss.oetiker.ch/rrdtool/pub/rrdtool.tar.gz
   tar -xzvf rrdtool.tar.gz
   cd rrdtool-1.4.8/
   ./configure --prefix=/usr
   make -j8
   make install
   which rrdtool

2.5) Make sure the newly installed libraries are visible to the dynamic linker. Add the following line to /etc/ld.so.conf:
   vi /etc/ld.so.conf
      /usr/local/lib
   Then run ldconfig to rebuild the linker cache:
   ldconfig
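If you prefer not to edit the file by hand, the same change can be scripted so that it is safe to run more than once. This is only a sketch; the function name `add_lib_path` is an assumption for this example.

```shell
# add_lib_path CONF DIR: append DIR to CONF unless an identical line is already present.
add_lib_path() {
  conf=$1
  dir=$2
  grep -qxF "$dir" "$conf" 2>/dev/null || printf '%s\n' "$dir" >> "$conf"
}
```

As root, `add_lib_path /etc/ld.so.conf /usr/local/lib && ldconfig` would then replace the manual edit.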


2.6) With all the dependencies installed, download, unpack, compile, and install the ganglia-core package:
   cd /usr/local/src
   wget http://sourceforge.net/projects/ganglia/files/ganglia%20monitoring%20core/3.6.0/ganglia-3.6.0.tar.gz
   tar -xzvf ganglia-3.6.0.tar.gz
   cd ganglia-3.6.0
   ./configure --with-gmetad
   make -j8
   make install


NOTE: The build should finish without errors. If you see errors, check for missing libraries.
3) Configuring Ganglia
3.1) Create the config directory for Ganglia:
   mkdir /etc/ganglia

3.2) Copy the sample gmetad configuration file: 
   cp gmetad/gmetad.conf /etc/ganglia/

3.3) Generate the initial gmond configuration file: 
   gmond -t | tee /etc/ganglia/gmond.conf

3.4) Copy the initial startup script, change the binary and config path and enable it on boot: 
   cp gmetad/gmetad.init /etc/rc.d/init.d/gmetad
   cp gmond/gmond.init /etc/rc.d/init.d/gmond
   vi /etc/init.d/gmetad

       #GMETAD=/usr/sbin/gmetad
       GMETAD=/usr/local/sbin/gmetad
       #daemon $GMETAD
       daemon $GMETAD -c /etc/ganglia/gmetad.conf

   vi /etc/init.d/gmond

       #GMOND=/usr/sbin/gmond
       GMOND=/usr/local/sbin/gmond
       # daemon $GMOND
       daemon $GMOND -c /etc/ganglia/gmond.conf

   chkconfig --add gmetad
   chkconfig --add gmond
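The two init script edits above can also be applied without an editor. The sketch below assumes the scripts contain exactly the `GMETAD=/usr/sbin/gmetad` / `daemon $GMETAD` style lines shown above; the function name `patch_init` is hypothetical.

```shell
# patch_init FILE NAME CONF: point an init script at /usr/local/sbin/NAME and an explicit config file.
patch_init() {
  file=$1
  name=$2
  conf=$3
  var=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')   # gmond -> GMOND
  tmp=$(mktemp) || return 1
  sed -e "s|^${var}=/usr/sbin/${name}\$|${var}=/usr/local/sbin/${name}|" \
      -e "s|daemon \$${var}\$|daemon \$${var} -c ${conf}|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # overwrite in place so the file mode and owner are kept
  rm -f "$tmp"
}
```

For example, `patch_init /etc/init.d/gmond gmond /etc/ganglia/gmond.conf`. Running it a second time leaves the file unchanged, because the patched lines no longer match the patterns.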

3.5) Create a storage directory for RRDTool and make sure the ganglia user (which gmetad runs as) can write to it: 
   mkdir -p /var/lib/ganglia/rrds
   chown ganglia:ganglia /var/lib/ganglia/rrds

3.6) Modify the following parameters in the gmetad config file: 
   vi /etc/ganglia/gmetad.conf
      data_source "Ganglia Test Setup" FQDN Name of Ganglia Server
      setuid_username "ganglia"
      case_sensitive_hostnames 0


Note: Replace “FQDN Name of Ganglia Server” with the fully qualified domain name of your Ganglia server.
3.7) Lastly, modify the following parameters in the gmond config file:
   vi /etc/ganglia/gmond.conf
   user = ganglia
   cluster {
      name = "Ganglia Test Setup"
      owner = "Ops"
      latlong = "unspecified"
      url = "unspecified"
   }

   udp_send_channel {
      host = FQDN Name of Ganglia Server
      port = 8649
      ttl = 1
   }

   udp_recv_channel {
      port = 8649
   }

   tcp_accept_channel {
      port = 8649
   }



Note: *) Replace “FQDN Name of Ganglia Server” with the fully qualified domain name of your Ganglia server.
*) By default gmond uses reverse DNS resolution to determine the hostname it displays; to override this, use the “override_hostname” config parameter.
*) Gmond falls back to the IP address if DNS resolution fails and “override_hostname” is not set.
4) Validation and Testing
4.1) First, run the gmetad daemon in debug mode in one terminal window and check that everything is fine:
   gmetad -d 5 -c /etc/ganglia/gmetad.conf

   Going to run as user nobody
   Sources are ...
   Source: [Ganglia Test Setup, step 15] has 1 sources
    127.0.0.1
   xml listening on port 8651
   interactive xml listening on port 8652
   Data thread 140442277627648 is monitoring [Ganglia Test Setup] data source
       127.0.0.1
   cleanup thread has been started
   data_thread() for [Ganglia Test Setup] failed to contact node 127.0.0.1
   data_thread() got no answer from any [Ganglia Test Setup] datasource

4.2) Next, run the gmond daemon in debug mode in a second terminal window and check that everything is fine:
   gmond -d 5 -c /etc/ganglia/gmond.conf

   saving metadata for metric: disk_free host: localhost
   Processing a metric value message from localhost
   ***Allocating value packet for host--server001.gauri.com-- and metric --disk_free-- ****

   Processing a metric metadata message from localhost
   ***Allocating metadata packet for host--localhost-- and metric --part_max_used-- ****

4.3) Open another terminal window and check whether the rrd files have been created:
   ls -lh /var/lib/ganglia/rrds/
   total 8.0K
   drwxr-xr-x. 4 ganglia ganglia 4.0K Aug 16 16:28 Ganglia Test Setup
   drwxr-xr-x. 2 ganglia ganglia 4.0K Aug 16 16:28 __SummaryInfo__
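Beyond listing the directories, you may want to confirm that gmetad is still writing to them. The helper below is a sketch that counts rrd files modified within a given number of minutes; the name `fresh_rrds` is an assumption for this example.

```shell
# fresh_rrds DIR MINUTES: count the .rrd files under DIR modified less than MINUTES minutes ago.
fresh_rrds() {
  dir=$1
  minutes=$2
  find "$dir" -type f -name '*.rrd' -mmin "-$minutes" | wc -l | tr -d ' '
}
```

For example, `fresh_rrds /var/lib/ganglia/rrds 5` should print a non-zero count while both daemons are running.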

4.4) Once you are convinced that everything is fine, stop the debug-mode processes (by pressing CTRL + C) and start the individual services that we created earlier:
   service gmetad start
   service gmond start
 

4.5) Verify that the processes are running and the respective ports are open:
   ps -ef | grep -v grep | grep gm
   ganglia   6226     1  0 16:59 ?        00:00:00 /usr/local/sbin/gmetad -c /etc/ganglia/gmetad.conf
   ganglia   6267     1  0 17:01 ?        00:00:00 /usr/local/sbin/gmond -c /etc/ganglia/gmond.conf

   netstat -plane | egrep 'gmon|gme'
   tcp        0      0 0.0.0.0:8649                0.0.0.0:*                   LISTEN      502        1067310    6267/gmond          
   tcp        0      0 0.0.0.0:8651                0.0.0.0:*                   LISTEN      502        1047072    6226/gmetad         
   tcp        0      0 0.0.0.0:8652                0.0.0.0:*                   LISTEN      502        1047073    6226/gmetad         
   udp        0      0 0.0.0.0:8649                0.0.0.0:*                               502        1067309    6267/gmond 
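The port check can be scripted so that a missing listener stands out. This sketch reads netstat-style output on standard input and reports which of the expected TCP ports have no LISTEN entry; the function name `missing_ports` is an assumption for this example.

```shell
# missing_ports PORT...: read netstat-style output on stdin and print each
# expected port that has no TCP LISTEN line.
missing_ports() {
  listing=$(cat)
  for port in "$@"; do
    printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]].*LISTEN" || printf '%s\n' "$port"
  done
}
```

For the setup in this article, `netstat -ltn | missing_ports 8649 8651 8652` should print nothing when gmond and gmetad are both up.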
 

5) Deploying Ganglia Web
5.1) Download the package and Untar it:
   cd /usr/local/src/
   wget http://sourceforge.net/projects/ganglia/files/ganglia-web/3.5.10/ganglia-web-3.5.10.tar.gz
   tar -xzvf ganglia-web-3.5.10.tar.gz
   cd ganglia-web-3.5.10


5.2) Modify the Makefile Config that will be used to deploy ganglia web:
   vi Makefile
      # Location where gweb should be installed to (excluding conf, dwoo dirs).
      GDESTDIR = /var/www/html/ganglia

      # Gweb statedir (where conf dir and Dwoo templates dir are stored)
      GWEB_STATEDIR = /var/lib/ganglia-web

      # Gmetad rootdir (parent location of rrd folder)
      GMETAD_ROOTDIR = /var/lib/ganglia

      # User by which your webserver is running
      APACHE_USER =  apache

5.3) Now install Ganglia Web, once we have done the config changes:
   make install

5.4) Try to open Ganglia Web UI in your favourite Web Browser:
   http://localhost/ganglia
   OR  
   http://Server-IP-Address/ganglia


NOTE: If you run into any issues, try disabling iptables and SELinux as described in the next section and check again.
6) Security Rules
6.1) Firewall Rule for Ganglia
6.1.1) Temporarily disable the iptables rules:
   service iptables stop


6.1.2) Firewall port (8649) that needs to be open for the Ganglia daemon:
   iptables -A INPUT -p udp -m udp --dport 8649 -j ACCEPT


6.1.3) Firewall port (80) that needs to be open for Ganglia Web:
   iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT


6.1.4) Save the iptables rules and restart the service:
   service iptables save
   service iptables restart


6.2) SELinux Rule for Ganglia
6.2.1) Temporarily disable SELinux enforcement:
   echo 0 >/selinux/enforce 

Once we are fully convinced that the Ganglia server is running successfully, it is time to set up the nodes (i.e. the clients).
7) Preparing Client package for ganglia
Build the Ganglia client package that will be deployed on the client machines:
   tar -czvf /tmp/ganglia-client.tar.gz /usr/local/sbin/gmond /etc/ganglia/gmond.conf /etc/init.d/gmond \
   /usr/local/lib64/libganglia-3.6.0.so.0* /lib64/libexpat.so.1* /usr/local/lib/libconfuse.so* \
   /usr/lib64/libapr-1.so* /usr/local/lib64/ganglia

8) Deploying Client package
8.1) Copy the tar package from the Ganglia server to one of the client machines using scp:
   scp /tmp/ganglia-client.tar.gz root@CLIENT-MACHINE-IP-OR-NAME:/tmp/

8.2) Untar the package using following command:
   tar -C / -xzvf /tmp/ganglia-client.tar.gz

8.3) Create the ganglia user on the client as well and start the gmond service:
   useradd ganglia
   service gmond start

8.4) Verify the gmond process is running:
   ps -ef | grep -v grep | grep gmond

8.5) Also check that an rrd directory has been created for this machine on the Ganglia server, or check the Ganglia Web UI:
   ls -lh /var/lib/ganglia/rrds/Ganglia\ Test\ Setup/
   total 8.0K
   drwxr-xr-x. 2 ganglia ganglia 4.0K Aug 16 16:28 localhost
   drwxr-xr-x. 2 ganglia ganglia 4.0K Aug 16 16:28 web001
   drwxr-xr-x. 2 ganglia ganglia 4.0K Aug 16 16:28 __SummaryInfo__

8.6) Add the other nodes (clients) to the Ganglia server by repeating steps 8.1 to 8.5.
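When there are many clients, the repeated steps can be generated rather than typed. The sketch below only prints the commands for each host so they can be reviewed first; the function name `deploy_cmds` and the host names in the usage line are hypothetical, and it assumes root SSH access to each client.

```shell
# deploy_cmds HOST...: print the copy and remote-install commands for each client host.
# Nothing is executed here; review the output, then pipe it to sh if it looks right.
deploy_cmds() {
  pkg=/tmp/ganglia-client.tar.gz
  for host in "$@"; do
    printf 'scp %s root@%s:/tmp/\n' "$pkg" "$host"
    printf 'ssh root@%s "tar -C / -xzvf %s && (id ganglia >/dev/null 2>&1 || useradd ganglia) && service gmond start"\n' "$host" "$pkg"
  done
}
```

For example, `deploy_cmds web001 web002` prints four commands, and `deploy_cmds web001 web002 | sh` would run them in order.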
Congratulations! You have successfully deployed a Ganglia setup. Grab a glass of beer and enjoy exploring it.

Add new node to existing Platform LSF Cluster

Check whether the new node is of the same type (e.g. linux2.6-glibc2.3-x86_64) as those already in the cluster.
If it is, a subdirectory named after that type (e.g. linux2.6-glibc2.3-x86_64) will exist in LSF_TOP/7.0.
1. Log on to the master host as root.
2. Add the hosts to lsf.cluster.cluster_name (the file can be found in LSF_TOP/conf). If they are servers, specify 1; otherwise specify 0. You can use ! for model and type for automatic detection.
3. Run badmin mbdrestart.
4. On the new host, create the “lsfadmin” user and, as root, run ./hostsetup --top="/usr/share/lsf" --boot="y", found under LSF_TOP/7.0/install.
5. Start LSF on the new host with: lsadmin limstartup, lsadmin resstartup and badmin hstartup
If the new host is a different architecture
1. Get the LSF distribution file for the new type (e.g. aix5-64) from the download section of my.platform.com (a valid user account is needed to log in).
2. Edit install.config in LSF_TOP/7.0/install.
3. Change the following parameters:
a. For LSF_TARDIR, specify the path to the tar file. For example: LSF_TARDIR="/usr/share/lsf_distrib/7.0"
b. For LSF_ADD_SERVERS, list the new host names enclosed in quotes and separated by spaces.
For example: LSF_ADD_SERVERS="hosta hostb"
4. Run ./lsfinstall -f install.config. This automatically creates the host information in lsf.cluster.cluster_name.
5. Run lsadmin reconfig and badmin reconfig.