Monday, August 11, 2014

Red Hat Cluster - Command Line Tools


The ricci service must be running on every node (ricci replaces ccsd)

/etc/cluster/cluster.conf                     main configuration file; see /usr/share/cluster/cluster.rng for a comprehensive list and description of the cluster.conf elements and attributes

Create a new cluster configuration file on the local system

ccs -f file [options]

Once the file is created, send it to a node in the cluster

ccs -h dst_host -f file --setconf          (this creates cluster.conf in the /etc/cluster directory on the destination host; ricci must be installed and running there)

To distribute the file to all nodes in the cluster, using the same password for each node

ccs -h host -p password --sync --activate
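
The local-file workflow above can be sketched as one shell function. This is only an illustration: the function name build_and_push, the CCS override variable, and the file path and node names in the usage line are assumptions, not part of ccs itself.

```shell
#!/bin/sh
# Sketch: build a cluster.conf locally, push it to one node, then sync it.
# CCS can be pointed at a stand-in command to preview what would be run.
CCS="${CCS:-ccs}"

# build_and_push FILE CLUSTER_NAME TARGET_HOST NODE...
build_and_push() {
    file=$1
    name=$2
    target=$3
    shift 3
    $CCS -f "$file" --createcluster "$name" || return 1
    for node in "$@"; do
        $CCS -f "$file" --addnode "$node" || return 1
    done
    # Copy the finished file to /etc/cluster/cluster.conf on the target host.
    $CCS -h "$target" -f "$file" --setconf || return 1
    # Propagate the file from that host to every node and activate it.
    $CCS -h "$target" --sync --activate
}

# Example (hypothetical names):
# build_and_push /tmp/cluster.conf mycluster node1.example.com \
#     node1.example.com node2.example.com
```

ricci will prompt for a password where one is needed; add -p to the calls if you prefer to pass it on the command line.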

Commands that Overwrite Previous Settings
o    --settotem
o    --setdlm
o    --setrm
o    --setcman
o    --setmulticast
o    --setaltmulticast
o    --setfencedaemon
o    --setlogging
o    --setquorumd

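For example, to reset all of the fence daemon properties to their defaults, run the following command.

ccs -h hostname --setfencedaemon

Because each of these commands overwrites every property it controls, set all the values you need in a single command; any property you leave out reverts to its default.

ccs -h hostname --setfencedaemon post_fail_delay=5 post_join_delay=10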

Create a cluster

ccs -h node1.example.com --createcluster mycluster                   # The cluster name cannot exceed 15 characters
         (node1.example.com is the host whose configuration file will be updated)
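
A script that creates clusters could check the 15-character limit before calling ccs. A minimal sketch; the function name valid_cluster_name is an assumption made for this example.

```shell
#!/bin/sh
# valid_cluster_name NAME
# Succeeds when NAME is non-empty and no longer than 15 characters.
valid_cluster_name() {
    [ -n "$1" ] && [ "${#1}" -le 15 ]
}

# Example:
# valid_cluster_name mycluster && ccs -h node1.example.com --createcluster mycluster
```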

Add a node to the cluster

ccs -h node1.example.com --addnode node1.example.com

You can also assign a node ID (by default, a unique ID is assigned to each cluster node)

ccs -h node1.example.com --addnode node1.example.com --nodeid nodeid

If the cluster uses quorum, specify the number of votes the node contributes; the votes determine whether there is a quorum

ccs -h node1.example.com --addnode node1.example.com --votes votes

List the nodes in a cluster

ccs -h node1.example.com --lsnodes

Remove a node

ccs -h node1.example.com --rmnode node2.example.com

Configuring fence devices

ccs -h host --setfencedaemon post_fail_delay=value post_join_delay=value

Adding fence device

ccs -h host --addfencedev myfencedevicename agent=fence_apc ipaddr=apc_IP login=username passwd=password

Remove fence device

ccs -h host --rmfencedev mydevicename

List available fence devices and fence device options

ccs -h host --lsfenceopts

ccs -h host --lsfenceopts fence_type (fence_type can be any type listed in the output of the command above)

Configuring fence devices: add a fence method

1) Add a fence method for the node, providing a name for the fence method

ccs -h host --addmethod method node

For example:
ccs -h node01.example.com --addmethod APC node01.example.com

2) Add a fence instance for the method. You must specify the fence device to use for the node, the node this instance applies to, the name of the method, and any options for this method that are specific to this node:

ccs -h host --addfenceinst fencedevicename node method [options]

For example:
ccs -h node01.example.com --addfenceinst myfencedevicename node01.example.com APC port=1

We will need to add a fence method for each node in the cluster
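
Since every node needs its own fence method and fence instance, the two steps above can be wrapped in a loop. This is a sketch under assumptions: the function name fence_all_nodes, the node:port argument format, and the CCS override variable are invented for illustration, and the port option applies to agents such as fence_apc that take one.

```shell
#!/bin/sh
# CCS can be pointed at a stand-in command to preview what would be run.
CCS="${CCS:-ccs}"

# fence_all_nodes HOST DEVICE METHOD NODE:PORT...
# For each NODE:PORT pair, add METHOD to the node, then add an instance
# of DEVICE to that method with the node-specific port option.
fence_all_nodes() {
    host=$1
    device=$2
    method=$3
    shift 3
    for pair in "$@"; do
        node=${pair%%:*}   # text before the first colon
        port=${pair#*:}    # text after the first colon
        $CCS -h "$host" --addmethod "$method" "$node" || return 1
        $CCS -h "$host" --addfenceinst "$device" "$node" "$method" "port=$port" || return 1
    done
}

# Example (hypothetical port numbers):
# fence_all_nodes node01.example.com myfencedevicename APC \
#     node01.example.com:1 node02.example.com:2
```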

Remove fence method and instances

ccs -h host --rmmethod method node

For example:

ccs -h node01.example.com --rmmethod APC node01.example.com

Remove all instances of a fence device from a fence method

ccs -h host --rmfenceinst fencedevicename node method

For example:

ccs -h node01.example.com --rmfenceinst myfencedevicename node01.example.com APC


Configuring failover domain

By default, failover domains are unrestricted and unordered

Add a failover domain

ccs -h host --addfailoverdomain name [restricted] [ordered] [nofailback]

For example:
ccs -h node-01.example.com --addfailoverdomain example_pri ordered

Adding a node to failover domain

ccs -h host --addfailoverdomainnode failoverdomain node priority

For example:

ccs -h node-01.example.com --addfailoverdomainnode example_pri node-01.example.com 1
ccs -h node-01.example.com --addfailoverdomainnode example_pri node-02.example.com 2
ccs -h node-01.example.com --addfailoverdomainnode example_pri node-03.example.com 3
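
The three commands above follow a pattern: each node gets the next priority number. A small sketch of that pattern, assuming you want priorities to follow argument order; the function name add_domain_nodes and the CCS override variable are illustrative only.

```shell
#!/bin/sh
# CCS can be pointed at a stand-in command to preview what would be run.
CCS="${CCS:-ccs}"

# add_domain_nodes HOST DOMAIN NODE...
# Adds each NODE to DOMAIN with priority 1, 2, 3, ... in argument order.
add_domain_nodes() {
    host=$1
    domain=$2
    shift 2
    priority=1
    for node in "$@"; do
        $CCS -h "$host" --addfailoverdomainnode "$domain" "$node" "$priority" || return 1
        priority=$((priority + 1))
    done
}

# Example:
# add_domain_nodes node-01.example.com example_pri \
#     node-01.example.com node-02.example.com node-03.example.com
```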

List failover domains

ccs -h host --lsfailoverdomain

Remove failover domain

ccs -h host --rmfailoverdomain example_pri

Remove a node from failover domain

ccs -h host --rmfailoverdomainnode example_pri node01.example.com

Configuring Global Cluster Resources

There are two types of resources:
o    Global — Resources that are available to any service in the cluster.
o    Service-specific — Resources that are available to only one service.

List currently configured resources and services in a cluster

ccs -h host --lsservices

Add a cluster resource

ccs -h host --addresource resourcetype [resource options]

For example, the following command adds a global file system resource to the cluster configuration file on node01.example.com. The name of the resource is web_fs, the file system device is /dev/sdd2, the file system mountpoint is /var/www, and the file system type is ext3.
ccs -h node01.example.com --addresource fs name=web_fs device=/dev/sdd2 mountpoint=/var/www fstype=ext3

Remove a cluster resource

ccs -h host --rmresource resourcetype [resource options]

To modify the parameters of an existing global resource, remove the resource and configure it again.

Add a cluster service

ccs -h host --addservice servicename [service options]

For example, to add a service to the configuration file on the cluster node node-01.example.com named example_apache that uses the failover domain example_pri, and that has recovery policy of relocate, execute the following command:
ccs -h node-01.example.com --addservice example_apache domain=example_pri recovery=relocate

Add a resource to the service

ccs -h host --addsubservice servicename subservice [service options]

ccs -h node01.example.com --addsubservice example_apache fs ref=web_fs

To add a service-specific resource to the service, you need to specify all of the resource options. For example, if you had not previously defined web_fs as a global resource, you could add it as a service-specific resource with the following command:
ccs -h node01.example.com --addsubservice example_apache fs name=web_fs device=/dev/sdd2 mountpoint=/var/www fstype=ext3

Remove a service and its subservices

ccs -h host --rmservice servicename

To remove a subservice, execute the following command:

ccs -h host --rmsubservice servicename subservice [service options]

List available cluster services and resources

ccs -h host --lsserviceopts
ccs -h host --lsresourceopts

Print the list of options available for a given service type

ccs -h host --lsserviceopts service_type

Configuring quorum disk

ccs -h host --setquorumd [quorumd options]

Configure heuristic for quorum disk

ccs -h host --addheuristic [heuristic options]

List quorum disk options and heuristics that are configured on a system

ccs -h host --lsquorum

To remove a heuristic specified by a heuristic option:

ccs -h host --rmheuristic [heuristic options]

Miscellaneous Cluster Configuration

List miscellaneous cluster attributes

ccs -h host --lsmisc

Cluster configuration Version

ccs -h host --setversion n

ccs -h host --getversion

Multicast Configuration

ccs -h host --setmulticast multicastaddress

Note that this command resets all other properties that you can set with the --setmulticast option to their default values.

If you specify a multicast address, you should use the 239.192.x.x series (or FF15:: for IPv6) that cman uses. Otherwise, using a multicast address outside that range may cause unpredictable results. For example, using 224.0.0.x (which is "All hosts on the network") may not be routed correctly, or even routed at all by some hardware.
If you specify or modify a multicast address, you must restart the cluster for this to take effect.
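
A script can check that an IPv4 address falls in the 239.192.x.x series before passing it to --setmulticast. This is a minimal sketch; the function name in_cman_multicast_range is an assumption, and the IPv6 case is not handled.

```shell
#!/bin/sh
# in_cman_multicast_range ADDR
# Succeeds for IPv4 addresses from 239.192.0.0 to 239.192.255.255.
in_cman_multicast_range() {
    case $1 in
        239.192.*.*) ;;
        *) return 1 ;;
    esac
    rest=${1#239.192.}     # the last two octets, e.g. "10.5"
    third=${rest%%.*}
    fourth=${rest#*.}
    for octet in "$third" "$fourth"; do
        case $octet in
            ''|*[!0-9]*) return 1 ;;   # empty or not purely digits
        esac
        [ "$octet" -le 255 ] || return 1
    done
}

# Example:
# in_cman_multicast_range 239.192.10.5 && ccs -h host --setmulticast 239.192.10.5
```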

To remove a multicast address from a configuration file, use the --setmulticast option of ccs but do not specify a multicast address:
ccs -h host --setmulticast

Configuring two node Cluster

If you are configuring a two-node cluster, you can execute the following command to allow a single node to maintain quorum (for example, if one node fails):

ccs -h host --setcman two_node=1 expected_votes=1

When you use the ccs --setcman command to add, remove, or modify the two_node option, you must restart the cluster for this change to take effect.

Logging

You can enable debugging for all daemons in a cluster, or you can enable logging for specific cluster processing.
To enable debugging for all daemons, execute the following command. By default, logging is directed to the /var/log/cluster/daemon.log file.

ccs -h host --setlogging [logging options]

For example, the following command enables debugging for all daemons.

ccs -h node1.example.com --setlogging debug=on


To enable debugging for an individual cluster process, execute the following command. Per-daemon logging configuration overrides the global settings.
ccs -h host --addlogging [logging daemon options]

For example, the following commands enable debugging for the corosync and fenced daemons.

ccs -h node1.example.com --addlogging name=corosync debug=on
ccs -h node1.example.com --addlogging name=fenced debug=on

To remove the log settings for individual daemons, use the following command.
ccs -h host --rmlogging name=clusterprocess

For example, the following command removes the daemon-specific log settings for the fenced daemon:
ccs -h host --rmlogging name=fenced

Note that when you have finished configuring all of the components of your cluster, you will need to sync the cluster configuration file to all of the nodes

Propagating/Syncing the configuration file to the cluster nodes

Use the following command to propagate and activate a cluster configuration file:

ccs -h host --sync --activate

or

cman_tool version -r                  # to propagate configuration file

To verify that all of the nodes specified in the host's cluster configuration file have the identical cluster configuration file, execute the following command:

ccs -h host --checkconf

If you have created or edited a configuration file on a local node, use the following command to send that file to one of the nodes in the cluster:

ccs -f file -h host --setconf

To verify that all of the nodes specified in the local file have the identical cluster configuration file, execute the following command:

ccs -f file --checkconf

Managing the Red Hat Cluster HA Add-On with ccs

Cause a node to leave a cluster

[You can use the ccs command to cause a node to leave a cluster by stopping cluster services on that node. Causing a node to leave a cluster does not remove the cluster configuration information from that node. Making a node leave a cluster prevents the node from automatically joining the cluster when it is rebooted]

ccs -h host --stop


Cause a node to join a cluster

ccs -h host --start


Starting and stopping a cluster

ccs -h host --stopall


ccs -h host --startall

Diagnosing and correcting problems in cluster

Verify cluster configuration on all cluster nodes

ccs -h host --checkconf

Verify cluster configuration against a local file
[verify that all of the nodes specified in the local file have identical cluster configuration files with the following command]

ccs -f file --checkconf


Validate the updated file against the cluster schema (cluster.rng) by running the ccs_config_validate command

ccs_config_validate


Cluster service start and stop order

service cman start
service clvmd start
service gfs2 start
service rgmanager start


service rgmanager stop
service gfs2 stop
service clvmd stop
service cman stop
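
The stop order is simply the start order reversed, so both can be driven from one list. A sketch of that idea; the function names, the CLUSTER_SERVICES list variable, and the SERVICE override variable are assumptions for this example. Trim the list if a node does not run clvmd or gfs2.

```shell
#!/bin/sh
# SERVICE can be pointed at a stand-in command to preview what would be run.
SERVICE="${SERVICE:-service}"
# Start order, taken from the list above.
CLUSTER_SERVICES="cman clvmd gfs2 rgmanager"

start_cluster_services() {
    for svc in $CLUSTER_SERVICES; do
        $SERVICE "$svc" start || return 1
    done
}

stop_cluster_services() {
    # Build the reversed list by prepending each name.
    reversed=""
    for svc in $CLUSTER_SERVICES; do
        reversed="$svc $reversed"
    done
    for svc in $reversed; do
        $SERVICE "$svc" stop || return 1
    done
}
```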

Display cluster service status

clustat

Service status meaning:

Started
The service resources are configured and available on the cluster system that owns the service.
Recovering
The service is pending start on another node.
Disabled
The service has been disabled, and does not have an assigned owner. A disabled service is never restarted automatically by the cluster.
Stopped
In the stopped state, the service will be evaluated for starting after the next service or node transition. This is a temporary state. You may disable or enable the service from this state.
Failed
The service is presumed dead. A service is placed into this state whenever a resource's stop operation fails. After a service is placed into this state, you must verify that there are no resources allocated (mounted file systems, for example) prior to issuing a disable request. The only operation that can take place when a service has entered this state is disable.
Uninitialized
This state can appear in certain cases during startup and when running clustat -f.


Verify that the nodes are functioning as members of the cluster (shown as "M" in the status (Sts) column)

cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    548   2010-09-28 10:52:21  node-01.example.com
   2   M    548   2010-09-28 10:52:21  node-02.example.com
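
To check the Sts column from a script rather than by eye, the output can be filtered with awk. A sketch that assumes the column layout shown above; the function name non_members is invented for this example.

```shell
#!/bin/sh
# non_members
# Reads `cman_tool nodes` output on stdin and prints the name (last column)
# of every node whose Sts column is not "M".
# Exit status is 0 only when every listed node is a member.
non_members() {
    awk '$1 ~ /^[0-9]+$/ && $2 != "M" { print $NF; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}

# Example:
# cman_tool nodes | non_members || echo "some nodes are not cluster members"
```

The header line is skipped because its first field is not a node number.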


Managing HA services with clusvcadm

You can manage HA services using the clusvcadm command. With it you can perform the following operations:
o    Enable and start a service.
o    Disable a service.
o    Stop a service.
o    Freeze a service
o    Unfreeze a service
o    Migrate a service (for virtual machine services only)
o    Relocate a service.
o    Restart a service.

Enable a service

clusvcadm -e service_name -m target_server_name # -m is optional; use it to choose the node on which the service starts

Disable a service

clusvcadm -d service_name

Relocate a service

clusvcadm -r service_name -m target_server_name

Stop a service

clusvcadm -s service_name

Freeze a service

clusvcadm -Z service_name

# Freeze a service on the node where it is currently running. This prevents status checks of the service as well as failover in the event the node fails or rgmanager is stopped. This can be used to suspend a service to allow maintenance of underlying resources

Using the freeze operation allows maintenance of parts of rgmanager services. For example, if you have a database and a web server in one rgmanager service, you may freeze the rgmanager service, stop the database, perform maintenance, restart the database, and unfreeze the service.
When a service is frozen, it behaves as follows:
o    Status checks are disabled.
o    Start operations are disabled.
o    Stop operations are disabled.
o    Failover will not occur (even if you power off the service owner).
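
The freeze, maintain, unfreeze sequence described above can be wrapped so the service is unfrozen even when the maintenance step fails. A sketch only; the function name with_frozen_service, the CLUSVCADM override variable, and the script path in the usage line are assumptions.

```shell
#!/bin/sh
# CLUSVCADM can be pointed at a stand-in command to preview what would be run.
CLUSVCADM="${CLUSVCADM:-clusvcadm}"

# with_frozen_service SERVICE_NAME COMMAND [ARG...]
# Freezes the service, runs COMMAND, then unfreezes the service.
# Returns the exit status of COMMAND.
with_frozen_service() {
    svc=$1
    shift
    $CLUSVCADM -Z "$svc" || return 1
    "$@"
    status=$?
    # Always unfreeze, even if the maintenance command failed.
    $CLUSVCADM -U "$svc" || return 1
    return "$status"
}

# Example (hypothetical maintenance script):
# with_frozen_service example_apache /usr/local/bin/db-maintenance.sh
```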

Unfreeze a service

clusvcadm -U service_name

Migrate a virtual machine

clusvcadm -M service_name -m target_server_name

Restart a service

clusvcadm -R service_name