Chapter 3. System Operation

Chapter 3. System Operation
Prev		Next

This chapter describes how to use the SMC for Altix ICE systems management software to operate your Altix ICE system and covers the following topics:

Software Image Management

This section describes image management operations.

This section describes Linux services turned off on compute nodes by default, how you can customize the software running on compute nodes or service nodes, create a simple clone image of compute node or service node software, how to use the cimage command, how to use crepo command to manage software image reposistories, and how to use the cinstallman command to create compute and service node images. It covers these topics:

Compute Node Services Turned Off by Default

To improve the performance of applications running MPI jobs on compute nodes, most services are disabled by default in compute node images. To see what adjustments are being made, view the /etc/opt/sgi/conf.d/80-compute-distro-services script.

If you wish to change anything in this script, SGI suggests that you copy the existing script to .local and adjust it there. Perform the following commands:

# cd /var/lib/systemimager/images/compute-image-name
# cp etc/opt/sgi/conf.d/80-compute-distro-services 80-compute-distro-services.local
# vi etc/opt/sgi/conf.d/80-compute-distro-services.local

At this point, the configuration framework will execute the .local version, and skip the other. For more information on making adjustments to configuration framework files, see “SGI Altix ICE System Configuration Framework”.

Use the cimage command to push the changed image out to the leader nodes.

`crepo` Command

You can use the crepo command to manage software repositories, such as, SGI Foundation, SMC for Altix ICE, SGI Performance Suite, and the Linux distribution(s) you are using on your system. You also use the crepo command to manage any custom repositories you create yourself.

The configure-cluster command calls the crepo command when it prompts you for media and then makes it available. You can also use the crepo command to add additional media.

Each repository has associated with it a name, directory, update URL, selection status, and suggested package lists. The update URL is used by the sync-repo-updates command. The directory is where the actual yum repository exists, and is located in one of these locations, as follows:

Repository
	Description
`/tftpboot/sgi/*`
	For SGI media
`/tftpboot/other/*`
	For any media that is not from SGI
`/tftpboot/distro/*`
	For Linux distribution repositories such as SLES or RHEL
`/tftpboot/x`
	Customer-supplied repositories

The repository information is determined from the media itself when adding media supplied by SGI, Linux distribution media (SLES, RHEL, and so on.), and any other YaST-compatible media. For customer-supplied repositories, the information must be provided to the crepo command when adding the repository.

Repositories can be selected and unselected. Usually, SMC for Altix ICE commands ignore unselected repositories. One notable exception is that sync-repo-updates always operates on all repositories.

The crepo command constructs default RPM lists based on the suggested package lists. The RPM lists can be used by the cinstallman command when creating a new image. These RPM lists are only generated if a single distribution is selected and can be found in /etc/opt/sgi/rpmlists; they match the form generated-*.rpmlist. The crepo command will tell you when it updates or removes generated rpmlists. For example:

# crepo --select SUSE-Linux-Enterprise-Server-10-SP3
Updating: /etc/opt/sgi/rpmlists/generated-compute-sles10sp3.rpmlist
Updating: /etc/opt/sgi/rpmlists/generated-service-sles10sp3.rpmlist

When generating the RPM lists, the crepo command combines the a list of distribution RPMs with suggested RPMs from every other selected repository. The distribution RPM lists are usually read from the /opt/sgi/share/rpmlists/distro directory. For example, the compute node RPM list for sles11sp1 is /opt/sgi/share/rpmlists/distro/compute-distro-sles11sp1.rpmlist. The suggested RPMs for non-distribution repositories are read from the /var/opt/sgi/sgi-repodata directory. For example, the rpmlist for SLES 11 SP1 compute nodes is read from /var/opt/sgi/sgi-repodata/SMC-for-ICE 1.0-for-Linux-sles11/smc-ice-compute.rpmlist.

The suggested rpmlists can be overridden by creating an override rpmlist in the /etc/opt/sgi/rpmlists/override/ directory. For example, to change the default SMC for ICE 1.0 suggested rpmlist, a file /etc/opt/sgi/rpmlists/override/SMC-for-ICE-1.0 -for-Linux-sles11/smc-ice-compute.rpmlist can be created.

The following example shows the contents of the /etc/opt/sgi/rpmlists directory after the crepo command has created the suggested RPM lists. Change directory (cd) to the /etc/opt/sgi/rpmlists directory. Use the ls command to see a list of rpms, as follows:

admin distro]# ls
compute-distro-centos5.4.rpmlist  lead-distro-sles11sp1.rpmlist
compute-distro-rhel5.4.rpmlist    service-distro-rhel5.4.rpmlist
compute-distro-rhel5.5.rpmlist    service-distro-rhel5.5.rpmlist
compute-distro-rhel6.0.rpmlist    service-distro-rhel6.0.rpmlist
compute-distro-sles10sp3.rpmlist  service-distro-sles10sp3.rpmlist
compute-distro-sles11sp1.rpmlist  service-distro-sles11sp1.rpmlist
lead-distro-rhel6.0.rpmlist

Specifically, SMC for Altix ICE software looks for /etc/opt/sgi/rpmlists/generate-*.rpmlist and creates an image for each rpmlist that matches.

It also determines the default image to use for each node type by hard-coding "$nodeType-$distro" as the type, where distro is the admin node's distro and nodeType is compute, service, leader, and so on. The default image can be overridden by specifying a global cattr attribute named image_default_$nodeType; for example, image_default_service. Use cattr -h, for information about the cattr command.

The following example shows the contents of the /etc/opt/sgi/rpmlists directory after the crepo command has created the suggested RPM lists. The files with -distro- in the name are the base Linux distro RPMs that SGI recommends.

Change directory (cd) to /etc/opt/sgi/rpmlists. Use the ls command to see a list of rpms, as follows:

admin:/etc/opt/sgi/rpmlists # ls
compute-minimal-sles11sp1.rpmlist  generated-lead-rhel6.0.rpmlist
generated-compute-rhel6.0.rpmlist  generated-service-rhel6.0.rpmlist

For more information on rpmlist customization information, see “Creating Compute and Service Node Images Using the cinstallman Command”.

For a crepo command usage statement, perform the following:

admin:~ # crepo --h
crepo Usage:
Operations:
--help                : print his usage message

--add {path/URL}      : add SGI/SMC media to the system repositories
       --custom {name}: Optional.Use with -add to add custom repo under
                        /tftpboot Repo must pre-exist for this case.

--del {product}       : delete an add-on product and associated /tftpboot repo

--select {product}    : mark the product as selected

--show                : show available add-on products

--show-distro         : like show, but only reports distro media like sles10sp2

--show-updateurls     : Show the update sources associated add-on products

--reexport            : re-export all repositories with yume.  Use if there
                        was a yume export problem previously.

--unselect {product}  : mark the product as not selected


Flags:

Note for --add: If the pathname is local to the machine, it can be an
ISO file or mounted media.  If a network path is used -- such as an nfs
path or a URL -- the path must point to an ISO file.  The argument to
--add may be a comma delimited list to specify multiple source media.

Use --add for SGI/SMC media, to make the repos and rpms available. If the
supplied SGI/SMC media has suggested rpms from SMC node types, those
suggested rpms will be integrated with the default rpmlists for leader,
service, and compute nodes.  You can use create-default-sgi-images to
re-create the default images including new suggested packages or you can
just browse the updated versions in /etc/opt/sgi/rpmlists.

Use --add with --custom to register your own custom repository.  This will
ensure that, by default, the custom repository is available to yume and
mksiimage commands.  It is assumed you will maintain your own default package
lists, perhaps using the sgi default package lists in /etc/opt/sgi/rpmlists
or /opt/sgi/share/rpmlists as a starting point.  The directory and rpms within
must pre-exist.  This script will create the yum metadata for it.
Example:
crepo --add /tftpboot/myrepo --custom my-custom-name

`cinstallman` Command

The cinstallman command is a wrapper tool for several SMC for Altix ICE operations that previously ran separately. You can use the cinstallman command to perform the following:

Create an image from scratch
Clone an existing image
Recreate an image (so that any nodes associated with said image prior to the command are also associated after)
Use existing images that may have been created by some other means
Delete images
Show available images
Update or manage images (via yume)
Update or manage nodes (via yume)
Assign images to nodes
Choose what a node should do next time it reboots (image itself or boot from its disk)
Refresh the bittorrent tarball and torrent file for a compute node image after making changes to the expanded image

When compute images are created for the first time, a bittorrent tarball is also created. When images are pushed to rack leaders for the first time, bittorrent is used to transport the tarball snapshot of the image. However, as you make adjustments to your compute image, those changes do not automatically generate a new bittorrent tarball. We handle that situation by always doing a follow-up rsync of the compute image after transporting the tarball. However, as your compute image begins to diverge from the bittorrent tarball snapshot, it becomes less and less efficient to transport a given compute node image that is new to a given rack leader.

You no longer need to use yum, yume, or mksiimage commands directly for most common operations. Compute images are automatically configured in such a way as to make them available to the cimage command.

For a cinstallman command usage statement, perform the following:

admin:~ # cinstallman --help
cinstallman Usage:

cinstallman is a tool that manages:
 - image creation (as a wrapper to mksiimage)
 - node package updates (as a wrapper to yume)
 - image package updates (yume within a chroot to the image)

This is a convenience tool and not all operations for the commands that are
wrapped are provided.  The most common operations are collected here for
ease of use.

For operations that take the --node parameter, the node can be an aggregation
of nodes like cimage and cpower can take.  Depending on the situation,
non-managed or offline nodes are skipped.

The tool retrieves the registered repositories from crepo so that they
need not be specified on the command line.

Operations:
--help                  : print his usage message
--create-image          : create a new systemimager image
                          By default, requires --rpmlist and --image
                          Optional flags  below:
       --clone          : Clone existing image, requires --source, --image.
                          Doesn't require --rpmlist.
       --recreate       : Like --del-image then --add-image, but preserves any
                          node associations.
                          Requires --image and --rpmlist
       --repos {list}   : A comma-seperated list of repositories to use.
       --use-existing   : register an already existing image, doesn't
                          require --rpmlist
       --image {image}  : Specify the image to operate on
       --rpmlist {path} : Provide the rpmlist to use when creating images
       --source {image} : Specify a source image to operate on (for clone)

--del-image             : delete the image, may use with --del-nodes
       --image {image}  : Specify the image to operate on

--show-images           : List available images (similar to mksiimage -L)

--show-nodes            : Show non-compute nodes (similar to mksimachine -L)

--update-image          : update packages in image to latest packages available
                          in repos, Requires --image
       --image {image}  : Specify the image to operate on

--refresh-image         : Refresh the given image to include all packages
                          in the supplied rpmlist.  Use after registering
                          new media with crepo that has new suggested rpms.
       --image {image}  : Specify the node or nodes to operate on
       --rpmlist {path} : rpmlist containing packes to be sure are included

--yum-image             : Perform yum operations to supplied image, via yume
                          Requires --image, trailing arguments passed to yume
       --image {image}  : Specify the image to operate on

--update-node           : Update supplied node to latest pkgs avail in
                          repos, requires --node
       --node {node}    : Specify the node or nodes to operate on

--refresh-node          : Refresh the given node to include all packages
                          in the supplied rpmlist.  Use after registering
                          new media with crepo that has new suggested rpms.
       --node {node}    : Specify the node or nodes to operate on
       --rpmlist {path} : rpmlist containing packes to be sure are included

--yum-node              : Perform yum operations to nodes, via yume.  Requires
                          --node.  Trailing arguments passed to yume
       --node {node}    : Specify the node or nodes to operate on

--assign-image          : Assign image to node.  Requires --node, --image
       --node {node}    : Specify the node or nodes to operate on
       --image {image}  : Specify the image to operate on

--next-boot {image|disk}: node action next boot: boot from disk or
                          reinstall/reimage?  Requires --node

--refresh-bt            : Refresh the bittorrent tarball and torrent file
                          Requires --image
       --image {image}  : Specify the image to operate on

In the following example, the --refresh-node operation is used to ensure the online managed service nodes include all the packages in the list. You could use this if you updated your rpmlist to include new packages or if you recently added new media with the crepo command and want running nodes to have the newly updated packages. A similar --refresh-image operation exists for images.

# cinstallman --refresh-node --node service\* --rpmlist
/etc/opt/sgi/rpmlists/service-sles11.rpmlist

Customizing Software On Your SGI Altix ICE System

This section discusses how to manage various nodes on your SGI Altix ICE system. It describes how to configure the various nodes, including the compute and service nodes. It describes how to augment software packages. Many tasks having to do with package management have multiple valid methods to use.

For information on installing patches and updates, see “Installing SMC for Altix ICE Patches and Updating SGI Altix ICE Systems ” in Chapter 2.

Creating Compute Node Custom Images

You can add per-host compute node customization to the compute node images. You do this by adding scripts either to the /opt/sgi/share/per-host-customization/global/ directory or the /opt/sgi/share/per-host-customization/mynewimage/ directory on the system admin controller.

Note: When creating custom images for compute nodes, make sure you clone the original SGI images. This provides the original images intact that you can fall back to if necessary. The folowing example is based on SLES.

Scripts in the global directory apply to all compute nodes images. Scripts under the image name apply only to the image in question. The scripts are cycled through once per host when being installed on the rack leader controllers. They receive one input argument, which is the full path (on the rack leader controller) to the per-host base directory, for example, /var/lib/sgi/mynewimage/i2n11. There is a README file at /opt/sgi/share/per-host-customization/README on the system admin controller, as follows:

This directory contains compute node image customization scripts which are
executed as part of the install-image operations on the leader nodes when
pulling over a new compute node image.

After the image has been pulled over, and the per-host-customization dir has
been rsynced, the per-host /etc and /var directories are populated, then the
scripts in this directory are cycled through once per-host.  This allows the
scripts to source the node specific network and cluster management settings,
and set node specific settings.

Scripts in the global directory are iterated through first, then if a
directory exists that matches the image name, those scripts are iterated
through next.

You can use the scripts in the global directory as examples.

An example global script, /opt/sgi/share/per-host-customization/global/sgi-fstab is, as follows:

#!/bin/sh
#
# Copyright (c) 2007,2008 Silicon Graphics, Inc.
# All rights reserved.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
#
# Set up the compute node's /etc/fstab file.
#
# Modify per your sites requirements.
#
# This script is excecuted once per-host as part of the install-image
operation
# run on the leader nodes, which is called from cimage on the admin node.
# The full path to the per-host iru+slot directory is passed in as $1,
# e.g. /var/lib/sgi/per-host/<imagename>/i2n11.
#

# sanity checks
. /opt/sgi/share/per-host-customization/global/sanity.sh

iruslot=$1
os=( $(/opt/oscar/scripts/distro-query -i ${iruslot} | sed -n '/^compat
/s/^compat.*: //p') )
compatdistro=${os[0]}${os[1]}

if [ ${compatdistro} = "sles10" -o ${compatdistro} = "sles11" ]; then

	#
	# SLES 10 compatible
	#
	cat <<EOF >${iruslot}/etc/fstab
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
tmpfs           /tmp            tmpfs   size=150m       0       0
EOF

elif [ ${compatdistro} = "rhel5"  ]; then

	#
	# RHEL 5 compatible
	#

	#
	# RHEL expects several subsys directories to be present under
/var/run
	# and /var/lock, hence no tmpfs mounts for them
	#
	cat <<EOF >${iruslot}/etc/fstab
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
tmpfs           /tmp            tmpfs   size=150m       0       0
devpts          /dev/pts        devpts  gid=5,mode=620  0       0
EOF

else

	echo -e "\t$(basename ${0}): Unhandled OS.  Doing nothing"

fi

Modify Compute Image Kernel Boot Options

You can use the cattr command to set extra kernel boot parameters for compute nodes on a per-image basis. For example to append "cgroup_disable=memory" to kernel boot parameters for any node booting the "compute-sles11sp1" image, perform a command similar to the following:

% cattr set kernel_extra_params-compute-sles11sp1 cgroup_disable=memory

Push the image, as follows:

# cimage --push-rack mynewimage r1

Compute Node Per-Host Customization for Additional Network Interfaces

Note: The following example is only for systems running SLES.

Per compute-node customization may be useful for configuring additional network interfaces that are on some, but not all, compute nodes. An example of how to configure network interfaces on individual compute nodes is the /opt/sgi/share/per-host-customization/mynewimage/mycustomization script, that follows:

Copyright (c) 2008 Silicon Graphics, Inc.
# All rights reserved.
#
# do node specific setup
#
# This script is excecuted once per-host as part of the install-image operation
# run on the leader nodes, which is called from cimage on the admin node.
# The full path to the per-host iru+slot directory is passed in as $ARGV[0],
# e.g. /var/lib/sgi/per-host/<imagename>/i2n11.
#

use lib "/usr/lib/systemconfig","/opt/sgi/share/per-host-customization/global";
use sanity;

sanity_checks();

$blade_path = $node = $ARGV[0];
$node =~ s/.*\///;

sub i0n4 {
        my $ifcfg="etc/sysconfig/network/ifcfg-eth2";
        open(IFCFG, ">$blade_path/$ifcfg") or
                die "$0: can't open $blade_path/$ifcfg";
        print IFCFG<<EOF
BOOTPROTO='static'
IPADDR='10.20.0.1'
NETMASK='255.255.0.0'
STARTMODE='onboot'
WIRELESS='no'
EOF
        ;
        close(IFCFG);
}

@nodes = ("i0n4");

foreach $n (@nodes) {
        if ( $n eq $node ) {
                eval $n;
        }
}

Pushing mynewimage to rack 1 causes the eth2 interface of compute node r1i0n4 to be configured with IP address 10.20.0.1 when the node is brought up with mynewimage. Push the image, as follows:

# cimage --push-rack mynewimage r1

Customizing Software Images

Note: Procedures in this section describe how to work with service node and compute node images. Always use a cloned image. If you are adjusting an RPM list, use your own copy of the RPM list.

The service and compute node images are created during the configure-cluster operation (or during your upgrade from a prior release). This process uses an RPM list to generate a root on the fly.

You can clone a compute node image, or create a new one based on an RPM list. For service nodes, SGI does not support a clone operation. For compute images, you can either clone the image and work on a copy or you can always make a new compute node image from the SGI supplied default RPM list.

Procedure 3-1. Creating a Simple Compute Node Image Clone

Note: Always work from a clone image, see “Customizing Software Images”.

To create a simple compute node image clone from the system admin controller, perform the following steps:

To clone the compute node image, perform the following:

# cinstallman --create-image --clone --source compute-sles11 --image compute-sles11-new

To see the images and kernels in the list, perform the following:

# cimage --list-images
image: compute-sles11
       kernel: 2.6.27.19-5-smp

image: compute-sles11-new
       kernel: 2.6.27.19-5-smp

To push the compute node image out to the rack, perform the following:
# cimage --push-rack compute-sles11-new r\*
To change the compute nodes to use the cloned image/kernel pair, perform the following:
# cimage --set compute-sles11-new 2.6.27.19-5-smp "r*i*n*"

Procedure 3-2. Manually Adding a Package to a Compute Node Image

To manually add a package to a compute node image, perform the steps:

Note: Use the cinstallman command to install packages into images when the package you are adding is in a repository. This example shows a quick way to manually add a package for compute nodes when you do not want the package to be in a custom repository. For information on the cinstallman command, see “cinstallman Command”.

Make a clone of the compute node image, as described in “Customizing Software Images”.

Note: This example shows SLES11.

Determine what images and kernels you have available now, as follows:

# cimage --list-images
  image: compute-sles11
         kernel: 2.6.27.19-5-smp

  image: compute-sles11-new
         kernel: 2.6.27.19-5-smp

From the system admin controller, change directory to the images directory, as follows:
# cd /var/lib/systemimager/images/
From the system admin controller, copy the RPMs your wish to add, as follows, where compute-sles11-new is your own compute node image, as follows:
# cp /tmp/newrpm.rpm compute-sles11-new/tmp

The new RPMs now reside in /tmp directory in the image named compute-sles11-new. To install them into your new compute node image, perform the following commands:

# chroot compute-sles11-new bash

And then perform the following:

# rpm -Uvh /tmp/newrpm.rpm

At this point, the image has been updated with the RPM.

Note: Remove the RPMs or ISO images before pushing an image or the RPM/ISO will be pushed multiple times for each image slowing down the push and even filling up the root of the RLC (leader node).

# rm /tmp/newrpm.rpm

The image on the system admin controller is updated. However, you still need to push the changes out. Ensure there are no nodes currently using the image and then run this command:
# cimage --push-rack compute-sles11-new r\*
This will push the updates to the rack lead controllers and the changes will be seen by the compute nodes the next time they start up. For information on how to ensure the image is associated with a given node, see the cimage --set command and the example in Procedure 3-1.

Procedure 3-3. Manually Adding a Package to the Service Node Image

To manually add a package to the service node image, perform the following steps:

Use the cinstallman command to create your own version of the service node image. See “cinstallman Command”.
Change directory to the images directory, as follows:
# cd /var/lib/systemimager/images/
From the system admin controller, copy the RPMs your wish to add, as follows, where my-service-image is your own service node image:
# cp /tmp/newrpm.rpm my-service-image/tmp
The new RPMs now reside in /tmp directory in the image named my-service-image. To install them into your new service node image, perform the following commands:
# chroot my-service-image bash
And then perform the following:
# rpm -Uvh /tmp/newrpm.rpm
At this point, the image has been updated with the RPM. Please note, that unlike compute node images, changes made to a service node image will not be seen by service nodes until they are reinstalled with the image. If you wish to install the package on running systems, you can copy the RPM to the running system and use the RPM from there.

`cimage` Command

The cimage command allows you to list, modify, and set software images on the compute nodes in your system.

For a help statement, perform the following command:

admin:~ # cimage --help
cimage is a program for managing compute node root images in SMC for ICE.

Usage: cimage OPTION ...

Options
 --help                                 Usage and help text.
 --debug                                Output additional debug information.
 --list-images                          List images and their kernels.
 --list-nodes NODE                      List node(s) and what they are set to.
 --set [OPTION] IMAGE KERNEL NODE       Set node(s) to image and kernel.
    --nfs                               Use NFS roots (default).
    --tmpfs                             Use tmpfs roots.
 --set-default [OPTION] IMAGE KERNEL    Set default image, kernel, rootfs type.
    --nfs                               Use NFS roots (default).
    --tmpfs                             Use tmpfs roots.
 --show-default                         Show default image, kernel, rootfs type.
 --add-db IMAGE                         Add image and its kernels to the db.
 --del-db IMAGE                         Delete image and its kernels from db.
 --update-db IMAGE                      Short-cut for --del-db, then --add-db.
 --push-rack [OPTIONS] IMAGE RACK       Push or update image on rack(s).
    --force                             Bypass the booted nodes check, deletes.
    --update-only                       Skip files newer in dest, no delete.
    --quiet                             Turn off diagnostic information.
 --del-rack IMAGE RACK                  Delete an image from rack(s).
 --clone-image OIMAGE NIMAGE            Clone an existing image to a new image.
 --del-image [OPTIONS] IMAGE            Delete an existing image entirely.
    --quiet                             Turn off diagnostic information.

RACK arguments take the format 'rX'
NODE arguments take the format 'rXiYnZ'
ROOTFS argument can be either 'nfs' or 'tmpfs'

X, Y, Z can be single digits, a [start-end] range, or * for all matches.

EXAMPLES

Example 3-1. cimage Command Examples

The following examples walk you through some typical cimage command operations.

To list the available images and their associated kernels, perform the following:

# cimage --list-images
image: compute-sles11
        kernel: 2.6.27.19-5-carlsbad
        kernel: 2.6.27.19-5-default
image: compute-sles11-1_7
        kernel: 2.6.27.19-5-default

To list the compute nodes in rack 1 and the image and kernel they are set to boot, perform the following:

# cimage --list-nodes r1
r1i0n0: compute-sles11 2.6.27.19-5-default nfs
r1i0n8: compute-sles11 2.6.27.19-5-default nfs

The cimage command also shows the root filesystem type (nfs or tmpfs)

To set the r1i0n0 compute node to boot the 2.6.27.19-5-smp kernel from the compute-sles11 image, perform the following: :

# cimage --set compute-sles11 2.6.27.19-5-smp r1i0n0

To list the nodes in rack 1 to see the changes set in the example above, perform the following:

# cimage --list-nodes r1
r1i0n0: compute-sles11 2.6.27.19-5-smp
r1i0n1: compute-sles11 2.6.27.19-5-smp
r1i0n2: compute-sles11 2.6.27.19-5-smp
[...snip...]

To set all nodes in all racks to boot the 2.6.27.19-5-smp kernel from the compute-sles11 image, perform the following:

# cimage --set compute-sles11 2.6.27.19-5-smp r*i*n*

To set two ranges of nodes to boot the 2.6.27.19-5-smp kernel, perform the following:

# cimage --set compute-sles11 2.6.27.19-5-smp r1i[0-2]n[5-6] r1i[2-3]n[0-4]

To clone the compute-sles11 image to a new image (so that you can modify it) , perform the following:

# cinstallman --create-image --clone --source compute-sles11 --image mynewimage
Cloning compute-sles11 to mynewimage ... done

The clone process adds the image and its kernels to the database.

Note: If you have made changes to the compute node image and are pushing that image out to leader nodes, it is a good practice to use the cinstallman --refresh-bt --image { image} command to refresh the bittorrent tarball and torrent file for a compute node image. This avoids duplication by rsync when the image is pushed out to the leader nodes. For more information, see the cinstallman -h usage statement or “cinstallman Command”.

To change to the cloned image created in the example, above, copy the needed rpms into the /var/lib/systemimager/images/mynewimage/tmp directory, use the chroot command to enter the directory and then install the rpms, perform the following:

# cp *.rpm /var/lib/systemimager/images/mynewimage/tmp
# chroot /var/lib/systemimager/images/mynewimage/ bash
# rpm -Uvh /tmp/*.rpm

If you make changes to the kernels in the image, you need to refresh the kernel database entries for your image, To do this, perform the following:

# cimage --update-db mynewimage

If you did not make changes to the kernels in the cloned image created in the example above, you can omit this step.

To push new software images out to the compute blades in a rack or set of racks, perform the following:

# cimage --push-rack mynewimage r*
r1lead: install-image: mynewimage
r1lead: install-image: mynewimage done.

To list images in the database the kernels they contain, perform the following:

# cimage --list-images

image: compute-sles11
        kernel: 2.6.16.60-0.7-carlsbad
        kernel: 2.6.16.60-0.7-smp

image: mynewimage
        kernel: 2.6.16.60-0.7-carlsbad
        kernel: 2.6.16.60-0.7-smp

To set some compute nodes to boot an image, perform the following:

# cimage --set mynewimage 2.6.16.60-0.7-smp r1i3n*

You need to reboot the compute nodes to run the new images.

Completely remove an image you no longer use, both from system admin controller and all compute nodes in all racks, perform the following:

# cimage --del-image mynewimage
r1lead: delete-image: mynewimage
r1lead: delete-image: mynewimage done.

Using `cinstallman` to Install Packages into Software Images

The packages that make up SMC for Altix ICE, SGI Foundation, and the Linux distribution media, and any other media or custom repositories you have added reside in repositories. The cinstallman command looks up the list of all repositories and provides that list to the commands it calls out for its operation such as yume.

Note: Always work with copies of software images.

The cinstallman command can update packages within systemimager images. You may also use cinstallman to install a single package within an image.

However, cinstallman and the commands it calls only works with the configured repositories. So if you are installing your own RPM, you will need that package to be part of an existing repository. You may use the crepo command to create a custom repository into which you can collect custom packages.

Note: The yum command maintains a cache of the package metadata. If you just recently changed the repositories, yum caches for the nodes or images you are working with may be out of date. In that case, you can issue the yum command "clean all" with --yum-node and --yum-image. The cinstallman command --update-node and --update-image options do this for you.

The following example shows how to install the zlib-devel package in to the service node image so that the next time you image or install a service node, it will have this new package.

# cinstallman --yum-image --image my-service-sles11 install zlib-devel

You can perform a similar operation for compute node images. Note the following:

If you update a compute node image on the system admin controller (admin node), you have to use the cimage command to push the changes. For more information on the cimage command, see “cimage Command”.
If you update a service node image on the admin node, that service node needs to be reinstalled and/or reimaged to get the change. The discover command can be given an alternate image or you may use the cinstallman --assign-image command followed by the cinstallman --next-boot command to direct the service node to reimage itself with a specified image the next time it boots.

Using `yum` to Install Packages on Running Service or Leader Nodes

Note: These instructions only apply to managed service nodes and leader nodes. They do not apply to compute nodes.

You can use the yum command to install a package on a service node. From the admin node, you can issue a command similar to the following:

# cinstallman --yum-node --node service0 install zlib-devel

Note: To get all service nodes, replace service0 with service\*.

For more information on the cinstallman command, see “cinstallman Command”.

Creating Compute and Service Node Images Using the `cinstallman` Command

You can create service node and compute node images using the cinstallman command. This generates a root directory for images, automatically.

Fresh installations of SMC for Altix ICE create these images during the configure-cluster installation step (see “Installing SMC for Altix ICE Admin Node Software ” in Chapter 2).

The RPM lists that drive which packages get installed in the images are listed in files located in /etc/opt/sgi/rpmlists. For example, /etc/opt/sgi/rpmlists/compute-sles11.rpmlist (see “crepo Command”). You should NOT edit the default lists. These default files are recreated by the crepo command when repositories are added or removed. Therefore, you should only use the default RPM lists as a model for your own.

Note: The procedure below uses SLES.

Procedure 3-4. Using the cinstallman Command to Create a Service Node Image:

To create a service node image using the cinstallman command, perform the following steps:

Make a copy of the example service node image RPM list and work on the copy, as follows:

# cp /etc/opt/sgi/rpmlists/service-sles11.rpmlist
/etc/opt/sgi/rpmlists/my-service-node.rpmlist

Add or remove any packages from the RPM list. Keep in mind that needed dependencies are pulled in automatically.
Use the cinstallman command with the --create-image option to create the images root directory, as follows:
# cinstallman --create-image --image my-service-node-image --rpmlist /etc/opt/sgi/rpmlists/my-service-node.rpmlist
This example uses my-service-node-image as the home/name of the image.

Output is logged to /var/log/cinstallman on the admin node.
After the cinstallman comand finishes, the image is ready to be used with service nodes. You can supply this image as an optional image name to the discover command, or you may assign an existing service node to this image using the cinstallman --assign-image command. You can tell a service node to image itself next reboot by using the cinstallman --next-boot option.

Procedure 3-5. Use the cinstallman Command to Create a Compute Node Image

To create a compute node image using the cinstallman command, perform the following steps:

Make a copy of the compute node image RPM list and work on the copy, as follows:

# cp /etc/opt/sgi/rpmlists/compute-sles11.rpmlist
/etc/opt/sgi/rpmlists/my-compute-node.rpmlist

Add or remove any packages from the RPM list. Keep in mind that needed dependencies are pulled in automatically.
Run the cinstallman command to create the root, as follows:
# cinstallman --create-image --image my-compute-node-image --rpmlist /etc/opt/sgi/rpmlists/my-compute-node.rpmlist
This example uses the name name my-compute-node-image as the name.

Output is logged to /var/log/cinstallman on the admin node.

The cinstallman command makes the new image available to the cimage command.
For information on how to use the cimage command to push this new image to rack leader controllers (leader nodes), see “cimage Command”.

Installing a Service Node with a Non-default Image

If you have a non-default service node image you wish to install on a service node, you have two choices, as follows:

Specify the image name when you first discover the node with the discover command.
Use the cinstallman command to associate an image with a service node, then set up the node to reinstall itself the next time it boots.

The following example shows how to associate a custom image at discover time:

# discover --service 2,image=my-service-node-image

The next example shows how to reinstall an already discovered service node with a new image:

# cinstallman --assign-image --node service2 --image my-service-node-image
# cinstallman --next-boot image --node service2

When you reboot the node, it will reinstall itself.

For more information on the discover command, see “discover Command” in Chapter 2. For more information on the cinstallman command, see “cinstallman Command”.

Retrieving a Service Node Image from a Running Service Node

To retrieve a service node image from a running service node, perform the following steps:

As root user , log into the service node from which you wish to retrieve an image. You can use the si_prepareclient(8) program to extract an image. Start the program, as follows:

service0:~ # si_prepareclient --server admin
Welcome to the SystemImager si_prepareclient command. This command may modify
the following files to prepare your golden client for having its image 
retrieved by the imageserver.  It will also create the /etc/systemimager 
directory and fill it with information about your golden client. All modified
files will be backed up with the .before_systemimager-3.8.0 extension.

 /etc/services:
   This file defines the port numbers used by certain software on your system.
   Entries for rsync will be added if necessary.

 /tmp/filetlOeP5:
   This is a temporary configuration file that rsync needs on your golden client
   in order to make your filesystem available to your SystemImager server.

 inetd configuration:
   SystemImager needs to run rsync as a standalone daemon on your golden client
   until its image is retrieved by your SystemImager server.  If rsyncd is 
   configured to run as a service started by inetd, it will be temporarily
   disabled, and any running rsync daemons or commands will be stopped.  Then,
   an rsync daemon will be started using the temporary configuration file
   mentioned above.
   
See "si_prepareclient --help" for command line options.

Continue? (y/[n]):

Enter y to continue. After a few moments, you are returned to the command prompt. You are now ready to retrieve the image from the admin node.

Exit the service0 node, and as root user on the admin node, perform the following command: (Replace the image name and service node name, as needed.)
admin # mksiimage --Get --client service0 --name myimage
It now retrieves the image. No progress information is provided. It takes several minutes depending on the size of the image on the service node.
Use the cinstallman command to register the newly collected image:
admin # cinstallman --create --use-existing --image myimage
If you want to discover a node using this image directly, you can use the discover command, as follows:
admin # discover --service 0,image=myimage

If you want to re-image an already discovered node with your new image, run the following commands:

# cinstallman --assign-image --node service0 --image myimag
# cinstallman --next-boot image --node service0

Reboot the service node.

Using a Custom Repository for Site Packages

This section describes how to maintain packages specific to your site and have them available to the crepo command (see “crepo Command”).

SGI suggests putting site-specific packages in a separate location. They should not reside in the same location as SGI or Novell supplied packages.

Procedure 3-6. Setting Up a Custom Repository for Site Packages

To set up a custom repository for your custom packages, perform the following steps:

Create directory for your site-specific packages on the system admin controller (admin node), as follows:
# mkdir -p /tftpboot/site-local/sles-10-x86_64

Copy your site packages in to the new directory, as follows:

# cp my-package-1.0.x86_64.rpm /tftpboot/site-local/sles-10-x86_64

Register your custom repository using the crepo command. This command will ensure your repository is consulted when the cinstallman command performs its operations. This command also creates the necessary yum/repomd metadata.
# crepo --add /tftpboot/site-local/sles-10-x86_64 --custom my-repo
Your new repository may be consulted by cinstallman command operations going forward including updating images, nodes, and creating images.
If you wish this repository to be used by cinstallman by default, you need to select it. Use the following command:
# crepo --select my-repo
If you use cinstallman to create an image, you will want to add your custom package to the rpmlist you use with the cinstallman command (see “Using cinstallman to Install Packages into Software Images”).

SGI Altix ICE System Configuration Framework

All node types that are part of an SGI Altix ICE system can have configuration settings adjusted by the configuration framework. There is some overlap between the per-host customization instructions and the configuration framework instructions. Each approach plays a role in configuring your system. The major differences between the two methods are, as follows:

Per-host customization runs at the time an image is pushed to the rack leader controllers.
Per-host customization only applies to compute node images.
The Altix ICE system configuration framework can be used with all node types.
The system configuration framework is run when a new root is created, when SuSEconfig command is run for some other reason, as part of a yum operation, or when new compute images are pushed with the cimage command.

This framework exists to make it easy to adjust configuration items. There are SGI-supplied scripts already present. You can add more scripts as you wish. You can also exclude scripts from running without purging the script if you decide a certain script should not be run. The following set of questions in bold and bulleted answers describes how to use the system configuration framework.

How does the system configuration framwork operate?

These files could be added, for example, to a running service node, or to an already created service or compute image. Remember that images destined for compute nodes need to be pushed with the cimage command after being altered. For more information, see “cimage Command”.

A /opt/sgi/lib/cluster-configuration script is called, from where it is called is described below.
That script iterates through scripts residing in /etc/opt/sgi/conf.d.
Any scripts listed in /etc/opt/sgi/conf.d/exclude are skipped, as are scripts, that are not executable.
Scripts in system configruation framework must be tolerant of files that do not exist yet, as described below. For example, check that a syslog configuration file exists before trying to adjust it.
Scripts ending in a distro name, or a distro name with a specific distro version are only run if the node in question is running that distro. For example, /etc/opt/sgi/conf.d/99-foo.sles would only run if the node was running sles. This example shows the precedence of operations:If you had 88-myscript.sles10, 88-myscript.sles, and 88-myscript
- On a sles10 system, 88-myscript.sles10 would execute
- On a sles system that is not sles10, 88-myscript.sles would execute
- On all other distros, 88-myscript would would execute
If you wish to make a custom version of an script supplied by SGI, you may simply name it with ".local" and the local version will run in place of the one supplied by SGI. This allows for customization without modifying scripts supplied by SGI. Scripts ending in .local have the highest precedence. In other words, if you had 88-myscript.sles, and 88-myscript.local, then 88-myscript.local would execute in all cases and the other 88-myscript scripts would never execute.

From where is the framework called?

The callout for /opt/sgi/lib/cluster-configuration is implemented as a yum plugin that executes after packages have been installed and cleaned.
On SLES only, there is also a SUSE configuration script in the /sbin/conf.d directory, called SuSEconfig.00cluster-configuration, that calls the framework. This is in case of you are using YaST to install or upgrade packages.
On SLES only, one of the scripts called by the framework calls SuSEconfig. A check is made to avoid a callout loop.
The framework is also called when the admin, leader, or service nodes start up. The call is made just after networking is configured. As a site administrator, you could create custom scripts here that check on or perform certain configuration operations.
When using the cimage command to push a compute node root image to rack leaders, the configuration framework executes within the chroot of the compute node image after it is pulled from the admin node to the rack leader node.

How do I adjust my system configuration?

Create a small script in /etc/opt/sgi/conf.d to do the adjustment.

Be sure that you test for existence of files and do not assume they are there (see "Why do scripts need to tolerate files that do not exist but should?" below).

Why do scripts need to tolerate files that do not exist but should?

This is because the mksiimage command runs yume and yum in two steps. The first step only installs 40 or so RPMs but our framework is called then too. The second pass installs the other "hundreds" of RPMs. So the framework is called once before many packages are installed, and again after everything is in place. So not all files you expect might be available when your small script is called.

How does the yum plugin work?

In order for the yum plugin to work, the /etc/yum.conf file has to have plugins=1 set in its configuration file. SMC for Altix ICE software ensures that is in place by way of a trigger in the sgi-cluster package. Any time yum is installed or updated, it verify plugins=1 is set.

How does yume work?

yume, an oscar wrapper for yum, works by creating a temporary yum configuration file in /tmp and then points yum at it. This temporary configuration file needs to have plugins enabled. A tiny patch to yume makes this happen. This fixes it for yume and also mksiimage, which calls yume as part of its operation.

Cluster Configuration Repository: Updates on Demand

SMC for ICE contains a cluster configuration repository/update framework. This framework generates and distributes configuration updates to admin, service, and leader nodes in the cluster. Some of the configuration files managed by this framework include C3 conserver, DNS, Ganglia, hosts files, and NTP.

When an event occurs that requires these files to be updated, the framework executes on the admin node. The admin node stores the updated configuration framework in a special cached location and updates the appropriate nodes with their new configuration files.

In addition to the updates happening as required, the configuration file repository is consulted when a admin, service, or leader node boots. This happens shortly after networking is started. Any configuration files that are new or updated are transferred at this early stage so that the node is fully configured by the time the node is fully operational.

There are no hooks for customer configuration in the configuration repository at this time.

This update framework is tied in with the /etc/opt/sgi/conf.d configuration framework to provide a full configuration solution. As mentioned earlier, customers are encouraged to create /etc/opt/sgi/conf.d scripts to do cluster configuration.

`cnodes` Command

The cnodes command provides information about the types of nodes in your system. For help information, perform the following:

[admin ~]# cnodes --help
Options:
 --all                  all compute, leader and service nodes, and switches
 --compute              all compute nodes
 --leader               all leader nodes
 --service              all service nodes
 --switch               all switch nodes
 --online               modifier: nodes marked online
 --offline              modifier: nodes marked offline
 --managed              modifier: managed nodes
 --unmanaged            modifier: unmanaged nodes
 --smc-for-ice-names    modifier: return SMC-for-ICE node names instead of hostnames

Note: default modifiers are 'online' and 'managed' unless otherwise specified.

EXAMPLES

Example 3-2. cnodes Example

The following examples walk you through some typical cnodes command operations.

To see a list of all nodes in your system, perform the following:

[admin ~]# cnodes --all
r1i0n0
r1i0n1
r1lead
service0

To see a list of all compute nodes, perform the following:

[admin ~]# cnodes --compute
r1i0n0
r1i0n1

To see a list of service nodes, perform the following:

[admin ~]#  cnodes --service
service0

Multi-disto Image Management

By default, SMC for Altix ICE software associates one software distribution (distro) with all the images and nodes in the system. For example, if RHEL 6 is used for the admin node then, by default, RHEL 6 is used for the compute blades, leader nodes, and service nodes.

However, SMC for Altix ICE software allows support for multiple distros for compute nodes and service nodes. This means that the nodes and images for service and compute nodes need not match the Linux distribution running on admin/leader nodes.

The following information is intended to make it easier for you to see which media goes with which distributions.

RHEL 6

Required:
- SGI Foundation Software 2.3
- SMC for Altix ICE 1.0
- Red Hat Enterprise Linux 6 Install DVD
Optional:
- SGI MPI 1.0
- SGI Accelerate 1.0
RHEL 5.5

Required:
- SGI Foundation 1 Service Pack 6
- SMC for Altix ICE 1.0
- Red Hat Enterprise Linux 5.5 Install DVD
Optional:
- SGI ProPack 6 Service Pack 6
SLES 11 SP1

Required:
- SGI Foundation Software 2.3
- SMC for Altix ICE 1.1
- SUSE Linux Enterprise Server 11 SP1 Install DVD #1
Optional:
- SGI MPI 1.1
- SGI Accelerate 1.1
SLES 10 SP3

Required:
- SGI Foundation Software 1 Service Pack 6
- SMC for Altix ICE 1.1
- SUSE Linux Enterprise Server 10 SP3 Install DVD #1
Optional:
- SGI ProPack 6 Service Pack 6

The crepo command, described in “crepo Command”, is the starting point for multi-distro support.

Here is an example of the commands you might run in order to create a RHEL 5.5 service and compute node image:

First, make sure no repositories are currently selected, as follows:
# crepo --show
For any repositories from the result above, that is marked as selected, run this command to unselect it, as follows:
# crepo --unselect repository name

Next, register the repositories for RHEL 5.5. You could point the crepo command at an ISO image or at the mounted media. The ISO file names may not exactly match what you downloaded. In this example, we include optional TBD.

# crepo --add foundation-2.3-cd1-media-rhel5-x86_64.iso
# crepo --add TBD-cd1-media-rhel5-x86_64.iso
# crepo --add RHEL5.5-Server-20100322.0-x86_64-DVD.iso
# crepo --add smc-1.0-cd1-media-rhel5-x86_64.iso

Now, select all of the repositories you just added. Use crepo --show to find the names to use.

# crepo --select SGI-Foundation-Software-1SP6-rhel5
# crepo --select SGI-TBD-for-Linux-rhel5
# crepo --select smc-1.0-rhel5
# crepo --select Red-Hat-Enterprise-Linux-Server-5.5

Now, create images:

# cinstallman --create-image --image service-rhel55 --rpmlist /etc/opt/sgi/rpmlists/generated-service-rhel5.5.rpmlist
# cinstallman --create-image --image compute-rhel55 --rpmlist /etc/opt/sgi/rpmlists/generated-compute-rhel5.5.rpmlist

For the service node, you are now ready to image a node. If the node is not yet discovered, use the discover command with the image= parameter. If the node is already discovered and you wish to re-install it, use the cinstallman --assign-image and cinstallman --next-boot operations to assign the new image to the node in question and mark it for installation on next boot. You can reset the service node and it will install itself.
For the compute image, you need to also push it to the racks. For example:
# cimage --push-rack compute-rhel55 r\*
You can then use the cimage --set operation to associate compute blades with the new image.
Reboot or reset the compute nodes associated with the new image.

Power Management Commands

The cpower command allows you to power up, power down, reset, and show the power status of system components.

`cpower` Command

The cpower command is, as follows:

cpower [<option> ...] [<target_type>] [<action>] <target>

The <option> argument can be one or more of the following:

Option		Description
`--noleader`		Do not include leader nodes (valid with rack and system domains only).
`--noservice`		Do not include service nodes (valid with system domain only).
`--force`		When using wildcards in the target, disable all “safety” checks. Make sure you really want to use this command.
`-n, --noexec`		Displays, but does not execute, commands that affect power.
`-v, --verbose`		Print additional information on command progress

The <target> argument is one of the following:

`--node`		Applies the action to nodes. Nodes are compute nodes, rack leader controllers (leader nodes), system admin controller (admin node), and service nodes. [default]
`--iru`		Applies the action at the IRU level.
`--rack`		Applies the action at the rack level.
`--system`		Applies the action to the system. You must not specify a target with this type.

The <action> argument is one of the following:

`--status`		Show the power status of the target, including whether it is booted or not. [default]
`--up \| --on`		Powers up the target.
`--down \| --off`		Powers down the target.
`--reset`		Performs a hard reset on the target.
`--cycle`		Power cycles the target.
`--boot`		Boots up the target, unless it is already booted. Waits for all targets to boot.
`--reboot`		Reboots the target, even if already booted. Wait for all targets to boot.
`--halt`		Halts and then powers off the target.
`--shutdown`		Shuts down the target, but does not power it off. Waits for targets to shut down.
`--identify <interval>`		Turns on the identifying LED for the specified interval in seconds. Uses an interval of 0 to turn off immediately.
`-h,` `--help`		Shows help usage statement.

The target must always be specified except when the --system option is used. Wildcards may be used, but be careful not to accidentally power off or reboot the leader nodes. If wildcard use affects any leader node, the command fails with an error.

Operations on Nodes

The default for the cpower command is to operate on system nodes, such as compute nodes, leader nodes, or service nodes. If you do not specify --iru, --rack, or --system, the command defaulted to operating as if you had specified --node.

Here are examples of node target names:

r1i3n10

Compute node at rack 1, IRU 3, slot 10
service0

Service node 0
r3lead

Rack leader controller (leader node) for rack 3
r1i*n*

Wildcards let you specify ranges of nodes, for example, r1i*n* all compute nodes in all IRUs on rack 1

IPMI-style Commands

The default operation for the cpower command is to operate on nodes and to provide you the status of these nodes, as follows:

# cpower r1i*n*

This command is equivalent to the following:

# cpower --node --status r1i*n*

This command issues an ipmitool power off command to all of the nodes specified by the wildcard, as follows:

# cpower --off r2i*n*

The default is to apply to a node.

The following commands behave exactly as you would expect as if you were using ipmitool, and have no special extra logic for ordering:

# cpower --up r1i*n*

# cpower --reset r1i*n*

# cpower --cycle r1i*n*

# cpower --identify 5 r1i*n*

Note: --up is a synonym for --on and --down is a synonym for --off.

IRU, Rack, and System Domains

The cpower command contains more logic when you go up to higher levels of abstraction, for example, using --iru, --rack, and --system. These higher level domain specifiers tell the command to be smart about how to order various of the actions that you give on the command line.

The --iru option tells the command to use correct ordering with IRU power commands. In this case, it firsts connect to the CMC on each IRU in rack 1 to issue the power on command, which turns on power to the IRU chassis (this is not the equivalent ipmitool command). Then it powers up the compute nodes in the IRU. Powering things down is the opposite, with the power to the IRU being turned off after power to the blades. IRU targets are specified as follows:r3i2 for rack 3, IRU 2.

# cpower --iru --up r1i*

The --rack option ensures power commands to the leader node are down in the correct order relative to compute nodes within a rack. First, it powers up the leader node and waits for it to boot up (if it is not already up). Then it will do the functional equivalent of a cpower --iru --up r4i* on each of the IRUs contained in the rack, including applying power to each IRU chassis. Using the --down option is the opposite, and also turns off the leader node (after doing a shutdown) after all the IRUs are powered down. To avoid including leader nodes in a power command for a rack, use the --noleader option. Rack targets are specified, as follows: r4 for rack 4. Here is an example:

# cpower --rack --up r4

Commands with the --system option ensures that power up commands are applied first to service nodes, then to leader nodes, then to IRUs and compute blades, in just the same way. Likewise, compute blades are powered down before IRUs, leader nodes, and service nodes, in that order. To avoid including service nodes in a system-domain command, use the --noservice option. Note that you must not specify a target with --system option, since it applies to the Altix ICE system.

Shutting Down and Booting

Note: The --shutdown --off combination of actions were deprecated in a previous release. Use the --halt option in its place.

It is useful to be able to shutdown a machine before turning off the power, in most cases. The following cpower options to enable you to do this: --halt, --boot, and --reboot. The --halt option allows you to shut down a node. The --reboot option ensures that a system is always rebooted, whereas --boot will only boot up a system if it is not already booted. Thus, --boot is useful for booting up compute blades that have failed to start.

You need to configure the order in which service nodes are booted up and shut down as part of the overall system power management process. This is done by setting a boot_order for each service node. Use the cadmin command to set the boot order for a service node, for example:

# cadmin --set-boot-order --node service0 2

The cpower --system --boot command boots up service nodes with a lower boot order, first. It then boots up service nodes with a higher boot order. The reverse is true when shutting down the system with cpower. For example, if service1 has a boot order of 3 and service2 has a boot order of 5, service1 is booted completely, and then service2 is booted, afterwards. During shutdown, service2 is shut down completely before service1 is shutdown.

There is a special meaning to a service node having a boot order of zero. This value causes the cpower --system command to skip that service node completely for both start up and shutdown (although not for status queries). Negative values for the service node boot order setting are not permitted.

Note: The IPMI power commands necessary to enable a system to boot (either with a power reset, or a power on) may be sent to a node. The --halt option, halts the target node and then powers it off.

The --halt options works on node, IRU, or rack domain levels. It will shut down nodes (in the correct order if you use the --iru or --rack options), and then just leave them as they are, power still applied. Using both these actions results in nodes being halted, then powered off. This is particularly useful when powering off a rack, since otherwise, the leaders may be shutdown before there is a chance to power off the compute blades. Here is an example:

# cpower --halt --rack r1

To boot up systems that have not already been booted, perform the following:

# cpower --boot  r1i2n*

Again, the command boots up nodes in the right orders if you specify the --iru or --rack options and the appropriate target. Otherwise, there is no guarantee that, for example, the command will attempt to power on the leader node before compute nodes in the same rack.

To reboot all of the nodes specified, or boot them if they are already shut down, perform the following:

# cpower --reboot --iru r3i3

The --iru or --rack options ensure proper ordering if you use them. In this case, the command will make sure that power is supplied to the chassis for rack 3, IRU 3, and then the all the compute nodes in that IRU will be rebooted.

EXAMPLES

Example 3-3. cpower Command Examples

To boot compute blade r1i0n8, perform the following:

# cpower --boot r1i0n8

To boot a number of compute blades at the same time, perform the following:

# cpower --boot --rack r1

Note: The --boot option will only boot those nodes that have not already booted.

To shut down service node 0, perform the following:

# cpower --halt service0

To shutdown and switch off everything in rack 3, perform the following:

# cpower --halt --rack r3

Note: This command will shutdown and then power off all of the computer nodes in parallel, then shutdown and power off the leader node. Use the --noleader option if you want the leader node to remain booted up.

To shutdown the entire system, including all service nodes and all leader nodes, but not the admin node, and not turn the power off to anything, perform the following:

# cpower --halt --system

To shutdown all the compute nodes, but not the service nodes, leader nodes, perform the following:

# cpower --halt --system --noleader --noservice

Note: The only way to shut down the system admin controller (admin node) is to perform the operation manually.

C3 Commands

Note: For legacy Altix ICE systems, this section remains intact. However, SGI recommends you use the pdsh and pdcp utilities described in “pdsh and pdcp Utilities”.

This section describes the cluster command and control (C3) tool suite for cluster administration and application support.

Note: The SMC for Altix ICE version of C3 does not include the cshutdown and cpushimage commands.

The C3 commands used on the the SGI Alitx ICE 8200 system are, as follows:

C3 Utilities		Description
`cexec`(s)		Executes a given command string on each node of a cluster
`cget`		Retrieves a specified file from each node of a cluster and places it into the specified target directory
`ckill`		Runs `kill` on each node of a cluster for a specified process name
`clist`		Lists the names and types of clusters in the cluster configuration file
`cnum`		Returns the node names specified by the range specified on the command line
`cname`		Returns the node positions specified by the node name given on the command line
`cpush`		Pushes files from the local machine to the nodes in your cluster

cexec is the most useful C3 utility. Use the cpower command rather than cshutdown (see “Power Management Commands”).

EXAMPLES

Example 3-4. C3 Command General Examples

The following examples walk you through some typical C3 command operations.

You can use the cname and cnum commands to map names to locations and vice versa, as follows:

# cname rack_1:0-2
local name for cluster:  rack_1
nodes from cluster:  rack_1
cluster:  rack_1 ; node name:  r1i0n0
cluster:  rack_1 ; node name:  r1i0n1
cluster:  rack_1 ; node name:  r1i0n10

# cnum rack_1: r1i0n0
local name for cluster:  rack_1
nodes from cluster:  rack_1
r1i0n0 is at index 0 in cluster rack_1

# cnum rack_1: r1i0n1
local name for cluster:  rack_1
nodes from cluster:  rack_1

You can use the clist command to retrieve the number of racks, as follows:

# clist
cluster  rack_1  is an indirect remote cluster
cluster  rack_2  is an indirect remote cluster
cluster  rack_3  is an indirect remote cluster
cluster  rack_4  is an indirect remote cluster

You can use the cexec command to view the addressing scheme of the C3 utility, as follows:

# cexec rack_1:1 hostname
************************* rack_1 *************************
************************* rack_1 *************************
--------- r1i0n1---------
r1i0n1

# cexec rack_1:2-3 rack_4:0-3,10 hostname
************************* rack_1 *************************
************************* rack_1 *************************
--------- r1i0n10---------
r1i0n10
--------- r1i0n11---------
r1i0n11
************************* rack_4 *************************
************************* rack_4 *************************
--------- r4i0n0---------
r4i0n0
--------- r4i0n1---------
r4i0n1
--------- r4i0n10---------
r4i0n10
--------- r4i0n11---------
r4i0n11
--------- r4i0n4---------
r4i0n4

The following set of command shows how to use the C3 commands to transverse the different levels of hierarchy in your Altix ICE system (for information on the hierarchical design of your Altix ICE system see “Basic System Building Blocks” in Chapter 1).

To execute a C3 command on all blades within the default Altix ICE system, for example, rack 1, perform the following:

# cexec hostname
************************* rack_1 *************************
************************* rack_1 *************************
--------- r1i0n0---------
r1i0n0
--------- r1i0n1---------
r1i0n1
--------- r1i0n10---------
r1i0n10
--------- r1i0n11---------
r1i0n11

...

To run a C3 command on all compute nodes across an Altix ICE system, perform the following:

# cexec --all hostname
************************* rack_1 *************************
************************* rack_1 *************************
--------- r1i0n0---------
r1i0n0
--------- r1i0n1---------
r1i0n1
...
--------- r2i0n10---------
r2i0n10
...
--------- r3i0n11---------
r3i0n11
...

To run a C3 command against the first rack leader controller, in the first rack, perform the following:

# cexec --head hostname
************************* rack_1 *************************
--------- rack_1---------
r1lead

To run a C3 command against all rack leader controllers across all racks, perform the following:

# cexec --head --all hostname
************************* rack_1 *************************
--------- rack_1---------
r1lead
************************* rack_2 *************************
--------- rack_2---------
r2lead
************************* rack_3 *************************
--------- rack_3---------
r3lead
************************* rack_4 *************************
--------- rack_4---------
r4lead

The following set of examples shows some specific case uses for the C3 commands that you are likely to employ.

Example 3-5. C3 Command Specific Use Examples

From the system admin controller, run command on rack 1 without including the rack leader controller, as follows:

# cexec rack_1: <cmd>

Run a command on all service nodes only, as follows:

# cexec -f /etc/c3svc.conf <cmd>

Run a command on all compute nodes in the system, as follows:

# cexec --all <cmd>

Run a command on all rack leader controllers, as follows:

# cexec --all --head <cmd>

Run a command on blade 42 (compute node 42) in rack 2, as follows:

# cexec rack_2:42 <cmd>

From a service node over the InfiniBand Fabric, run a command on all blades (compute nodes) in the system, as follows:

# cexec --all <cmd>

Run a command on blade 42 (compute node 42), as follows:

# cexec blades:42 <cmd>

`pdsh` and `pdcp` Utilities

The pdsh(1) command is the parallel shell utility. The pdcp(1) command is the parallel copy/fetch utility. The SMC for Altix ICE software populates some dshgroups files for the various node types. On the admin node, SMC for Altix ICE software populates the leader and service groups files, which contain the list of online nodes in each of those groups.

On the leader node, software populates the compute group for all the online compute nodes in that group.

On the service node, software populates the compute group which contains all the online compute nodes in the whole system.

For more information, see the pdsh(1) and pdcp(1) man pages.

EXAMPLES

From the admin node, to run the hostname command on all the leader nodes, perform the following:

# pdsh -g leader hostname

To run the hostname command on all the compute nodes in the system, via the leader nodes, perform the following:

# pdsh -g leader pdsh -g compute hostname

To run the hostname command on just r1lead and r2lead, perform the following:

# pdsh -w r1lead,r2lead hostname

`cadmin`: SMC for Altix ICE Administrative Interface

The cadmin command allows you to change certain administrative parameters in the cluster such as the boot order of service nodes, the administrative status of nodes, and the adding, changing, and removal of IP addresses associated with service nodes.

To get the cadmin usage statement, perform the following:

# cadmin --h
cadmin: SMC for Altix ICE Administrative Interface
Help:

In general, these commands operate on {node}.  {node} is the SMC for Altix ICE style
node name.  For example, service0, r1lead, r1i0n0.  Even when the host name
for a service node is changed, the SMC for Altix ICE name for that node may still be used
for {node} below.  The node name can either be the SMC for Altix ICE unique node name
or a customer-supplied host name associated with a SMC for Altix ICE unique node name.

--version : Display current release information
--set-admin-status --node {node} {value}  : Set Administrative Status
--show-admin-status --node {node} : Show Administrative Status
--set-boot-order --node {node} [value] : Set boot order [1]
--show-boot-order --node {node} : Show boot order [1]
--set-ip --node {node} --net {net} {hostname}={ip} : Change an allocated ip [1]
--del-ip --node {node} --net {net} {hostname}={ip} : Delete an ip [1]
--add-ip --node {node} --net {net} {hostname}={ip} : allocate a new ip [1]
--show-ips --node {node} : Show all allocated IPs associated with node
--set-hostname --node {node} {new-hostname} : change the host name [5]
--show-hostname --node {node} : show the current host name for ice node {node}
--set-subdomain {domain} : Set the cluster subdomain [3]
--show-subdomain : Show the cluster subdomain
--set-admin-domain {domain} : Set the admin node house network domain
--show-admin-domain : Show the admin node house network domain
--db-purge --node {node} : Purge service or lead node (incl entire rack) from DB
--set-external-dns --ip {ip} : Set IP addr(s) of external DNS master(s) [4]
--show-external-dns : Show the IP addr(s) of the external DNS master(s)
--del-external-dns : Delete the configuration of external DNS master(s)
--show-root-labels : Show grub root labels if multiple roots are in use
--set-root-label --slot {#} --label {label} : Set changeable part of root label
--show-default-root : Show default root if multiple roots are in use
--set-default-root --slot {#} : Set the default slot if multiple roots in use
--show-current-root : Show current root slot
--enable-auto-recovery : Enable ability for nodes to recover themselves [6]
--disable-auto-recovery : Disable auto recovery [6]
--show-auto-recovery : Show the current state of node auto recovery [6]
--set-redundant-mgmt-network --node {node} {value}:  Configure network
    management redundancy; valid values are "yes" and "no".
--show-redundant-mgmt-network --node {node}:  Show current value.
--show-dhcp-option: Show admin dhcp option code used to distinguish mgmt network
--set-dhcp-option {value}: Set admin dhcp option code

Node-attribute options:
--add-attribute [--string-data "{string}"] [--int-data {int}] {attribute-name}
--is-attribute {attribute-name}
--delete-attribute {attribute-name}
--set-attribute-data [--string-data "{string}"] [--int-data {int}]
  {attribute-name}
--get-attribute-data {attribute-name}
--search-attributes [--string-data "{string|regex}"] [--int-data {int}]
--add-node-attribute [--string-data "{string}"] [--int-data {int}]
  --node {node} --attribute {attribute-name}
--is-node-attribute --node {node} --attribute {attribute-name}
--delete-node-attribute --node {node} --attribute {attribute-name}
--set-node-attribute-data [--string-data "{string}"] [--int-data {int}]
  --node {node} --attribute {attribute-name}
--get-node-attribute-data --node {node} --attribute {attribute-name}
--search-node-attributes [--node {node}] [--attribute {attribute-name}]
  [--string-data "{string|regex}"] [--int-data {int}]

Descriptions of Selected Values:
{hostname}={ip} means specify the host name associated with the specified
ip address.
{net} is the SMC for Altix ICE network to change such as ib-0, ib-1, head, gbe, bmc, etc
{node} is a SMC for Altix ICE-style node name such as r1lead, service0, or r1i0n0.
[1] Only applies to service nodes
[2] This operation may require the cluster to be fully shut down and AC power
    to be removed.  IPs will have to be re-allocated to fit in the new range.
[3] All cluster nodes will have to be reset
[4] Use quoted, semi-colon separated list if more than one master
[5] Only applies to admin and service nodes
[6] Auto recovery will allow service and leader nodes to boot in to a special
    recovery mode if the cluster doesn't recognize them.  This is enabled by
    default and would be used, for example, if a node's main board was replaced
    but the original system disks were imported from the original system.

EXAMPLES

Example 3-6. SMC for Altix ICE Administrative Interface (cadmin) Command

Set a node offline, as follows:

# cadmin --set-admin-status --node r1i0n0 offline

Set a node online, as follows:

# cadmin --set-admin-status --node r1i0n0 online

Set the boot order for a service node, as follows:

# cadmin --set-boot-order --node service0 2

Add an IP to an existing service node, as follows:

# cadmin --add-ip --node service0 --net ib-0 my-new-ib0-ip=10.148.0.200

Change the SMC for Altix ICE needed service0-ib0 IP address, as follows:

# cadmin --set-ip --node service0 --net head service0=172.23.0.199

Show currently allocated IP addresses for service0, as follows:

# cadmin --show-ips --node service0
IP Address Information for SMC for Altix ICE node: service0

ifname        ip               Network  

myservice-bmc 172.24.0.3       head-bmc 
myservice     172.23.0.3       head     
myservice-ib0 10.148.0.254     ib-0     
myservice-ib1 10.149.0.67      ib-1     
myhost        172.24.0.55      head-bmc 
myhost2       172.24.0.56      head-bmc 
myhost3       172.24.0.57      head-bmc

Delete a site-added IP address (you cannot delete SMC for Altix ICE needed IP addresses), as follows:

admin:~ # cadmin --del-ip --node service0 --net ib-0 my-new-ib0-2-ip=10.148.0.201

Change the hostname associated with service0 to be myservice, as follows:

admin:~ # cadmin --set-hostname --node service0 myservice

Change the hostname associated with admin to be newname, as follows:

admin:~ # cadmin --set-hostname --node admin newname

Set and show the cluster subdomain, as follows:

admin:~ # cadmin --set-subdomain mysubdomain.domain.mycompany.com
admin:~ # cadmin --show-subdomain
The cluster subdomain is: mysubdomain

Show the admin node house network domain, as follows:

admin:~ # cadmin --show-admin-domain
The admin node house network domain is: domain.mycompany.com

Show the SMC for Altix ICE systems DHCP option identifier, as follows:

admin:~ # cadmin  --show-dhcp-option
149

Console Management

SMC for Altix ICE management systems software uses the open-source console management package called conserver. For detailed information on consever, see http://www.conserver.com/

An overview of the conserver package is, as follows:

Manages the console devices of all managed nodes in an Altix ICE system
A conserver daemon runs on the system admin controller (admin node) and the rack leader controllers (leader nodes). The system admin controller manages leader and service node consoles. The rack leader controllers manage blade consoles.
The conserver daemon connects to the consoles using ipmitool. Users connect to the daemon to access them. Multiple users can connect but non-primary users are read-only.
The conserver package is configured to allow all consoles to be accessed from the system admin controller.
All consoles are logged. These logs can be found at /var/log/consoles on the system admin controller and rack leader controllers. An autofs configuration file is created to allow you to access rack leader controller managed console logs from the system admin controller, as follows:
admin # cd /net/r1lead/var/log/consoles/

The /etc/conserver.cf file is the configuration file for the conserver daemon. This file is generated for both the system admin controller and rack leader controllers from the /opt/sgi/sbin/generate-conserver-files script on the system admin controller. This script is called from discover-rack command as part of rack discovery or rediscovery and generates both the conserver.cf file for the rack in question and regenerates the conserver.cf for the sysem admin controller.

Note: The conserver package replaces cconsole for access to all consoles (blades, leader nodes, managed service nodes)

You may find the following conserver man pages useful:

Man Page		Description
`console(1)`		Console server client program
`conserver(8)`		Console server daemon
`conserver.cf(5)`		Console configuration file for `conserver`(8)
`conserver.passwd(5)`		User access information for `conserver`(8)

Procedure 3-7. Using conserver Console Manager

To use the conserver console manager, perform the following steps:

To see the list of available consoles, perform the following:

admin:~ # console -x
 service0                 on /dev/pts/2                       at  Local 
 r2lead                   on /dev/pts/1                       at  Local 
 r1lead                   on /dev/pts/0                       at  Local 
 r1i0n8                   on /dev/pts/0                       at  Local 
 r1i0n0                   on /dev/pts/1                       at  Local

To connect to the service console, perform the following:

admin:~ # console service0
[Enter `^Ec?' for help]


Welcome to SUSE Linux Enterprise Server 10 sp2 (x86_64) - Kernel 2.6.16.60-0.12-smp (ttyS1).


service0 login:

To connect to the rack leader controller console, perform the following:

admin:~ # console r1lead
[Enter `^Ec?' for help]


Welcome to SUSE Linux Enterprise Server 10 sp2 (x86_64) 
- Kernel 2.6.16.60-0.12-smp (ttyS1).


r1lead login:

To trigger system request commands sysrq (once connected to a console), perform the following:

Ctrl-e c l 1 8               # set log level to 8
Ctrl-e c l 1 <sysrq cmd>            # send sysrq command

To see the list of conserver escape keys, perform the following:
Ctrl-e c ?

Keeping System Time Synchronized

The SMC for Altix ICE systems management software uses network time protocol (NTP) as the primary mechanism to keep the nodes in your Altix ICE system synchronized. This section describes this mechanism operates on the various Altix ICE components and covers these topics:

System Admin Controller NTP

When you used the configure-cluster command, it guided you through setting up NTP on the admin node. The NTP client on the system admin controller should point to the house network time server. The NTP server provides NTP service to system components so that nodes can consult it when they are booted. The system admin controller sends NTP broadcasts to some networks to keep the nodes in sync after they have booted.

Rack Leader Controller NTP

NTP client on the rack leader controller gets time from the system admin controller when it is booted and then stays in sync by connecting to the admin node for time. The NTP server on the leader node provides NTP service to Altix ICE components so that compute nodes can sync their time when they are booted. The rack leader controller sends NTP broadcasts to some networks to keep the compute nodes in sync after they have booted.

Managed Service, Compute, and Leader BMC Setup with NTP

The BMC controllers on managed service nodes, compute nodes, and leader nodes are also kept in sync with NTP. Note that you may need the latest BMC firmware for the BMCs to sync with NTP properly. The NTP server information for BMCs is provided by special options stored in the DHCP server configuration file.

Service Node NTP

The NTP client on managed service nodes ( for a definition of managed, see “discover Command” in Chapter 2) sets its time at initial booting from the system admin controller. It listens to NTP broadcasts from the system admin controller to stay in sync. It does not provide any NTP service.

Compute Node NTP

The NTP Client on the compute node sets its time at initial booting from the rack leader controller. It listens to NTP broadcasts from the rack leader controller to stay in sync.

NTP Work Arounds

Sometime, especially during initial deployment of an Altix ICE system when system components are being installed and configured for the first time, NTP is not available to serve time to system components.

A non-modified NTP server, running for the first time, takes quite some time before it offers service. This means the leader and service nodes may fail to get time from the system admin controller as they come on-line. Compute nodes may also fail to get time from the leader when they first come up. This situation usually only happens at first deployment. After the ntp servers have a chance to create their drift files, ntp servers offer time with far less delay on subsequent reboots.

The following work arounds are in place for situations when NTP can not serve the time:

The admin and rack leader controllers have the time service enabled (xinetd).
All system node types have the netdate command.
A special startup script is on leader, service, and compute nodes that runs before the NTP startup script.

This script attempts to get the time using the ntpdate command. If the ntpdate command fails because the NTP server it is using is not ready yet to offer time service, it uses the netdate command to get the clock "close".

The ntp startup script starts the NTP service as normal. Since the clock is known to be "close", NTP will fix the time when the NTP servers start offering time service.

Changing the Size of `/tmp` on Compute Nodes

This section describes how to change the size of /tmp on Altix ICE compute nodes.

Procedure 3-8. Increasing the /tmp Size

To change the size of /tmp on your system compute nodes, perform the following steps:

From the admin node, change directory (cd) to /opt/sgi/share/per-host-customization/global.

Open the sgi-fstab file and change the size= parameter for the /tmp mount in both locations that it appears.

#!/bin/sh                                                                   
#                                                                           
# Copyright (c) 2007,2008 Silicon Graphics, Inc.                            
# All rights reserved.                                                      
#                                                                           
#  This program is free software; you can redistribute it and/or modify     
#  it under the terms of the GNU General Public License as published by     
#  the Free Software Foundation; either version 2 of the License, or        
#  (at your option) any later version.                                      
#                                                                           
#  This program is distributed in the hope that it will be useful,          
#  but WITHOUT ANY WARRANTY; without even the implied warranty of           
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the            
#  GNU General Public License for more details.                             
#                                                                           
#  You should have received a copy of the GNU General Public License        
#  along with this program; if not, write to the Free Software              
#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA 
#                                                                           
# Set up the compute node's /etc/fstab file.                                
#                                                                           
# Modify per your sites requirements.                                       
#                                                                           
# This script is excecuted once per-host as part of the install-image operation
# run on the leader nodes, which is called from cimage on the admin node.      
# The full path to the per-host iru+slot directory is passed in as $1,         
# e.g. /var/lib/sgi/per-host/<imagename>/i2n11.                                
#                                                                              

# sanity checks
. /opt/sgi/share/per-host-customization/global/sanity.sh

iruslot=$1
os=( $(/opt/oscar/scripts/distro-query -i ${iruslot} | sed -n '/^compat /s/^compat.*: //p') )

compatdistro=${os[0]}${os[1]}                                                   

if [ ${compatdistro} = "sles10" -o ${compatdistro} = "sles11" ]; then

        #
        # SLES 10 compatible
        #
        cat <<EOF >${iruslot}/etc/fstab
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
tmpfs           /tmp            tmpfs   size=150m       0       0
EOF

elif [ ${compatdistro} = "rhel5"  ]; then

        #
        # RHEL 5 compatible
        #

        #
        # RHEL expects several subsys directories to be present under /var/run
        # and /var/lock, hence no tmpfs mounts for them
        #
        cat <<EOF >${iruslot}/etc/fstab
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
tmpfs           /tmp            tmpfs   size=150m       0       0
devpts          /dev/pts        devpts  gid=5,mode=620  0       0
EOF

else

        echo -e "\t$(basename ${0}): Unhandled OS.  Doing nothing"

fi

Push the image out to the racks to pick up the change, as follows:
# cimage --push-rack mynewimage r\*
For more information on using the cimage command, see “cimage Command”.

Enabling or Disabling the Compute Node iSCSI Swap Device

This section describes how to enable or disable the Internet small computer system interface (iSCSI) compute node swap device. The iSCSI compute node swap device is turned off by default for new installations. It can cause problems during rack-wide out of memory (OOM) conditions, with both compute nodes and the rack leader controller (RLC) becoming unresponsive during the heavy write-out to the per-node iSCSI swap devices.

Procedure 3-9. Enabling the iSCSI Swap Device

If you wish to enable the iSCSI swap device in a given compute node image, perform the following steps:

Change root (chroot) into the compute node image on the admin node and enable the iscsiswap service, as follows:
# chroot /var/lib/systemimager/images/compute-sles11 chkconfig iscsiswap on
Then, push the image out to the racks, as follows:
# cimage --push-rack compute-sles11 r\*

Procedure 3-10. Disabling the iSCSI Swap Device

To disable the iSCSI swap device in a compute node image where it is currently enabled, perform the following steps:

Disable the service, as follows:

# chroot /var/lib/systemimager/images/compute-sles11 chkconfig iscsiswap off

Then, push the image out to the racks, as follows:
# cimage --push-rack compute-sles11 r\*

Changing the Size of Per-node Swap Space

This section describes how to change per-node swap space on your SGI Altix ICE system.

Procedure 3-11. Increasing Per-node Swap Space

To increase the default size of the per-blade swap space on your system, perform the following:

Shutdown all blades in the affected rack (see “Shutting Down and Booting”).
Log into the leader node for the rack in question. (Note that you need to do this on each rack leader).
Change directory (cd) to the /var/lib/sgi/swapfiles directory.

To adjust the swap space size appropriate for your site, run a script similar to the following:

#!/bin/bash

size=262144     # size in KB

for i in $(seq 0 3); do
        for n in $(seq 0 15); do
                dd if=/dev/zero of=i${i}n${n} bs=1k count=${size}
                mkswap i${i}n${n}
        done
done

Reboot the all blades in the affected rack (see “Shutting Down and Booting”).

From the rack leader node, use the cexec --all command to run the free (1) command on the compute blades to view the new swap sizes, as follows:

r1lead:~ # cexec --all free
************************* rack_1 *************************
--------- r1i0n0---------
             total       used       free     shared    buffers     cached
Mem:       2060140     206768    1853372          0          4      46256
-/+ buffers/cache:     160508    1899632
Swap:        49144          0      49144
--------- r1i0n1---------
             total       used       free     shared    buffers     cached
Mem:       2060140     137848    1922292          0          4      44200
-/+ buffers/cache:      93644    1966496
Swap:        49144          0      49144
--------- r1i0n8---------
             total       used       free     shared    buffers     cached
Mem:       2060140     138076    1922064          0          4      43172
-/+ buffers/cache:      94900    1965240
Swap:        49144          0      49144

If you want change per-node swap space across your entire system, all (new) leaders nodes as part of discovery, you can edit the /etc/opt/sgi/conf.d/35-compute-swapfiles “inside” the lead-sles11 image on the admin node. The images are in the /var/lib/systemimager/images directory. For more information on customizing these images, see “Customizing Software Images”.

Switching Compute Nodes to a `tmpfs` Root

This section describes how to switch your system compute nodes to a tmpfs root.

Procedure 3-12. Switching Compute Nodes to a tmpfs Root

To switch your compute nodes to a tmpfs root, from the system admin controller (admin node) perform the following steps:

To switch compute nodes to a tmpfs root, use the optional --tmpfs flag to the cimage --set command, for example:

adminadmin:~ # cimage --set --tmpfs compute-sles11 2.6.27.19-5-smp r1i0n0

Note: To use a /tmpfs root with the standard compute node image, the compute node needs to have 4GB of memory or above. A standard /tmpfs mount has access to half the system memory, and the standard compute node image is just over 1 GB in size.

You can view the current setting of a compute node, as follows:

admin:~ # cimage --list-nodes r1i0n0
r1i0n0: compute-sles11 2.6.27.19-5-smp tmpfs

To set it back to an NFS root, use the --nfs flag to the cimage --set command, as follows:
admin:~ # cimage --set --nfs compute-sles11 2.6.27.19-5-smp r1i0n0
You can change the view the change back to NFS root, as follows:
admin:~ # cimage --list-nodes r1i0n0 r1i0n0: compute-sles11 2.6.27.19-5-smp nfs
For help information, use the cimage --h option.

Setting up Local Storage Space for Swap and Scratch Disk Space

The SGI Altix ICE 8400 system has the option to support local storage space on compute nodes (also known as blades). Solid-state drive (SSD) devices and 2.5" disks are available for this purpose. You can define the size and status for both swap and scratch partitions. The values can be set globally or per node or group of nodes. By default, the disks are partitioned only if blank, the swap is off, and the scratch is set to occupy the whole disk space and be mounted in /tmp/scratch.

The /etc/init.d/set-swap-scratch script is responsible for auto-configuring the swap and scratch space based on the settings retrieved via the cattr command. You can use the cadmin to configure settings globally or you can use the cattr command to set custom values for specific nodes.

The /etc/opt/sgi/conf.d/30-set-swap-scratch script makes sure /etc/init.d/swapscratch service is on so that swap/scratch partitions are configured directly after booting. The swapscratch service calls the /opt/sgi/lib/set-swap-scratch script when the service is started and then it exits.

You can customize the following settings:

blade_disk_allow_partitioning

The default value is "on" which means that the set-swap-scratch script will repartition and format the local storage disk if needed.

Note: To protect user data, the script will not re-partition the disk if it is already partitioned. In this case, you need a blank disk before it can be used for swap/scratch.

The set-swap-scratch script uses the following command to retrieve the blade_disk_allow_partitioning value for the node on which it is running:
# cattr get blade_disk_allow_partitioning -N $compute_node_name --default on
You can globally set the value on, as follows:
# cadmin --add-attribute --string-data on blade_disk_allow_partitioning
You can globally turn if off, as follows:
# cadmin --add-attribute --string-data off blade_disk_allow_partitioning
blade_disk_swap_status

The default value is "off" which means that the set-swap-scratch script will not enable a swap partition on the local storage disk.

The set-swap-scratch script uses the following command to retrieve the blade_disk_swap_status value for the node on which it is running:
# cattr get blade_disk_swap_status -N $compute_node_name --default off
You can globally set the value on, as follows:
# cadmin --add-attribute --string-data on blade_disk_swap_status
You can globally turn if off, as follows:
# cadmin --add-attribute --string-data off blade_disk_swap_status
The set-swap-scratch script uses SGI_SWAP label when partitioning the disk. It enables the swap only if it finds a partition labeled SGI_SWAP.
blade_disk_swap_size

The default value is 0 which means that the set-swap-scratch script will not create a swap partition on the local storage disk.

The set-swap-scratch script uses the following command to retrieve the blade_disk_swap_size value for the node on which it is running:
attr get blade_disk_swap_size -N $compute_node_name --default 0
You can globally set the value, as follows:
# cadmin --add-attribute --string-data 1024 blade_disk_swap_size
The size is specified in megabytes. Allowed values are, as follows: 0, -0 (use all free space when partitioning), 1, 2, ...
blade_disk_scratch_status

The default value is "off" which means that the set-swap-scratch script will not enable the scratch partition on the local storage disk.

The set-swap-scratch script uses the following command to retrieve the blade_disk_scratch_status value for the node on which it is running:
cattr get blade_disk_scratch_status -N $compute_node_name --default off
You can globally set the value on, as follows:
# cadmin --add-attribute --string-data on blade_disk_scratch_status
You can globally turn if off, as follows:
cadmin --add-attribute --string-data off blade_disk_scratch_status
Note: The set-swap-scratch script uses the SGI_SCRATCH label when partitioning the disk. It mounts the scratch only on the partition labeled as SGI_SCRATCH.
blade_disk_scratch_size

The default value is -0 which means that the set-swap-scratch script will use all remaining free space when creating the scratch partition.

The set-swap-scratch script uses the following command to retrieve the blade_disk_scratch_size value for the node on which it is running:
cattr get blade_disk_scratch_size -N $compute_node_name --default -0
You can globally set the value, as follows:
cadmin --add-attribute --string-data 10240 blade_disk_scratch_size
The size is specified in megabytes. Allowed values are, as follows: 0, -0 (use all free space when partitioning), 1, 2, ...
blade_disk_scratch_mount_point

The default value is /tmp/scratch which means that the set-swap-scratch script will mount the scratch partition in /tmp/scratch.

The set-swap-scratch script uses the following command to retrieve the blade_disk_scratch_size value for the node on which it is running:
# cattr get blade_disk_scratch_mount_point -N $compute_node_name --default /tmp/scratch
You can globally set the value, as follows:
# cadmin --add-attribute --string-data /tmp/scratch blade_disk_scratch_mount_point
You can mount the disk to any mount point you desire. The set-swap-scratch script will create that folder if it does not exists (as long as the script has the permission to create it at that path). The root mount point (/) is not writable on the compute nodes. You need to create that folder as part of the compute node image if you want to mount something like /scratch.

For a cattr command help statement, perform the following command:

# cattr -h
Usage:
  cattr [--help] COMMAND [ARG]...

Commands:
  exists   check for the existence of an attribute
  get      print the value of an attribute
  list     print a list of attribute values
  set      set the value of an attribute
  unset    delete the value of an attribute

For more detailed help, use 'cattr COMMAND --help'.

Viewing the Compute Node Read-Write Quotas

This section describes how to view the per compute node read and write quota.

Procedure 3-13. Viewing the Compute Node Read-Write Quotas

To view the per compute node read and write quota, log onto the leader node and perform the following:

r1lead:~ # xfs_quota -x -c 'quota -ph 1'
Disk quotas for Project #1 (1)
Filesystem   Blocks  Quota  Limit Warn/Time    Mounted on
/dev/disk/by-label/sgiroot
              64.6M      0     1G  00 [------] /

Map the XFS project ID to the quota you are interested in by looking it up in /etc/projects file.

If you decided to change the xfs_quota values, log back onto the admin node and edit the /etc/opt/sgi/cminfo file inside the compute image where you want to change the value, for example, /var/lib/systemimager/images/ image_name. Change the value of the PER_BLADE_QUOTA variable and then repush the image with the following command:

# cimage --push-rack image_name racks

For help information, perform the following:

xfs_quota> help
df [-bir] [-hn] [-f file] -- show free and used counts for blocks and inodes
help [command] -- help for one or all commands
print -- list known mount points and projects
quit -- exit the program
quota [-bir] [-gpu] [-hnv] [-f file] [id|name]... -- show usage and limits

Use 'help commandname' for extended help

Use help commandname for extended help, such as the following:

xfs_quota> help quota

quota [-bir] [-gpu] [-hnv] [-f file] [id|name]... -- show usage and limits

 display usage and quota information

 -g -- display group quota information
 -p -- display project quota information
 -u -- display user quota information
 -b -- display number of blocks used
 -i -- display number of inodes used
 -r -- display number of realtime blocks used
 -h -- report in a human-readable format
 -n -- skip identifier-to-name translations, just report IDs
 -N -- suppress the initial header
 -v -- increase verbosity in reporting (also dumps zero values)
 -f -- send output to a file
 The (optional) user/group/project can be specified either by name or by
 number (i.e. uid/gid/projid).

xfs_quota>

RAID Utility

The infrastructure nodes on your Altix ICE system have LSI RAID enabled by default from the factory. A lsiutil command-line utility is included with the installation for the admin node, the leader node, and the service node (when installed from the SGI service node image). This tool allows you to look at the devices connected to the RAID controller and manage them. Some functions, such as, setting up mirrored or striped volumes, can be handled either by the LSI BIOS configuration tool or the lsiutil utility.

Note: These instructions only apply to Altix XE250 or Altix XE270 systems with the 1068-based controller. They do not apply to Altix XE250 or Altix XE270 systems that have the LSI Megaraid controller.

Example 3-7. Using the lsiutil Utility

The following lsiutil command-line utility example shows a sample session, as follows:

Start the lsiutil tool, as follows:

admin:~ # lsiutil

LSI Logic MPT Configuration Utility, Version 1.54, January 22, 2008

1 MPT Port found

     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  /proc/mpt/ioc0    LSI Logic SAS1068E B2     105      01140100     0

Select a device:  [1-1 or 0 to quit]

Select 1 to show the MPT Port , as follows:

1 MPT Port found

     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  /proc/mpt/ioc0    LSI Logic SAS1068E B2     105      01140100     0

Select a device:  [1-1 or 0 to quit] 1

 1.  Identify firmware, BIOS, and/or FCode
 2.  Download firmware (update the FLASH)
 4.  Download/erase BIOS and/or FCode (update the FLASH)
 8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
60.  Show non-default settings
61.  Restore default settings
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
 e   Enable expert mode in menus
 p   Enable paged mode in menus
 w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit]

Choose 21. RAID actions, as follows:

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 21

 1.  Show volumes
 2.  Show physical disks
 3.  Get volume state
 4.  Wait for volume resync to complete
23.  Replace physical disk
26.  Disable drive firmware update mode
27.  Enable drive firmware update mode
30.  Create volume
31.  Delete volume
32.  Change volume settings
50.  Create hot spare
99.  Reset port
 e   Enable expert mode in menus
 p   Enable paged mode in menus
 w   Enable logging

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]

Choose 2. Show physical disks, to show the status of the disks making up the volume, as follows:

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 2

1 volume is active, 2 physical disks are active

PhysDisk 0 is Bus 0 Target 1
  PhysDisk State:  online
  PhysDisk Size 238475 MB, Inquiry Data:  ATA      Hitachi HDT72502 A73A

PhysDisk 1 is Bus 0 Target 2
  PhysDisk State:  online
  PhysDisk Size 238475 MB, Inquiry Data:  ATA      Hitachi HDT72502 A73A

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]

Choose 1. Show volumes, to show information about the volume including its health, as follows:

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 1

1 volume is active, 2 physical disks are active

Volume 0 is Bus 0 Target 0, Type IM (Integrated Mirroring)
  Volume Name:                                  
  Volume WWID:  09195c6d31688623
  Volume State:  optimal, enabled
  Volume Settings:  write caching disabled, auto configure
  Volume draws from Hot Spare Pools:  0
  Volume Size 237464 MB, 2 Members
  Primary is PhysDisk 1 (Bus 0 Target 2)
  Secondary is PhysDisk 0 (Bus 0 Target 1)

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]

Restoring the `grub` Boot-loader on a Node

When grub(8) boot-loader is not written to the rack leader controllers (leader nodes) or any of the system service nodes or is not functioning correctly, the grub boot-loader will have to be re-installed on the master boot record (MBR) of the root drive for the node.

To rewrite grub to the MBR of the root drive on a system that is booted, issue the following grub commands:

# grub
grub>    root (hd0,0)
grub>    setup (hd0)
grub>    quit

If you cannot boot your system (and it is hanging on grub), you need to boot the node in rescue mode and then issue the following commands:

# mount /dev/ /system
# mount -o bind /dev /system/dev
# mount -t proc proc /system/proc # optional
# mount -t sysfs sysfs /system/sys # optional
# chroot /system
# grub
grub>    root (hd0,0)
grub>    setup (hd0)
grub>    quit
# reboot

Backing up and Restoring the System Database

The SMC for Altix ICE systems management software captures the relevant data for the managed objects in an SGI Altix ICE system. Managed objects are the hierarchy of nodes described in “Basic System Building Blocks” in Chapter 1. The system database is critical to the operation of your SGI Altix ICE system and you need to back up the database on a regular basis.

Managed objects on an SGI Altix ICE include the following

Altix ICE system

One ICE system is modeled as a meta-cluster. This meta-cluster contains the racks each modeled as a sub-cluster.
Nodes

System admin controller (admin node), rack leader controllers (leader nodes), service nodes, compute nodes (blades) and chassis management control blades (CMCs) are modeled as nodes.
Networks

The preconfigured and potentially customized IP networks
Nics

The network interfaces for Ethernet and InfiniBand adapters.
The network interfaces for Ethernet and InfiniBand adapter.

The node images installed on each particular node.

SGI recommends that you keep three backups of your system database at any given time. You should implement a rotating backup procedure following the son-father-grandfather principle.

Procedure 3-14. Backing up and Restoring the System Database

To back up and restore the system database, perform the following steps:

Note: A password is required to use the mysqldump command. The password file is located in the /etc/odapw file.

From the system admin controller, to back up the system database perform a command similar to the following:
# mysqldump --opt oscar > backup-file.sql
To read the dump file back into the system admin controller, perform a command similar to the following:
# mysql oscar < backup-file.sql

For more information, see the mysqldump(1) man page.

Enabling EDNS

Extension mechanisms for DNS (EDNS) can cause excessive logging activity when not working properly. SMC on Altix ICE contains code to limit EDNS logging. This section describes how to delete this code and allow EDNS to work unrestricted and log messages.

Procedure 3-15. Enabling EDNS

To enable EDNS on your Altix ICE system, perform the following steps:

Open the /opt/sgi/lib/Tempo/Named.pm file with your favorite editing tool.
To remove the limit on the edns_udp_size parameter, comment out or remove the following line:
$limit_edns_udp_size = "edns-udp-size 512;";"

Remove the following lines so that EDNS logging is no longer disabled:

logging {
category lame-servers {null; };
category edns-disabled { null; };  };

Firmware Management

The fwmgr tool and its associated libraries form a firmware update framework. This framework makes managing the various firmware types in a cluster easier.

A given cluster may have several types of firmware including mainboard BIOS, BMC, disk controllers, InfiniBand (ib) interfaces, /Ethernet NICs, network switches, and many other types.

The firmware management tools allow the firmware to be stored in a central location (firmware bundle library) to be accessed by command line or graphical tools. The tools allow you to add firmware to the library, remove firmware from the library, install firmware on a given set of nodes, and other related operations.

License Requirement

This framework is licensed. It cannot be used without the appropriate license.

Terminology

This section describes some terminology associated with the firmware management, as follows:

Raw firmware file

These are files that you download, likely from SGI, that include the firmware and option tools to flash said firmware. For example, a raw firmware file for an Altix ICE compute node BIOS update might be downloaded as, sgi-ice-blade-bios-2009.12.14-1.x86_64.rpm.
Firmware bundle

A firmware bundle is a file that contains the firmware to be flashed in a way that the integrated tools understand. Normally, firmware bundles are stored in the firmware bundle library (see below). However, these bundles can also be checked out of the library and accessed directly in some cases. In most situations, a firmware bundle is a sort of wrapper around the raw firmware file(s) and various attributes and tools. A firmware bundle can contain more than one type of firmware. This is the case when the underlying flash tool supports more than one firmware type. An example of this is the SGI ICE compute node firmware, that contains several different BIOS files for different mainboards and multiple BMC firmware revisions. Another example might be a raw file that includes both the BIOS and BMC firmware for a given mainboard/server.
Firmware bundle library

This is a storage repository for firmware bundles. The management tools allow you to query the library for available bundles and associated attributes.
Update environment

Some raw firmware types, like the various Altix ICE firmware released as RPMs, run "live" on the admin node to facilitate flashing. The underlying tool may indeed set nodes up to network boot a low level flash tool, but there are many other methods used by the underlying tools. Some firmware types, like BIOS ROMs with associated flash executables, require an update environment to be constructed. One type of update environment is a DOS Update Environment. This update environment may be used, for example, to construct a DOS boot image for the BIOS ROM and associated flash tool. A firmware bundle calls for a specific update environment. In this way, a firmware bundle with an associated update environment form the necessary pieces to facilitate booting of a DOS update environment over the network that flashes the target nodes with the specified BIOS ROM (as an example).

Firmware Update High Level Example

This section describes the steps you need to take to update a set of nodes in your cluster with a new BIOS level, as follows:

Download the raw firmware file for this system type. You might do this, for example, from SGI Supportfolio web site located at https://support.sgi.com/login.
Add the raw firmware file to the firmware bundle library using a graphical or command line tool.
The tool will convert the raw firmware file into a firmware bundle and store it in the firmware bundle library. In some cases, you will be required to provide additional information in order to convert the raw firmware file into a firmware bundle. This could be information necessary to facilitate flashing that the framework can not derive from the file on its own.
Once the firmware bundle is available in the firmware library, you can use the graphical or command line tool to select a firmware bundle and a list of target nodes to which to push the firmware update.
The underlying tool then creates the appropriate update environment (if required) and facilitates flashing of the nodes.

Firmwware Manager Command Line Interface (`fwmgr`)

The fwmgr command is the command line interface (CLI) to the firmware update infrastructure.

For a usage statement, enter fwmgr --help. The fwmgr command has several sub-commands, each of which can be called with the --help option for usage information.

You can use the fwmgr command to perform the following:

List the available firmware bundles
Add raw firmware files or firmware bundle files to the firmware bundle library. If it is a raw firmware type, it will be converted to a firmware bundle and placed in the library.
Remove firmware bundles from the firmware bundle library
Rename an existing firmware bundle in the firmware bundle library
Install a given firmware bundle on to a list of nodes
Checkout a firmware bundle which allows you to store the firmware bundle itself

Note: It is currently not necessary to run the fwmgrd command (firmware manager daemon) to use the CLI.

Firmwware Manager Daemon (`fwmgrd`)

This fwmgrd daemon is installed and enabled by default in SGIMC 1.3 on SGI Altix ICE systems, only. This daemon provides the services needed for the SGI Management Center graphical user interface to communicate with the firmware management infrastructure. This daemon needs to be running in order to access firmware management from the graphical user interface.

Even if you intend to only use the CLI, it is recommended that the fwmgrd daemon be left running and available.

By default, the fwmgrd log file is located at:

/var/log/fwmgrd.log

View this log for important messages during flashing operations from the SGI Management Center graphical interface.

Notes specific to Management Center 1.3

The first release of the Firmware Management framework only supports SGI Altix ICE firmware, released as RPMs. This includes:sgi-ice-blade-bios, sgi-ice-blade-ib, sgi-ice-blade-zoar, sgi-ice-cmc, and sgi-ice-ib-switch. This includes the Altix ICE compute nodes but does not yet include other managed node types.

SGI intends to expand this firmware management framework to support additional node types in Altix ICE and SGI Rackable cluster hardware in later releases.

Note: SGI Altix ICE integrated InfiniBand switches are supported but only on SGI Altix ICE 8400 series systems or later. Some integrated InfiniBand switch parts in the SGI Altix ICE 8200 series systems will not flash properly with this framework.

Prev	Table of Contents	Next
Chapter 2. System Discovery, Installation, and Configuration		Chapter 4. System Fabric Management