Chapter 4. System Fabric Management

The InfiniBand network on SGI Altix ICE systems uses OpenFabrics Enterprise Distribution (OFED) software. This chapter describes the InfiniBand fabric and how to manage it. For background information on OFED, see http://www.openfabrics.org.

InfiniBand Fabric Management

This section describes the InfiniBand fabric and covers the following topics:

InfiniBand Fabric Overview

InfiniBand fabric management on SGI Altix ICE systems is done using the OFED OpenSM software package and the sgifmcli tool (see “Fabric Component sgifmcli Command”). The InfiniBand fabric connects the service nodes, rack leader controllers (leader nodes), and compute nodes. It does not connect to the system admin controller (admin node) or the chassis management control (CMC) blades. SGI Altix ICE systems usually have two separate InfiniBand fabrics, which this manual refers to as "ib0" and "ib1" (see “InfiniBand Fabric” in Chapter 1).


Note: The LX series has only one InfiniBand fabric, "ib0". Any references to "ib1" in this manual do not apply to LX systems.

On SGI Altix ICE 8200 systems, each InfiniBand fabric (also sometimes called an InfiniBand subnet) has its own subnet manager (SM), which runs on a rack leader controller (leader node). For a system with two or more racks, the SMs for the two fabrics are usually configured to run on different leader nodes. In a single-rack system, both SMs run on the single leader node. Each SM may also be paired with a standby SM, which can take over if the primary SM fails. For more information, see “InfiniBand Fabric Failover Mechanism”.

On SGI Altix ICE 8400 series systems, rack leader controllers (RLCs, also called leader nodes) do not always have InfiniBand host channel adapters (HCAs); this depends on the system configuration. In some configurations, one or two RLCs have HCAs and run the OFED subnet manager. In others, the subnet manager runs on separate fabric management nodes, and no RLCs have InfiniBand HCAs.

Rack leader controllers associate an SM instance with a particular port on the leader node. Usually, ib0 is mapped to port 1 of the InfiniBand host channel adapter (HCA) on the SM node, and ib1 is mapped to port 2 of the HCA on the SM node (see Figure 1-10). The SM for each fabric is configured using the corresponding /etc/ofa/opensm-ib[01].conf file.


Note: After a system reboot, the opensm daemons start running automatically.


SGI supports the following topologies: hypercube, enhanced hypercube, and fat tree.

InfiniBand Management Tool Graphical User Interface

You can use the InfiniBand management tool graphical user interface (GUI) to configure, start, stop, restart, clean up, verify, or get the status of the InfiniBand fabric on your SGI Altix ICE system.

From the system admin controller (admin node), enter the following command:

admin:~ # smc-configure-fabric

The InfiniBand Management Tool GUI appears, as shown in Figure 4-1.

You can also access this command from the configure-cluster GUI main menu Configure Infiniband Fabric option (see “configure-cluster Command Cluster Configuration Tool” in Chapter 2). For more information, see Figure 4-1.

Figure 4-1. InfiniBand Management Tool Screen


Use the Select button to select the action you want to perform; a submenu appears. Use the Quit button to return to the previous screen. Use the Help button to get online help for each of the GUI actions.

If the smc-configure-fabric command fails in a configuration or administrative operation, it suggests that you use the sgifmcli(8) command (described in “Fabric Component sgifmcli Command”) to debug the problem. Alternatively, you can use the Reset and Init Fabric Database option from the InfiniBand Management Tool main menu (see Figure 4-1) to start over and completely reconfigure the InfiniBand fabrics.

From the Configure InfiniBand screen, make sure you select the Configure Topology option to set the topology as shown in Figure 4-2. For more information, see “Network Topology”.

Figure 4-2. Configure Topology Screen


Use the online help available with this tool to guide you through the InfiniBand configuration. After configuring and bringing up the InfiniBand network, select the Administer InfiniBand ib0 option or the Administer InfiniBand ib1 option. The Administer InfiniBand screen appears, as shown in Figure 4-3. You can use this screen to start, stop, restart, or refresh a fabric.

Figure 4-3. Administer InfiniBand Tool Screen


You can verify the status via the Status option, as shown in Figure 4-4.

Figure 4-4. Administer InfiniBand Status Option


The Refresh Enhanced Hypercube Config and Restart option applies only to the Enhanced Hypercube topology. You must refresh the fabric configuration whenever you add, remove, or move one or more compute blades or service nodes. The refresh action updates the GUID routing order file, which is used to balance InfiniBand traffic for the Enhanced Hypercube topology. This action also automatically restarts the master subnet manager (SM) and the optional standby SM for the specified fabric (see “InfiniBand Fabric Failover Mechanism”).

Ideally, the refresh action for a fabric should be taken when there are no jobs running in the system. Restarting the subnet manager can have an adverse impact on the running jobs in the system.

Fabric Component sgifmcli Command

For the most common fabric management operations, the smc-configure-fabric command (described in “InfiniBand Management Tool Graphical User Interface”) is entirely sufficient, and recommended. The sgifmcli(8) command can be used for more advanced fabric management tasks.

The most common operations for which sgifmcli is used are as follows:

  • Initializing and configuring external InfiniBand switches

  • Verifying the integrity of the InfiniBand fabric(s)

For more information, see the sgifmcli(8) man page.

Currently, the following switches are supported:

Switch Type          Description

voltaire-isr-9024    Voltaire ISR 9024
voltaire-isr-2004    Voltaire ISR 2004
voltaire-isr-2012    Voltaire ISR 2012
voltaire-isr-9096    Voltaire ISR 9096
voltaire-isr-9288    Voltaire ISR 9288
voltaire4036         Voltaire Grid Director 4036
mellanox5030         Mellanox IS5030

To configure an external InfiniBand switch, cluster-wide InfiniBand connectivity is not required. The only necessity is that the supplied switch host name is resolvable and a working networking connection to the external InfiniBand switch exists. See the sgifmcli(8) man page for more information about adding external InfiniBand switches to your cluster's fabric.

Verifying the integrity of an InfiniBand fabric requires that the InfiniBand network first be configured properly. This is most easily done using smc-configure-fabric (see “InfiniBand Management Tool Graphical User Interface”). See the sgifmcli(8) man page for details about the fabric verification operation.

sgifmcli SGI Fabric Component Command

The sgifmcli(8) command is, as follows:

sgifmcli [type action [options]] | [options]


Note: You can use shortened versions of the following sgifmcli options as long as the option is unambiguous. For example, sgifmcli --vers for sgifmcli --version.


It accepts the following general options:

General Option 

Description

-h, --help 

Displays a help message and then exits

-V, --version 

Shows the version number of the program

-v, --verbose [DEBUG | INFO | ERROR] 

Selects the verbosity level (default: ERROR). Most of the messages from sgifmcli are written to a log file named /var/log/sgifmcli.log. The default level reports error messages only. INFO provides the user with details about the operation of sgifmcli in addition to error messages. The DEBUG level produces output that is tailored toward the developer to help with bug fixing. In addition, the DEBUG level also produces INFO and ERROR messages.

It accepts the following detailed options:

Detailed Option 

Description

type 

The type option is one of the following:

  • --mastersm - Master subnet manager

  • --standby - Standby subnet manager

  • --ibswitch - InfiniBand switch

  • --ibfabric - InfiniBand fabric

action 

The action option is one of the following:

  • --init - Initializes the switch or fabric

  • --start - Starts a subnet manager

  • --stop - Stops a subnet manager

  • --status - Prints the status of a subnet manager

  • --verify - Verifies the fabric

  • --refresh - Updates an InfiniBand fabric (for Enhanced Hypercube)

  • --set - Sets specific SM configuration parameter (see arglist)

  • --add - Adds a subcomponent to its container, for example, add a switch to a fabric

  • --delete - Deletes a subcomponent from its container, for example, deletes a switch from a fabric. Removes the switch or fabric.

  • --remove - Removes an entity

  • --showconfig - Prints fabric configuration

  • --switchlist - Lists switches in a fabric

  • --create-node-name-map - Creates a node name map for internal SGI Altix ICE switches

options 

The options option is one or more of the following with no duplicates, for example, the --fabric option must be either ib0 or ib1, not both:

  • --id - Unique identifier, for example, host name

  • --hostname - Name of the node on which to run OpenSM

  • --switchtype - Type of switch (leaf or spine)

  • --model - Switch model (voltaire-isr-9024, voltaire-isr-2004, voltaire-isr-2012, voltaire-isr-9096, or voltaire-isr-9288)

  • --fabric - Fabric, either ib0 or ib1

  • --topology - InfiniBand topology, either hypercube, enhanced-hypercube, or ftree

  • --arglist - List of Subnet Manager configuration parameters: param_1=val_1, param_2=val_2, ...

EXIT CODES

To facilitate the use of the sgifmcli(8) command in shell scripts, an exit code is returned to give an indication of what occurred during a given connection.

The exit codes returned by sgifmcli are, as follows:

0 

Successful termination.

255 

Abnormal termination.
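
Because sgifmcli returns 0 on success and 255 on abnormal termination, a script can branch on the exit status of each call. The following is a minimal sketch of that pattern, not part of the SGI tooling: the helper name run_fabric_step is hypothetical, and it simply wraps whatever command it is given.

```shell
#!/bin/sh
# run_fabric_step is a hypothetical helper (not part of sgifmcli).
# It runs the command passed as its arguments, reports success or
# failure, and returns the command's own exit code to the caller.
run_fabric_step() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK: $*"
    else
        echo "FAILED (exit code $rc): $*" >&2
    fi
    return "$rc"
}
```

A script might then stop at the first failed operation, for example with run_fabric_step sgifmcli --start --id master_ib0 || exit 1.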

For a detailed man page, perform the following command from the admin node:

admin:~ # man sgifmcli

The sgifmcli(8) fabric administration utilities man page appears.

sgifmdb Fabric Management Database Command

The fabric component maintains a database (DB) of the objects it manages (managed objects). The database version is automatically set during cluster install. You do not need to set it. Most likely, this database will change over time. To manage multiple database versions and also to aid in field support, SGI has added another command line tool that currently reports the managed objects database version.

The sgifmdb command is, as follows:

sgifmdb [--get|-g] [--dump|-d] [-v|--version] [-r|--reset] [--help|-h]

It accepts the following general options:

General Option 

Description

-g, --get 

Reads the database version object from the database

-d, --dump 

Dumps the database. This option allows you to see what fabric objects are currently stored in the fabric database.

-v, --version 

Prints version

-r, --reset 

Resets the database and starts clean

-h, --help 

Shows the help text

Example 4-1. Getting sgifmdb(8) Command Help

For a sgifmdb command usage statement, perform the following from the admin node:

admin:~ # sgifmdb -h
SGI Fabric Component DB tool
Usage: db_version [--get|-g] [--dump|-d] [-v|--version] [-r|--reset] [--help|-h]

        -g, --get       Read DB version object from DB
        -d, --dump      Dump the DB
        -v, --version   Print version
        -r, --reset     Reset the database and start clean
        -h, --help      Show this text


InfiniBand Fabric Management Configuration and Operation Overview

Each subnet manager (SM) performs a light sweep of the fabric it manages every 10 seconds by default. To change the interval, set the sweep_interval variable in the /opt/sgi/var/sgifmcli/opensm-ib0.conf.templ file and then perform a Commit operation in the smc-configure-fabric GUI. Alternatively, use the --arglist option of the sgifmcli command, which sets various subnet manager configuration parameters, including the sweep interval.


Note: If your cluster is larger than 256 nodes, SGI highly recommends increasing this variable to 90 seconds or more.
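
As a rough illustration of the template edit, the following shell function prints a copy of a configuration file with its sweep_interval line changed. The function name set_sweep_interval is hypothetical, and the sketch assumes that the template stores the setting as a line of the form "sweep_interval <seconds>", as stock OpenSM configuration files do; check your own template before relying on it.

```shell
#!/bin/sh
# set_sweep_interval FILE SECONDS   (hypothetical helper)
# Prints FILE to standard output with its sweep_interval line set to
# SECONDS. The input file itself is not modified.
# Assumption: the setting appears as "sweep_interval <value>" at the
# start of a line.
set_sweep_interval() {
    sed "s/^sweep_interval[ 	].*/sweep_interval $2/" "$1"
}
```

Writing the result to a new file and reviewing it before replacing the template keeps the original available. The change still takes effect only after the Commit operation in the smc-configure-fabric GUI.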


If an SM detects a change in the fabric during a light sweep, such as the addition or deletion of a node, it performs a heavy sweep. The heavy sweep actually changes the fabric configuration to reflect the current state of the system. For more information, see the opensm(8) man page on the leader node.

The opensm-ibx.conf configuration files are located in the /opt/sgi/var/sgifmcli directory on the admin node.

Each opensm instance (one for each fabric) associates itself with a particular globally unique identifier (GUID) for a port on the node where opensm runs (see Figure 4-5). This association is configured with the "guid" entry in the corresponding opensm-ib[01].conf file.

Figure 4-5. Two InfiniBand Fabrics in a System with Two IRUs


Network Topology

For SGI Altix ICE systems with a hypercube topology, SGI uses the dimension order routing (DOR) algorithm.

The dimension order routing algorithm is based on the min hop algorithm and so uses shortest paths. Instead of spreading traffic out across different paths with the same shortest distance, it chooses among the available shortest paths based on an ordering of dimensions.

For SGI Altix ICE systems with a fat-tree topology, SGI uses updn as the default routing algorithm. The updn unicast routing algorithm is also based on the minimum number of hops to each node, but it is constrained by ranking rules.

For more information on routing variables, see the opensm(8) man page.

Hypercube network topology is well suited for smaller node count MPI jobs or jobs that have communication patterns that are not sensitive to bisection bandwidth. Fat-tree network topology is well suited for large node count MPI jobs that are sensitive to bisection bandwidth.

As stated above, there are two opensm daemons, one for each fabric: opensmd-ib0 and opensmd-ib1. They are controlled by init.d scripts. Each init.d script has a separate configuration file for its fabric, opensm-ib0 and opensm-ib1, respectively.

You can use the sminfo command to show the GUID of the SM master.

Configuring the InfiniBand Fabric

This section describes how to configure and administer the InfiniBand fabric using the sgifmcli(8) command.


Note: SGI highly recommends that you use the smc-configure-fabric GUI to configure and administer the fabric (see “InfiniBand Management Tool Graphical User Interface”).


Procedure 4-1. Configure the Master Subnet Manager

    When configuring the SM master, the following rules apply:

    • Each InfiniBand fabric needs to have a subnet manager (SM) master.

    • There can be at most one SM master per InfiniBand fabric.

    • Fabric configuration and administration can only be done via the SM master.

    • Fabric configuration becomes active after (re)starting the SM master.

    • Deleting an SM master automatically deletes its standby, if it exists.

    The syntax to configure an SM master is, as follows:

    sgifmcli --mastersm --init --id identifier --hostname hostname --fabric fabric --topology topology

    This command creates an SM master with the name provided by the --id option. The identifier can be any arbitrary string. The --hostname option determines the host on which the SM master is launched. The --fabric option associates the SM master with either ib0 or ib1. The --topology option specifies the InfiniBand topology, which can be hypercube, enhanced hypercube, or fat tree.

    To configure a master for the fabric ib0 on a hypercube cluster, perform the following steps:

    1. To configure an SM master, run the following command from the admin node:

      # sgifmcli --mastersm --init --id master_ib0 --hostname r1lead --fabric ib0 --topology hypercube

      This creates an SM master for ib0. The underlying topology is a hypercube and thus the routing algorithm dor will be used. This SM master, named master_ib0, is configured to run on the host r1lead.

    2. The syntax to start an SM master is, as follows:

      sgifmcli --start --id identifier

      To start the master_ib0 SM master, perform the following:

      # sgifmcli --start --id master_ib0

      At this point, a master for the fabric ib0 is running on r1lead, and the fabric ib0 is available for compute jobs. If a standby has been defined, it is launched automatically in addition to the master.

    3. The syntax to stop an SM master is, as follows:

      sgifmcli --stop --id identifier

      To stop the master_ib0 SM master, perform the following:
      # sgifmcli --stop --id master_ib0

      The SM master master_ib0 running on host r1lead is stopped. If a standby has been defined then it will be stopped automatically, in addition to the master.

    4. The syntax to check the status of an SM master is, as follows:

      sgifmcli --status --id identifier

      To check the status of the master_ib0 SM master, perform the following:
      # sgifmcli --status --id master_ib0
      Master SM
      Host = r1lead
      Guid = 0x0002c902002838f5
      Fabric = ib0
      Topology = hypercube
      Routing Engine = dor
      OpenSM = running

      The status of the SM master master_ib0 running on host r1lead is reported. If a standby has been defined, its status will be reported in addition to the master.

    5. The syntax to remove an SM master is, as follows:

      sgifmcli --remove --id identifier

      To remove the master_ib0 SM master, first stop it and then use the --remove option, as follows:

      # sgifmcli --stop --id master_ib0
      
      # sgifmcli --remove --id master_ib0

      The SM master is removed from the entity list. If a standby has been defined, it is removed, in addition to the master.

    6. To find the ID of the master SM in the database, perform the following:

      # sgifmcli --dump --id ib0 | grep MASTER

    7. To print the fabric configuration, run the following:

      # sgifmcli --showconfig
      
      --------------
      NAME = ib1
      TYPE = ibfabric
      MASTER = 
      STANDBY = 
      SWITCH_LIST = 
      --------------
      NAME = ib0
      TYPE = ibfabric
      MASTER = 
      STANDBY = 
      SWITCH_LIST = 
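
The "OpenSM = running" lines in the status output of step 4 lend themselves to a scripted health check. The following is a minimal sketch, assuming the status output keeps the "key = value" layout shown in this procedure; the function name opensm_all_running is hypothetical, and any value other than "running" is treated as a failure.

```shell
#!/bin/sh
# opensm_all_running (hypothetical helper) reads "sgifmcli --status"
# output on standard input. It succeeds only if at least one
# "OpenSM = ..." line is present and every such line says "running".
opensm_all_running() {
    awk -F' = ' '
        $1 ~ /^[ \t]*OpenSM$/ {
            seen++
            sub(/[ \t]+$/, "", $2)      # ignore trailing whitespace
            if ($2 != "running") bad++
        }
        END {
            if (seen > 0 && bad == 0) exit 0
            exit 1
        }
    '
}
```

For example, sgifmcli --status --id master_ib0 | opensm_all_running returns a zero exit status only when the master and any defined standby both report a running OpenSM.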

    InfiniBand Fabric Failover Mechanism

    Each subnet manager (SM) has a failover mechanism. If the master SM fails, the standby SM takes over operation of the fabric. This failover operation is performed automatically by the opensm software. Typically, rack 1 has the master SM for the ib0 fabric and rack 2 has the master SM for the ib1 fabric, as shown in Figure 4-6.

    Figure 4-6. opensm Software Failover


    The following procedure describes how to set up the failover mechanism.

    Procedure 4-2. Enabling the InfiniBand Failover Mechanism

      When enabling the InfiniBand failover mechanism, the following rules apply:

      • Each InfiniBand fabric can optionally have exactly one standby.

      • A standby SM can only be created for a particular fabric when a master already exists.

      • When adding a standby after a master has already been defined and started, the master needs to be stopped before the standby is defined via the --init option. After defining the standby via --init, restart the master.

      • An SM master and an SM standby for a particular fabric cannot coexist on the same node.

      SGI highly recommends that you use the smc-configure-fabric GUI to configure the failover mechanism. If it is necessary to use sgifmcli(8) to enable the InfiniBand failover mechanism, perform the following steps:

      1. If an SM master is defined and running, stop it, as follows:

        # sgifmcli --stop --id master_ib0

        If the SM master has not been defined, define it, as follows:
        # sgifmcli --mastersm --init --id master_ib0 --hostname r1lead --fabric ib0 --topology hypercube

      2. Define the SM standby, as follows:

        # sgifmcli --standbysm --init --id standby_ib0 --hostname r2lead --fabric ib0

      3. Start the SM master, as follows:

        # sgifmcli --start --id master_ib0

        This automatically starts the SM master and the SM standby for ib0.

      4. Now check the status for the subnet manager of ib0, as follows:

        # sgifmcli --status --id master_ib0
        
        Master SM
        Host = r1lead
        Guid = 0x0008f10403987da9
        Fabric = ib0
        Topology = hypercube
        Routing Engine = dor
        OpenSM = running
        Standby SM
        Host = r2lead
        Guid = 0x0008f10403987d25
        Fabric = ib0
        OpenSM = running

      5. To remove the standby_ib0 SM standby, first stop its master and then perform the remove option, as follows:

        # sgifmcli --stop --id master_ib0
        # sgifmcli --remove --id standby_ib0

        The SM standby is removed from the entity list. Because the SM master was stopped first, restart it to bring the ib0 fabric back into service.
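
The ordering rule in this procedure (stop the master, define the standby, restart the master) can be captured in a small script. The following sketch only prints the commands so that they can be reviewed before anything is run; the function name add_standby_plan is hypothetical, and the identifiers passed to it are the example names used in this procedure.

```shell
#!/bin/sh
# add_standby_plan MASTER_ID STANDBY_ID STANDBY_HOST FABRIC
# Hypothetical helper: prints, in order, the sgifmcli commands from
# Procedure 4-2 for adding a standby to an already defined master.
# Nothing is executed.
add_standby_plan() {
    printf 'sgifmcli --stop --id %s\n' "$1"
    printf 'sgifmcli --standbysm --init --id %s --hostname %s --fabric %s\n' "$2" "$3" "$4"
    printf 'sgifmcli --start --id %s\n' "$1"
    printf 'sgifmcli --status --id %s\n' "$1"
}
```

Running add_standby_plan master_ib0 standby_ib0 r2lead ib0 prints the four commands for the example above. Keeping the plan as text makes it easy to check that the master and standby host names differ, as the rules require, before the commands are run on the admin node.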

      Configuring the InfiniBand Fat-tree Network Topology

      This section describes how to configure InfiniBand fat-tree network topology. The fat-tree topology involves external InfiniBand switches. For the list of supported external switches, see “Fabric Component sgifmcli Command”.

      InfiniBand switches are generally classified as being of two types: edge switches and core or spine switches. Edge switches are used to connect to compute nodes. Core or spine switches are used to connect edge switches together. The integrated InfiniBand switches in SGI Altix ICE systems are considered to be edge switches and external InfiniBand switches used to connect these edge switches together in a fat-tree topology are considered to be spine switches.

      The sgifmcli command allows two types of fat-tree topologies to be configured: FTREE and BFTREE. BFTREE is a balanced fat tree. If the fat-tree topology is balanced, choose BFTREE; otherwise, choose FTREE.

      SGI recommends that you use the SMC for Altix ICE discover command (see “discover Command” in Chapter 2) to discover external IB switches. After discovery is completed, an external switch can also be initialized and added to the InfiniBand system using the sgifmcli command.

      The --init and --add operations below are completed by the SMC for Altix ICE discover command when the external switch is discovered with the --switch option. If the switch is instead discovered as a general node rather than as an external switch, you need to perform the --init and --add operations below yourself.

      Procedure 4-3. Configuring InfiniBand Fat-tree Network Topology

        To configure the InfiniBand fat-tree network topology on an SGI Altix ICE system, perform the following steps:

        1. Make sure that your switch is properly connected to the InfiniBand network. Also, make sure that the admin port of the switch is properly connected to the Ethernet network.

        2. Power on the switch. See the switch manual for operation information.

        3. From the admin node, initialize the switch. The syntax to initialize the switch is, as follows:

          sgifmcli --init --ibswitch --model <model> --id <hostname> --switchtype [leaf | spine]

          An example command is, as follows:

          # sgifmcli --init --ibswitch --model voltaire-isr-2004  --id isr2004 --switchtype spine

          This configures a Voltaire ISR 2004 switch with the hostname isr2004 as a spine switch. The name isr2004 refers to the admin port of the switch, which must already be configured to allow switch access. The switch is now initialized and the root GUIDs from the spine switches have been downloaded.

        4. From the admin node, add the switch to the fabric. The syntax to add the switch is, as follows:

          sgifmcli --add --id <fabric> --switch <hostname>

          An example command is, as follows:

          # sgifmcli --add --id ib0 --switch isr2004

          In this example, the isr2004 switch is added to the ib0 fabric.

        5. For the new switch to be activated, the SM master and the optional SM standby need to be (re)started.

          # sgifmcli --start --id master_ib0

          If the SM master was running while the switch was added, you first need to stop and then start the master, as follows:

          # sgifmcli --stop --id master_ib0
          # sgifmcli --start --id master_ib0

          If a standby has been defined, then in case of an SM master failure the SM standby subnet manager will automatically take over and assume control over the switch.

        6. The switches related to a particular fabric can be listed, as follows:

          # sgifmcli --switchlist --id <fabric>

        Configuring the Lightweight Fabric

        This section describes how to configure the lightweight fabric with fat-tree topology using external Mellanox switches.

        Procedure 4-4. Configuring the Lightweight Fabric

          To configure the Lightweight Fabric, perform the following steps:

          1. The switch should be set up to use dynamic host configuration protocol (DHCP) as part of the initial setup. This is done by SGI in the factory. You need to go through the process only if a new switch is being installed. For configuration information, see the section called "Configuring the switch for the First Time" in the Mellanox Technologies IS5025/5030/5031/5035 Installation Guide. When asked about using DHCP, answer "Yes". For IP configuration information, see Table 4 - “Configuration Wizard Session - IP Configuration by DHCP”.

          2. Use the discover command to discover external switches (see “discover Command” in Chapter 2). The switch model to use is "mellanox5030". The discover command supports external switches in a manner similar to racks and service nodes, except that switches do not have BMCs and there is no software to install.

          3. Discover all external switches.

          4. Use smc-configure-fabric to configure the fabric, as described in “InfiniBand Management Tool Graphical User Interface”.

            In the Configure Topology option, use BFTREE as the topology. The FAT TREE topology option should not be used. Proceed with the steps, described in “InfiniBand Management Tool Graphical User Interface”, to configure and verify the fabric.

          Verifying the InfiniBand Network

          After your InfiniBand fabric has been configured and started, you can use the sgifmcli(8) command to verify the health of the fabric.

          Procedure 4-5. Verifying the InfiniBand Network

            This version of the InfiniBand verifier runs the recommended OFED test suite. In addition, the SMC for Altix ICE cluster view is compared with the InfiniBand cluster view and potential differences are reported.

            To verify a fabric, run the following command, where <fabric> is either ib0 or ib1:

            # sgifmcli --verify --id <fabric>

            For more information, see the sgifmcli(8) man page.

            Useful Utilities and Diagnostics

            The infiniband-diags-pp package contains useful tools and diagnostic software for OpenFabrics Enterprise Distribution (OFED). This section describes some of these tools. These tools reside on the rack leader controller (leader node) in the /usr/sbin directory. To see a full list of diagnostics, from the leader node, use the following command:

            # rpm -ql infiniband-diags-pp | grep "/usr/sbin"

            This section covers the following topics:

            ibstat and ibstatus Commands

            You can use the ibstat command to see the current status of the host channel adapters (HCAs) in your InfiniBand fabric, including the HCAs on rack leader controllers. The following output was captured before the fabric management software was started:

            r1lead:/usr/bin # ibstat
            CA 'mthca0'
                    CA type: MT25208 (MT23108 compat mode)
                    Number of ports: 2
                    Firmware version: 4.7.600
                    Hardware version: a0
                    Node GUID: 0x0008f104039881a8
                    System image GUID: 0x0008f104039881ab
                    Port 1:
                            State: Initializing
                            Physical state: LinkUp
                            Rate: 20
                            Base lid: 0
                            LMC: 0
                            SM lid: 0
                            Capability mask: 0x02510a68
                            Port GUID: 0x0008f104039881a9
                    Port 2:
                            State: Initializing
                            Physical state: LinkUp
                            Rate: 20
                            Base lid: 0
                            LMC: 0
                            SM lid: 0
                            Capability mask: 0x02510a68
                            Port GUID: 0x0008f104039881aa

            The following shows output from the ibstat command after the fabric management software has been started:

            r1lead:/opt/sgi/sbin # ibstat
            CA 'mthca0'
                    CA type: MT25208 (MT23108 compat mode)
                    Number of ports: 2
                    Firmware version: 4.7.600
                    Hardware version: a0
                    Node GUID: 0x0008f104039881a8
                    System image GUID: 0x0008f104039881ab
                    Port 1:
                            State: Active
                            Physical state: LinkUp
                            Rate: 20
                            Base lid: 1
                            LMC: 0
                            SM lid: 1
                            Capability mask: 0x02510a6a
                            Port GUID: 0x0008f104039881a9
                    Port 2:
                            State: Active
                            Physical state: LinkUp
                            Rate: 20
                            Base lid: 1
                            LMC: 0
                            SM lid: 1
                            Capability mask: 0x02510a6a
                            Port GUID: 0x0008f104039881aa

            You can use the ibstatus command (less verbose than ibstat) to show the link rate, as follows:

            r1lead:/opt/sgi/sbin # ibstatus
            Infiniband device 'mthca0' port 1 status:
                    default gid:     fe80:0000:0000:0000:0008:f104:0398:81a9
                    base lid:        0x1
                    sm lid:          0x1
                    state:           4: ACTIVE
                    phys state:      5: LinkUp
                    rate:            20 Gb/sec (4X DDR)
            
            Infiniband device 'mthca0' port 2 status:
                    default gid:     fe80:0000:0000:0000:0008:f104:0398:81aa
                    base lid:        0x1
                    sm lid:          0x1
                    state:           4: ACTIVE
                    phys state:      5: LinkUp
                    rate:            20 Gb/sec (4X DDR)


            Note: If the link rate is not 20 Gb/sec (4X DDR) and you have a DDR-capable HCA, there is a physical link problem with your system.
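
The link rate check described in the note can be automated by filtering ibstatus output. The following is a minimal sketch, assuming the output keeps the layout shown above and that every HCA being checked is DDR capable; the function name check_ddr_rate is hypothetical.

```shell
#!/bin/sh
# check_ddr_rate (hypothetical helper) reads ibstatus output on
# standard input and prints one line for every port whose rate is not
# "20 Gb/sec (4X DDR)". It returns non-zero if any such port is found.
check_ddr_rate() {
    awk '
        /^[ \t]*Infiniband device/ { port = $3 " port " $5 }
        /^[ \t]*rate:/ {
            sub(/^[ \t]*rate:[ \t]*/, "")   # keep only the rate value
            sub(/[ \t]+$/, "")
            if ($0 != "20 Gb/sec (4X DDR)") {
                print port ": " $0
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}
```

On a leader node, ibstatus | check_ddr_rate prints nothing and returns 0 when all ports run at the expected rate. Any port it lists is a candidate for the physical link check mentioned in the note.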


            perfquery Command

            The perfquery command is useful for finding errors on a particular HCA (or a number of them) and switch ports. You can also use perfquery to reset HCA and switch port counters.

            To see a usage statement for the perfquery command, perform the following:

            r1lead:/opt/sgi/sbin # perfquery --help
            Usage: perfquery [-d(ebug) -G(uid) -a(ll_ports) -r(eset_after_read) -C ca_name -P ca_port -R(eset_only)
             -t(imeout) timeout_ms -V(ersion) -h(elp)] [<lid|guid> [[port] [reset_mask]]]
                    Examples:
                            perfquery               # read local port's performance counters
                            perfquery 32 1          # read performance counters from lid 32, port 1
                            perfquery -e 32 1       # read extended performance counters from lid 32, port 1
                            perfquery -a 32         # read performance counters from lid 32, all ports
                            perfquery -r 32 1       # read performance counters and reset
                            perfquery -e -r 32 1    # read extended performance counters and reset
                            perfquery -R 0x20 1     # reset performance counters of port 1 only
                            perfquery -e -R 0x20 1  # reset extended performance counters of port 1 only
                            perfquery -R -a 32      # reset performance counters of all ports
                            perfquery -R 32 2 0x0fff        # reset only error counters of port 2
                            perfquery -R 32 2 0xf000        # reset only non-error counters of port 2

            Sample output from the perfquery command is as follows:
            r1lead:/opt/sgi/sbin # perfquery
            # Port counters: Lid 1 port 1
            PortSelect:......................1
            CounterSelect:...................0x0000
            SymbolErrors:....................0
            LinkRecovers:....................0
            LinkDowned:......................0
            RcvErrors:.......................0
            RcvRemotePhysErrors:.............0
            RcvSwRelayErrors:................0
            XmtDiscards:.....................0
            XmtConstraintErrors:.............0
            RcvConstraintErrors:.............0
            LinkIntegrityErrors:.............0
            ExcBufOverrunErrors:.............0
            VL15Dropped:.....................0
            XmtData:.........................0
            RcvData:.........................0
            XmtPkts:.........................0
            RcvPkts:.........................0
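
            When many ports are checked, it can help to show only the counters that are not zero. The following is a minimal sketch under stated assumptions: the function name nonzero_error_counters is hypothetical, and the choice to skip PortSelect, CounterSelect, and the four data and packet counters is an illustrative definition of "error counter" based on the sample output above.

```shell
#!/bin/sh
# nonzero_error_counters (hypothetical helper): read perfquery-style output
# on stdin and print "Name=value" for every counter with a non-zero value.
# PortSelect, CounterSelect, and the traffic counters (XmtData, RcvData,
# XmtPkts, RcvPkts) are skipped because they are not error indicators.
nonzero_error_counters() {
    awk -F'[:.]+' '
        /^[ \t]*#/ { next }                      # skip the "# Port counters" header
        NF >= 2 {
            name = $1
            sub(/^[ \t]+/, "", name)             # drop any leading indentation
            if (name ~ /^(PortSelect|CounterSelect|XmtData|RcvData|XmtPkts|RcvPkts)$/)
                next
            if ($NF + 0 != 0)
                print name "=" $NF
        }
    '
}
```

            For example, "perfquery 32 1 | nonzero_error_counters" would print nothing for a port whose error counters are all zero, as in the sample output above.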

            ibnetdiscover Command

            The ibnetdiscover command allows you to discover the topology of the InfiniBand fabric.

            To see a usage statement for the ibnetdiscover command, perform the following:

            r1lead:/opt/sgi/sbin # ibnetdiscover --help
            Usage: ibnetdiscover [-d(ebug)] -e(rr_show) -v(erbose) -s(how) -l(ist) 
            -g(rouping) -H(ca_list) -S(witch_list) 
            -V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms 
            --switch-map switch-map] [<topology-file>]
            --switch-map <switch-map>  specify a switch-map file


            Note: Only abbreviated output is shown in this example.


            Sample output from the ibnetdiscover command is as follows:
            r1lead:/opt/sgi/sbin # ibnetdiscover
            #
            # Topology file: generated on Tue Jul 17 14:05:20 2007
            #
            # Max of 3 hops discovered
            # Initiated from node 0008f104039881a8 port 0008f104039881a9
            
            vendid=0x2c9
            devid=0xb924
            sysimgguid=0x8006900000000dd
            
            ...
            
            Switch   : 0x08006900000000dc ports 24 devid 0xb924 vendid 0x2c9 
            "MT47396 Infiniscale-III Mellanox Technologies"
            Switch   : 0x08006900000000a4 ports 24 devid 0xb924 vendid 0x2c9 
            "MT47396 Infiniscale-III Mellanox Technologies"
            
            To list only the HCAs, use the -H option:

            r1lead:/opt/sgi/sbin # ibnetdiscover -H
            Ca       : 0x0030487aa7940000 ports 1 devid 0x6274 vendid 0x2c9 "MT25204 InfiniHostLx Mellanox Technologies"
            Ca       : 0x0030487aa78c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n8-ib0 HCA-1"
            Ca       : 0x0008f10403988198 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
            Ca       : 0x0030487aa7840000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n1-ib0 HCA-1"
            Ca       : 0x0030487aa79c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n0-ib0 HCA-1"
            Ca       : 0x0030487aa7900000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n8-ib0 HCA-1"
            Ca       : 0x0030487aa7980000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n1-ib0 HCA-1"
            Ca       : 0x0008f104039881a8 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
            
            ======================================================================================================
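
            The HCA list is easier to scan when it is reduced to a GUID and a node description per line. The following is a minimal sketch: the function name list_hcas is hypothetical, and the parsing assumes the "Ca : <guid> ... "<description>"" line layout shown in the sample output above.

```shell
#!/bin/sh
# list_hcas (hypothetical helper): read "ibnetdiscover -H" style output on
# stdin and print "<guid> <description>" for every line that starts with Ca.
list_hcas() {
    awk -F'"' '
        $1 ~ /^[ \t]*Ca[ \t]*:/ {
            split($1, f, " ")            # whitespace split: Ca, :, GUID, ports, ...
            desc = $2                    # text between the double quotes
            sub(/^[ \t]+/, "", desc)     # some descriptions start with a space
            print f[3], desc
        }
    '
}
```

            For example, "ibnetdiscover -H | list_hcas" would print one line per HCA, which makes entries that lack a host name in the description easier to spot.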

            ibdiagnet Command

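            The ibdiagnet command scans the fabric using directed route packets and reports on its connectivity and devices, including duplicate GUIDs, links that are not fully up, performance counter errors, and suspected bad links.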

            To see a usage statement for the ibdiagnet command, perform the following:

            r1lead:/opt/sgi/sbin # ibdiagnet --help
            Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
            NAME
              ibdiagnet
            SYNOPSYS
              ibdiagnet [-c <count>] [-v] [-r] [-o <out-dir>]
                 [-t <topo-file>] [-s <sys-name>] [-i <dev-index>] [-p <port-num>]
                 [-pm] [-pc] [-P <<PM counter>=<Trash Limit>>]
                 [-lw <1x|4x|12x>] [-ls <2.5|5|10>]
                
            
            DESCRIPTION
              ibdiagnet scans the fabric using directed route packets and extracts all the 
              available information regarding its connectivity and devices.
              It then produces the following files in the output directory defined by the
              -o option (see below): 
                ibdiagnet.lst    - List of all the nodes, ports and links in the fabric
                ibdiagnet.fdbs   - A dump of the unicast forwarding tables of the fabric
                                   switches
                ibdiagnet.mcfdbs - A dump of the multicast forwarding tables of the fabric
                                   switches
                ibdiagnet.masks  - In case of duplicate port/node Guids, these file include
                                   the map between masked Guid and real Guids 
                ibdiagnet.sm     - A dump of all the SM (state and priority) in the fabric
                ibdiagnet.pm     - In case -pm option was provided, this file contain a dump
                                   of all the nodes PM counters
              In addition to generating the files above, the discovery phase also checks for
              duplicate node/port GUIDs in the IB fabric. If such an error is detected, it 
              is displayed on the standard output.
              After the discovery phase is completed, directed route packets are sent
              multiple times (according to the -c option) to detect possible problematic 
              paths on which packets may be lost. Such paths are explored, and a report of
              the suspected bad links is displayed on the standard output.
              After scanning the fabric, if the -r option is provided, a full report of the
              fabric qualities is displayed.
              This report includes: 
                SM report
                Number of nodes and systems
                Hop-count information: 
                     maximal hop-count, an example path, and a hop-count histogram
                All CA-to-CA paths traced 
                Credit loop report
                mgid-mlid-HCAs matching table
              Note: In case the IB fabric includes only one CA, then CA-to-CA paths are not
              reported.
              Furthermore, if a topology file is provided, ibdiagnet uses the names defined
              in it for the output reports.
                  
            OPTIONS
              -c <count>                     : The minimal number of packets to be sent
                                               across each link (default = 10)
              -v                             : Instructs the tool to run in verbose mode
              -r                             : Provides a report of the fabric qualities
              -o <out-dir>                   : Specifies the directory where the output
                                               files will be placed (default = /tmp)
              -t <topo-file>                 : Specifies the topology file name
              -s <sys-name>                  : Specifies the local system name. Meaningful
                                               only if a topology file is specified
              -i <dev-index>                 : Specifies the index of the device of the port
                                               used to connect to the IB fabric (in case of
                                               multiple devices on the local system)
              -p <port-num>                  : Specifies the local device's port number used
                                               to connect to the IB fabric
              -pm                            : Dumps all pmCounters values into ibdiagnet.pm
              -pc                            : reset all the fabric links pmCounters
              -P <<PM counter>=<Trash Limit>>: If any of the provided pm is greater then its
                                               provided value, print it to screen
              -lw <1x|4x|12x>                : Specifies the expected link width
              -ls <2.5|5|10>                 : Specifies the expected link speed
                                                 
              -h|--help                      : Prints this help information
              -V|--version                   : Prints the version of the tool
                 --vars                      : Prints the tool's environment variables and
                                               their values
            
            ERROR CODES
              1 - Failed to fully discover the fabric
              2 - Failed to parse command line options
              3 - Failed to interact with IB fabric
              4 - Failed to use local device or local port
              5 - Failed to use Topology File
              6 - Failed to load required Package
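
            In scripts, the error codes listed above can be turned back into readable text. The following is a minimal sketch: the function name ibdiagnet_error_text is hypothetical, and treating an exit status of 0 as success is an assumption, because the help text lists only codes 1 through 6.

```shell
#!/bin/sh
# ibdiagnet_error_text (hypothetical helper): map an ibdiagnet exit status
# to the description given in the ERROR CODES section of the help output.
ibdiagnet_error_text() {
    case "$1" in
        0) echo "Success" ;;   # assumption: 0 is not listed in the help text
        1) echo "Failed to fully discover the fabric" ;;
        2) echo "Failed to parse command line options" ;;
        3) echo "Failed to interact with IB fabric" ;;
        4) echo "Failed to use local device or local port" ;;
        5) echo "Failed to use Topology File" ;;
        6) echo "Failed to load required Package" ;;
        *) echo "Unknown exit status: $1" ;;
    esac
}
```

            A script might run ibdiagnet, save "$?" immediately afterward, and pass the saved value to ibdiagnet_error_text when writing a log entry.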
            

            Output that reports no errors indicates that the fabric is operating correctly:

            r1lead:/opt/sgi/sbin # ibdiagnet
            Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
            Loading IBDM from: /usr/lib64/ibdm1.2
            -W- Topology file is not specified.
                Reports regarding cluster links will use direct routes.
            -W- A few ports of local device are up.
                Since port-num was not specified (-p option), port 1 of device 1 will be
                used as the local port.
            -I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.
            
            
            -I---------------------------------------------------
            -I- Bad Guids Info
            -I---------------------------------------------------
            -I- No bad Guids were found
            
            -I---------------------------------------------------
            -I- Links With Logical State = INIT
            -I---------------------------------------------------
            -I- No bad Links (with logical state = INIT) were found
            
            -I---------------------------------------------------
            -I- PM Counters Info
            -I---------------------------------------------------
            -I- No illegal PM counters values were found
            
            -I---------------------------------------------------
            -I- Bad Links Info
            -I---------------------------------------------------
            -I- No bad link were found
             
            -I- Done. Run time was 0 seconds.
            

            To test the fabric under load, use the -c option to increase the number of packets sent across each link (the default is 10), as follows:

            r1lead:/opt/sgi/sbin # ibdiagnet -c 5000
            Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
            Loading IBDM from: /usr/lib64/ibdm1.2
            -W- Topology file is not specified.
                Reports regarding cluster links will use direct routes.
            -W- A few ports of local device are up.
                Since port-num was not specified (-p option), port 1 of device 1 will be
                used as the local port.
            -I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.
            
            
            -I---------------------------------------------------
            -I- Bad Guids Info
            -I---------------------------------------------------
            -I- No bad Guids were found
            
            -I---------------------------------------------------
            -I- Links With Logical State = INIT
            -I---------------------------------------------------
            -I- No bad Links (with logical state = INIT) were found
            
            -I---------------------------------------------------
            -I- PM Counters Info
            -I---------------------------------------------------
            -I- No illegal PM counters values were found
            
            -I---------------------------------------------------
            -I- Bad Links Info
            -I---------------------------------------------------
            -I- No bad link were found
             
            -I- Done. Run time was 8 seconds.
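
            Both runs above can be checked automatically by looking for the four "no problems found" messages that appear in a clean report. The following is a minimal sketch: the function name ibdiagnet_is_clean is hypothetical, and the message strings are copied from the sample output in this section, so they are assumed to match the ibdiagnet version installed on your system.

```shell
#!/bin/sh
# ibdiagnet_is_clean (hypothetical helper): read ibdiagnet output on stdin
# and verify that all four clean-report messages from the sample output are
# present. Prints the first missing message and returns 1 if one is absent.
ibdiagnet_is_clean() {
    report=$(cat)
    for msg in \
        "No bad Guids were found" \
        "No bad Links (with logical state = INIT) were found" \
        "No illegal PM counters values were found" \
        "No bad link were found"
    do
        # -F treats the message as a fixed string, so the parentheses
        # and the equals sign are matched literally.
        printf '%s\n' "$report" | grep -qF -- "$msg" || {
            echo "missing: $msg"
            return 1
        }
    done
    return 0
}
```

            For example, "ibdiagnet -c 5000 | ibdiagnet_is_clean" returns a zero exit status only when every section of the report is clean.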