The InfiniBand network on SGI Altix ICE systems uses Open Fabrics Enterprise Distribution (OFED) software. This section describes the InfiniBand fabric and how to manage it. For background information on OFED, see http://www.openfabrics.org.
InfiniBand fabric management on SGI Altix ICE systems is done using the OFED OpenSM software package and the sgifmcli tool (see “Fabric Component sgifmcli Command”). The InfiniBand fabric connects the service nodes, rack leader controllers (leader nodes), and compute nodes. It does not connect to the system admin controller (admin node) or the chassis management control (CMC) blades. SGI Altix ICE systems usually have two separate InfiniBand fabrics, referred to as "ib0" and "ib1" in this manual (see “InfiniBand Fabric” in Chapter 1).
Note: The LX series has only one InfiniBand fabric, "ib0". Any references to "ib1" in this manual do not apply to LX systems.
On SGI Altix ICE 8200 systems, each InfiniBand fabric (also sometimes called an InfiniBand subnet) has its own subnet manager (SM), which runs on a rack leader controller (leader node). For a system with two or more racks, the SM for each fabric is usually configured to run on different leader nodes. In a single rack system, both SMs will run on the single leader node. Each SM may also be paired with a standby SM which can take over in the event of the failure of the primary SM. For more information, see “InfiniBand Fabric Failover Mechanism”.
On SGI Altix ICE 8400 series systems, rack leader controllers (leader nodes) do not always have InfiniBand host channel adapters (HCAs); this depends on the system configuration. In some cases, one or two rack leader controllers (RLCs) have HCAs to run the OFED subnet manager. In other cases, the subnet manager runs on separate fabric management nodes, and no RLCs have InfiniBand HCAs.
Rack leader controllers associate an SM instance with a particular port on the leader node. Usually, ib0 is mapped to port 1 of the InfiniBand host channel adapter (HCA) on the SM node, and ib1 is mapped to port 2 of the HCA on the SM node (see Figure 1-10). The SM for ib0 or ib1 is configured using the corresponding /etc/ofa/opensm-ib[01].conf file.
Note: After a system reboot, the opensm daemons start running automatically.
SGI supports the following topologies: hypercube, enhanced hypercube, and fat tree.
You can use the InfiniBand management tool graphical user interface (GUI) to configure, administer, or verify the InfiniBand fabric on your SGI Altix ICE system. Its operations include configuring, starting, stopping, restarting, and cleaning up the fabric, and getting its status.
From the system admin controller (admin node), enter the following command:
admin:~ # smc-configure-fabric
You can also access this command from the configure-cluster GUI main menu Configure Infiniband Fabric option (see “configure-cluster Command Cluster Configuration Tool” in Chapter 2). For more information, see Figure 4-1.
If the smc-configure-fabric command fails in a configuration or administrative operation, it suggests that you use the sgifmcli(8) command (described in “Fabric Component sgifmcli Command”) to debug the problem. Alternatively, you can use the Reset and Init Fabric Database option from the InfiniBand Management Tool main menu (see Figure 4-1) to start over and completely reconfigure the InfiniBand fabrics.
From the Configure InfiniBand screen, make sure you select the Configure Topology option to set the topology as shown in Figure 4-2. For more information, see “Network Topology”.
Use the online help available with this tool to guide you through the InfiniBand configuration. After configuring and bringing up the InfiniBand network, select the Administer InfiniBand ib0 option or the Administer InfiniBand ib1 option. The Administer InfiniBand screen appears as shown in Figure 4-3. You can use this screen to start, stop, restart, or refresh a fabric.
The Refresh Enhanced Hypercube Config and Restart option applies only to the Enhanced Hypercube topology. You are required to refresh the fabric configuration whenever you add, remove, or move one or more compute blades or service nodes. The refresh action updates the GUID routing order file, which is used to balance InfiniBand traffic for the Enhanced Hypercube topology. This action also automatically restarts the master subnet manager (SM) and the optional standby SM for the specified fabric (see “InfiniBand Fabric Failover Mechanism”).
Ideally, the refresh action for a fabric should be taken when there are no jobs running in the system. Restarting the subnet manager can have an adverse impact on the running jobs in the system.
For the most common fabric management operations, the smc-configure-fabric command (described in “InfiniBand Management Tool Graphical User Interface”) is entirely sufficient, and recommended. The sgifmcli(8) command can be used for more advanced fabric management tasks.
The most common operations that sgifmcli would be used for are, as follows:
Initializing and configuring external InfiniBand switches
Verifying the integrity of the InfiniBand fabric(s)
For more information, see the sgifmcli(8) man page.
Currently, the following switches are supported:
Switch Type | Description
voltaire-isr-9024 | Voltaire ISR 9024
voltaire-isr-2004 | Voltaire ISR 2004
voltaire-isr-2012 | Voltaire ISR 2012
voltaire-isr-9096 | Voltaire ISR 9096
voltaire-isr-9288 | Voltaire ISR 9288
voltaire4036 | Voltaire Grid Director 4036
mellanox5030 | Mellanox IS5030
To configure an external InfiniBand switch, cluster-wide InfiniBand connectivity is not required. The only necessity is that the supplied switch host name is resolvable and a working networking connection to the external InfiniBand switch exists. See the sgifmcli(8) man page for more information about adding external InfiniBand switches to your cluster's fabric.
Verifying the integrity of an InfiniBand fabric requires that the InfiniBand network first be configured properly. This is most easily done using smc-configure-fabric (see “InfiniBand Management Tool Graphical User Interface”). See the sgifmcli(8) man page for details about the fabric verification operation.
The sgifmcli(8) command is, as follows:
sgifmcli [type action [options]] | [options]
Note: You can use shortened versions of the following sgifmcli options as long as the option is unambiguous. For example, sgifmcli --vers for sgifmcli --version.
It accepts the following general options:
General Option | Description
-h, --help | Displays a help message and then exits
-V, --version | Shows the version number of the program
-v, --verbose [DEBUG | INFO | ERROR] | Selects the verbosity level (default: ERROR). Most of the messages from sgifmcli are written to a log file named /var/log/sgifmcli.log. The default level reports error messages only. INFO provides the user with details about the operation of sgifmcli in addition to error messages. The DEBUG level produces output that is tailored toward the developer to help with bug fixing. The DEBUG level also produces INFO and ERROR messages.
It accepts the following detailed options:
Detailed Option | Description | |
type | The type option is one of the following:
| |
action | The action option is one of the following:
| |
options | The options option is one or more of the following with no duplicates, for example, the --fabric option must be either ib0 or ib1, not both:
|
EXIT CODES
To facilitate the use of the sgifmcli(8) command in shell scripts, an exit code is returned to give an indication of what occurred during a given connection.
The exit codes returned by sgifmcli are, as follows:
0 | Successful termination.
255 | Abnormal termination.
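The exit codes above can be checked from a shell script. The following is a minimal sketch, not an SGI-supplied tool: run_checked is a hypothetical helper name, and it simply reports whether whatever command it is given returned zero or non-zero.

```shell
# Hypothetical helper: run any command and report success or failure
# based on its exit code (0 = success, anything else = failure).
run_checked() {
    if "$@"; then
        echo "ok: $*"
        return 0
    else
        rc=$?
        echo "failed (exit code $rc): $*" >&2
        return "$rc"
    fi
}
```

For example, run_checked sgifmcli --status --id master_ib0 prints a failure message to standard error and returns the same exit code if sgifmcli terminates abnormally.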
For a detailed man page, perform the following command from the admin node:
admin:~ # man sgifmcli
The fabric component maintains a database (DB) of the objects it manages (managed objects). The database version is automatically set during cluster install. You do not need to set it. Most likely, this database will change over time. To manage multiple database versions and also to aid in field support, SGI has added another command line tool that currently reports the managed objects database version.
The sgifmdb command is, as follows:
sgifmdb [--get|-g] [--dump|-d] [-v|--version] [-r|--reset] [--help|-h]
It accepts the following general options:
General Option | Description
-g, --get | Reads the database version object from the database
-d, --dump | Dumps the database. This option allows you to see what fabric objects are currently stored in the fabric database.
-v, --version | Prints the version
-r, --reset | Resets the database and starts clean
-h, --help | Shows the usage statement
Example 4-1. Getting sgifmdb(8) Command Help
For an sgifmdb command usage statement, perform the following from the admin node:
admin:~ # sgifmdb -h
SGI Fabric Component DB tool
Usage: db_version [--get|-g] [--dump|-d] [-v|--version] [-r|--reset] [--help|-h]
  -g, --get       Read DB version object from DB
  -d, --dump      Dump the DB
  -v, --version   Print version
  -r, --reset     Reset the database and start clean
  -h, --help      Show this text
Each subnet manager (SM) performs a light sweep of the fabric it is managing every 10 seconds by default. To change the interval, set the sweep_interval variable in the /opt/sgi/var/sgifmcli/opensm-ib0.conf.templ file and then perform a Commit operation in the smc-configure-fabric GUI. Alternatively, the sgifmcli command has an --arglist option to set various subnet manager configuration parameters, including the sweep interval.
Note: If your cluster is larger than 256 nodes, SGI highly recommends increasing this variable to 90 seconds or more.
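Before changing the sweep interval, it can help to check the value currently set in the template file. The following is a minimal sketch; get_sweep_interval is a hypothetical helper name, and it assumes the file uses whitespace-separated "name value" lines, which is an assumption about the template layout rather than something this manual specifies.

```shell
# Hypothetical helper: print the value of sweep_interval from an
# OpenSM-style configuration file passed as the first argument.
# Assumes lines of the form "sweep_interval <seconds>".
get_sweep_interval() {
    awk '$1 == "sweep_interval" { print $2; exit }' "$1"
}
```

For example, get_sweep_interval /opt/sgi/var/sgifmcli/opensm-ib0.conf.templ prints the configured interval, or nothing if the variable is not set in the file.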
If an SM detects a change in the fabric during a light sweep, such as the addition or deletion of a node, it performs a heavy sweep. The heavy sweep actually changes the fabric configuration to reflect the current state of the system. For more information, see the opensm(8) man page on the leader node.
The opensm-ibx.conf configuration files are located in the /opt/sgi/var/sgifmcli directory on the admin node.
Each opensm instance (one for each fabric) associates itself with a particular globally unique identifier (GUID) for a port on the node where opensm runs (see Figure 4-5). This association is configured with the "guid" entry in the corresponding opensm-ib[01].conf file.
For SGI Altix ICE systems with a hypercube topology, SGI uses the dimension order routing (DOR) algorithm.
The dimension order routing algorithm is based on the min hop algorithm and so uses shortest paths. Instead of spreading traffic out across different paths with the same shortest distance, it chooses among the available shortest paths based on an ordering of dimensions.
For SGI Altix ICE systems with a fat-tree topology, SGI uses updn as the default routing algorithm. The updn unicast routing algorithm is also based on the minimum hops to each node, but it is constrained by ranking rules.
For more information on routing variables, see the opensm(8) man page.
Hypercube network topology is well suited for smaller node count MPI jobs or jobs that have communication patterns that are not sensitive to bisection bandwidth. Fat-tree network topology is well suited for large node count MPI jobs that are sensitive to bisection bandwidth.
As stated above, there are two opensm daemons, one for each fabric, opensmd-ib0 and opensmd-ib1, respectively. They are controlled by the init.d scripts. Each init.d script has a separate configuration file for each fabric, opensm-ib0 and opensm-ib1, respectively.
You can use the sminfo command to show the GUID of the SM master.
This section describes how to configure and administer the InfiniBand fabric using the sgifmcli(8) command.
Note: SGI highly recommends that you use the smc-configure-fabric GUI to configure and administer the fabric (see “InfiniBand Management Tool Graphical User Interface”).
When configuring the SM master, the following rules apply:
Each InfiniBand fabric needs to have a subnet manager (SM) master.
There can be at most one SM master per InfiniBand fabric.
Fabric configuration and administration can only be done via the SM master.
Fabric configuration becomes active after (re)starting the SM master.
Deleting an SM master automatically deletes its standby, if it exists.
The syntax to configure an SM master is, as follows:
sgifmcli --mastersm --init --id identifier --hostname hostname --fabric fabric --topology topology
This command creates a master with the name provided by the --id option. The identifier can be any arbitrary string. The hostname determines the host on which the SM master manager is launched. The fabric option associates the SM master manager with either ib0 or ib1. The topology option refers to the InfiniBand topology, which can be either hypercube, enhanced hypercube, or fat tree.
To configure a master for the fabric ib0 on a hypercube cluster, perform the following from the admin node:
# sgifmcli --mastersm --init --id master_ib0 --hostname r1lead --fabric ib0 --topology hypercube
This creates an SM master for ib0. The underlying topology is a hypercube and thus the routing algorithm dor will be used. This SM master, named master_ib0, is configured to run on the host r1lead.
The syntax to start an SM master is, as follows:
sgifmcli --start --id identifier
To start the master_ib0 SM master, perform the following:
# sgifmcli --start --id master_ib0
At this point, a master for the fabric ib0 is running on r1lead and thus the fabric ib0 is available for compute jobs. If a standby has been defined, it is launched automatically, in addition to the master.
The syntax to stop an SM master is, as follows:
sgifmcli --stop --id identifier
To stop the master_ib0 SM master, perform the following:
# sgifmcli --stop --id master_ib0
The SM master master_ib0 running on host r1lead is stopped. If a standby has been defined then it will be stopped automatically, in addition to the master.
The syntax to check the status of an SM master is, as follows:
sgifmcli --status --id identifier
To check the status of the master_ib0 SM master, perform the following:
# sgifmcli --status --id master_ib0
Master SM
Host = rlead
Guid = 0x0002c902002838f5
Fabric = ib0
Topology = hypercube
Routing Engine = dor
OpenSM = running
The syntax to remove an SM master is, as follows:
sgifmcli --remove --id identifier
To remove the master_ib0 SM master, first stop it and then use the --remove option, as follows:
# sgifmcli --stop --id master_ib0
# sgifmcli --remove --id master_ib0
The SM master is removed from the entity list. If a standby has been defined, it is removed, in addition to the master.
To find the ID of the master SM in the database, perform the following:
# sgifmcli --dump --id ib0 | grep MASTER
To print the fabric configuration, run the following:
# sgifmcli --showconfig
--------------
NAME = ib1
TYPE = ibfabric
MASTER =
STANDBY =
SWITCH_LIST =
--------------
NAME = ib0
TYPE = ibfabric
MASTER =
STANDBY =
SWITCH_LIST =
Each subnet manager (SM) has a failover mechanism. If the master SM fails, the standby SM takes over operation of the fabric. This failover operation is performed automatically by the opensm software. Typically, rack 1 hosts the MASTER for the ib0 fabric and rack 2 hosts the MASTER for the ib1 fabric, as shown in Figure 4-6.
The following procedure describes how to set up the failover mechanism.
When enabling the InfiniBand failover mechanism, the following rules apply:
Each InfiniBand fabric can optionally have exactly one standby.
A standby SM can only be created for a particular fabric when a master already exists.
When adding a standby after a master has already been defined and started, the master needs to be stopped before the standby is defined via the --init option. After defining the standby via --init, restart the master.
An SM master and an SM standby for a particular fabric cannot coexist on the same node.
SGI highly recommends that you use the smc-configure-fabric GUI to configure the failover mechanism. If it is necessary to use sgifmcli(8) to enable the InfiniBand failover mechanism, perform the following steps:
If an SM master is defined and running, stop it, as follows:
# sgifmcli --stop --id master_ib0
If no SM master has been defined yet, define one, as follows:
# sgifmcli --mastersm --init --id master_ib0 --hostname r1lead --fabric ib0 --topology hypercube
Define the SM standby, as follows:
# sgifmcli --standbysm --init --id standby_ib0 --hostname r2lead --fabric ib0
Start the SM master, as follows:
# sgifmcli --start --id master_ib0
This automatically starts the SM master and the SM standby for ib0.
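The stop, define, and start sequence above can be collected into one shell function. The following is a minimal sketch, not an SGI-supplied tool: add_standby_sm is a hypothetical name, it uses only the sgifmcli options shown in this section, and it assumes the master is already defined and currently running.

```shell
# Hypothetical helper: add a standby SM to a fabric whose master exists.
# Arguments: master id, standby id, standby host, fabric (ib0 or ib1).
add_standby_sm() {
    master_id=$1
    standby_id=$2
    standby_host=$3
    fabric=$4
    # The master must be stopped before the standby is defined.
    sgifmcli --stop --id "$master_id" || return 1
    sgifmcli --standbysm --init --id "$standby_id" \
        --hostname "$standby_host" --fabric "$fabric" || return 1
    # Starting the master also starts the standby.
    sgifmcli --start --id "$master_id"
}
```

For example, add_standby_sm master_ib0 standby_ib0 r2lead ib0 performs the same steps as the procedure above and stops at the first command that fails.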
Now check the status for the subnet manager of ib0, as follows:
# sgifmcli --status --id master_ib0
Master SM
Host = r1lead
Guid = 0x0008f10403987da9
Fabric = ib0
Topology = hypercube
Routing Engine = dor
OpenSM = running
Standby SM
Host = r2lead
Guid = 0x0008f10403987d25
Fabric = ib0
OpenSM = running
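A script can check that both the master and the standby report a running OpenSM by filtering the status output. The following is a minimal sketch; sm_all_running is a hypothetical name, and it assumes the status output is printed one field per line in "name = value" form, which is an assumption about the layout of the sample above.

```shell
# Hypothetical helper: read "sgifmcli --status" output on standard input.
# Succeeds only if at least one "OpenSM = ..." line is present and every
# such line reports "running".
sm_all_running() {
    awk -F' *= *' '
        $1 ~ /OpenSM/ { seen++; if ($2 !~ /^running/) bad++ }
        END { exit ((seen > 0 && bad == 0) ? 0 : 1) }
    '
}
```

For example, sgifmcli --status --id master_ib0 | sm_all_running returns zero only when every OpenSM line in the output says running.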
To remove the standby_ib0 SM standby, first stop its master and then use the --remove option, as follows:
# sgifmcli --stop --id master_ib0
# sgifmcli --remove --id standby_ib0
The SM standby is removed from the entity list.
This section describes how to configure InfiniBand fat-tree network topology. The fat-tree topology involves external InfiniBand switches. For the list of supported external switches, see “Fabric Component sgifmcli Command”.
InfiniBand switches are generally classified as being of two types: edge switches and core or spine switches. Edge switches are used to connect to compute nodes. Core or spine switches are used to connect edge switches together. The integrated InfiniBand switches in SGI Altix ICE systems are considered to be edge switches and external InfiniBand switches used to connect these edge switches together in a fat-tree topology are considered to be spine switches.
The sgifmcli command allows two types of fat-tree topologies to be configured: FTREE and BFTREE. BFTREE is a balanced fat-tree. If the fat-tree topology is balanced, choose BFTREE; otherwise, choose FTREE.
SGI recommends that you use the SMC for Altix ICE discover command (see “discover Command” in Chapter 2) to discover external IB switches. After discovery is completed, an external switch can also be initialized and added to the InfiniBand system using the sgifmcli command.
The --init and --add operations below are completed by the SMC for Altix ICE discover command when the external switch is discovered with the --switch option. If the external switch is discovered as a general node rather than as an external switch, the --init and --add operations below need to be performed manually.
To configure the InfiniBand fat-tree network topology on an SGI Altix ICE system, perform the following steps:
Make sure that your switch is properly connected to the InfiniBand network. Also, make sure that the admin port of the switch is properly connected to the Ethernet network.
Power on the switch. See the switch manual for operation information.
From the admin node, initialize the switch. The syntax to initialize the switch is, as follows:
sgifmcli --init --ibswitch --model --id --switchtype [leaf | spine]
An example command is, as follows:
# sgifmcli --init --ibswitch --model voltaire-isr-2004 --id isr2004 --switchtype spine
This configures a Voltaire ISR 2004 switch with the hostname isr2004 as a spine switch. The name isr2004 refers to the admin port of the switch, which must already be configured to allow switch access. The switch is now initialized, and the root GUIDs from the spine switches have been downloaded.
From the admin node, add the switch to the fabric. The syntax to add the switch is, as follows:
sgifmcli --add --id <fabric> --switch <hostname>
An example command is, as follows:
# sgifmcli --add --id ib0 --switch isr2004
In this example, ISR2004 is connected to the ib0 fabric.
For the new switch to be activated, the SM master and the optional SM standby need to be (re)started.
# sgifmcli --start --id master_ib0
If the SM master was running while the switch was added, you first need to stop and then start the master, as follows:
# sgifmcli --stop --id master_ib0
# sgifmcli --start --id master_ib0
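The initialize, add, and restart steps above can be sketched as one shell function that also rejects switch models missing from the supported list. This is an illustration only: add_spine_switch, is_supported_switch, and SUPPORTED_SWITCHES are hypothetical names, the model list is copied from the supported switch table in “Fabric Component sgifmcli Command”, and the sketch assumes the SM master is currently running.

```shell
# Switch types copied from the supported switch table in this chapter.
SUPPORTED_SWITCHES="voltaire-isr-9024 voltaire-isr-2004 voltaire-isr-2012 voltaire-isr-9096 voltaire-isr-9288 voltaire4036 mellanox5030"

# Hypothetical helper: succeed if the given model is in the list above.
is_supported_switch() {
    for m in $SUPPORTED_SWITCHES; do
        if [ "$m" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: initialize a spine switch, add it to a fabric,
# and restart the SM master so that the switch becomes active.
# Arguments: switch model, switch hostname, fabric, master id.
add_spine_switch() {
    model=$1
    host=$2
    fabric=$3
    master_id=$4
    if ! is_supported_switch "$model"; then
        echo "unsupported switch model: $model" >&2
        return 2
    fi
    sgifmcli --init --ibswitch --model "$model" --id "$host" --switchtype spine || return 1
    sgifmcli --add --id "$fabric" --switch "$host" || return 1
    sgifmcli --stop --id "$master_id" || return 1
    sgifmcli --start --id "$master_id"
}
```

For example, add_spine_switch voltaire-isr-2004 isr2004 ib0 master_ib0 repeats the example commands from this procedure in order.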
The switches related to a particular fabric can be listed, as follows:
# sgifmcli --switchlist --id <fabric>
This section describes how to configure the lightweight fabric with fat-tree topology using external Mellanox switches.
To configure the Lightweight Fabric, perform the following steps:
The switch should be set up to use dynamic host configuration protocol (DHCP) as part of the initial setup. This is done by SGI in the factory, so you only need to go through the process if a new switch is being installed. For configuration information, see the section called "Configuring the switch for the First Time" in the Mellanox Technologies IS5025/5030/5031/5035 Installation Guide. When asked about using DHCP, answer "Yes". For IP configuration information, see Table 4 - “Configuration Wizard Session - IP Configuration by DHCP”.
Use the discover command to discover external switches (see “discover Command” in Chapter 2). The switch model to use is "mellanox5030". The discover command supports external switches in a manner similar to racks and service nodes, except that switches do not have BMCs and there is no software to install.
Discover all external switches.
Use smc-configure-fabric to configure the fabric, as described in “InfiniBand Management Tool Graphical User Interface”.
In the Configure Topology option, use BFTREE as the topology. The FAT TREE topology option should not be used. Proceed with the steps, described in “InfiniBand Management Tool Graphical User Interface”, to configure and verify the fabric.
After your InfiniBand fabric has been configured and started, you can use the sgifmcli(8) command to verify the health of the fabric.
This version of the InfiniBand verifier runs the recommended OFED test suite. In addition, the SMC for Altix ICE cluster view is compared with the InfiniBand cluster view, and any differences are reported.
To verify a fabric, perform the following command, where <fabric> is either ib0 or ib1:
# sgifmcli --verify --id <fabric>
For more information, see the sgifmcli(8) man page.
The infiniband-diags-pp package contains useful tools and diagnostic software for Open Fabrics Enterprise Distribution (OFED). This section describes some of these tools. These tools reside on the rack leader controller (leader node) in the /usr/sbin directory. To see a full list of diagnostics, from the leader node, use the following command:
# rpm -ql infiniband-diags-pp | grep "/usr/sbin"
You can use the ibstat command to see the current status of the host channel adapters (HCA) in your InfiniBand fabric including the HCAs on rack leader controllers. The following view is prior to starting the fabric management:
r1lead:/usr/bin # ibstat
CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.7.600
        Hardware version: a0
        Node GUID: 0x0008f104039881a8
        System image GUID: 0x0008f104039881ab
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 20
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0008f104039881a9
        Port 2:
                State: Initializing
                Physical state: LinkUp
                Rate: 20
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0008f104039881aa
The following shows output from the ibstat command after the fabric management software has been started:
r1lead:/opt/sgi/sbin # ibstat
CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.7.600
        Hardware version: a0
        Node GUID: 0x0008f104039881a8
        System image GUID: 0x0008f104039881ab
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510a6a
                Port GUID: 0x0008f104039881a9
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510a6a
                Port GUID: 0x0008f104039881aa
You can use the ibstatus command (less verbose than ibstat) to show the link rate, as follows:
r1lead:/opt/sgi/sbin # ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0008:f104:0398:81a9
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)
Infiniband device 'mthca0' port 2 status:
        default gid:     fe80:0000:0000:0000:0008:f104:0398:81aa
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)
Note: If the link rate is not 20 Gb/sec (4X DDR) and you have a DDR-capable HCA, there is a physical link problem with your system.
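The link rate check described in the note can be automated by filtering ibstatus output. The following is a minimal sketch; check_link_rate is a hypothetical name, and it assumes ibstatus prints a header line per port followed by one field per line, with the rate on a line that begins with "rate:".

```shell
# Hypothetical helper: read ibstatus output on standard input and print
# every port whose rate differs from the expected value in Gb/sec
# (default 20). Returns non-zero if any such port is found.
check_link_rate() {
    expected=${1:-20}
    awk -v want="$expected" '
        /^Infiniband device/ { dev = $3; port = $5 }
        $1 == "rate:" && $2 != want {
            print dev " port " port ": " $2 " " $3
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, ibstatus | check_link_rate 20 prints nothing and returns zero when every port reports 20 Gb/sec.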
The perfquery command is useful for finding errors on a particular HCA (or a number of them) and switch ports. You can also use perfquery to reset HCA and switch port counters.
To see a usage statement for the perfquery command, perform the following:
r1lead:/opt/sgi/sbin # perfquery --help
Usage: perfquery [-d(ebug) -G(uid) -a(ll_ports) -r(eset_after_read) -C ca_name -P ca_port -R(eset_only) -t(imeout) timeout_ms -V(ersion) -h(elp)] [<lid|guid> [[port] [reset_mask]]]
Examples:
  perfquery                  # read local port's performance counters
  perfquery 32 1             # read performance counters from lid 32, port 1
  perfquery -e 32 1          # read extended performance counters from lid 32, port 1
  perfquery -a 32            # read performance counters from lid 32, all ports
  perfquery -r 32 1          # read performance counters and reset
  perfquery -e -r 32 1       # read extended performance counters and reset
  perfquery -R 0x20 1        # reset performance counters of port 1 only
  perfquery -e -R 0x20 1     # reset extended performance counters of port 1 only
  perfquery -R -a 32         # reset performance counters of all ports
  perfquery -R 32 2 0x0fff   # reset only error counters of port 2
  perfquery -R 32 2 0xf000   # reset only non-error counters of port 2
The following example reads the performance counters of the local port:
r1lead:/opt/sgi/sbin # perfquery
# Port counters: Lid 1 port 1
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrors:....................0
LinkRecovers:....................0
LinkDowned:......................0
RcvErrors:.......................0
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................0
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtData:.........................0
RcvData:.........................0
XmtPkts:.........................0
RcvPkts:.........................0
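When scanning many ports, it can be useful to print only the error-related counters that are not zero. The following is a minimal sketch; nonzero_error_counters is a hypothetical name, it assumes the "Name:....value" line layout shown above, and the choice of which counter names count as errors is an illustrative assumption.

```shell
# Hypothetical helper: read perfquery output on standard input and print
# "Name=value" for each error-related counter with a non-zero value.
nonzero_error_counters() {
    awk -F'[:.]+' '
        $1 ~ /(Errors|Discards|Dropped|LinkRecovers|LinkDowned)$/ && $2 + 0 > 0 {
            print $1 "=" $2
        }
    '
}
```

For example, perfquery 32 1 | nonzero_error_counters prints nothing for a clean port.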
The ibnetdiscover command allows you to discover the InfiniBand fabric.
To see a usage statement for the ibnetdiscover command, perform the following:
r1lead:/opt/sgi/sbin # ibnetdiscover --help
Usage: ibnetdiscover [-d(ebug)] -e(rr_show) -v(erbose) -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms --switch-map switch-map] [<topology-file>]
  --switch-map <switch-map> specify a switch-map file
Note: Only abbreviated output is shown in this example.
r1lead:/opt/sgi/sbin # ibnetdiscover
#
# Topology file: generated on Tue Jul 17 14:05:20 2007
#
# Max of 3 hops discovered
# Initiated from node 0008f104039881a8 port 0008f104039881a9
vendid=0x2c9
devid=0xb924
sysimgguid=0x8006900000000dd
...
Switch : 0x08006900000000dc ports 24 devid 0xb924 vendid 0x2c9 "MT47396 Infiniscale-III Mellanox Technologies"
Switch : 0x08006900000000a4 ports 24 devid 0xb924 vendid 0x2c9 "MT47396 Infiniscale-III Mellanox Technologies"
r1lead:/opt/sgi/sbin # ibnetdiscover -H (HCA's)
Ca : 0x0030487aa7940000 ports 1 devid 0x6274 vendid 0x2c9 "MT25204 InfiniHostLx Mellanox Technologies"
Ca : 0x0030487aa78c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n8-ib0 HCA-1"
Ca : 0x0008f10403988198 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
Ca : 0x0030487aa7840000 ports 1 devid 0x6274 vendid 0x2c9 "r1i0n1-ib0 HCA-1"
Ca : 0x0030487aa79c0000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n0-ib0 HCA-1"
Ca : 0x0030487aa7900000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n8-ib0 HCA-1"
Ca : 0x0030487aa7980000 ports 1 devid 0x6274 vendid 0x2c9 "r1i1n1-ib0 HCA-1"
Ca : 0x0008f104039881a8 ports 2 devid 0x6278 vendid 0x8f1 " HCA-1"
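The HCA list can be reduced to a plain list of GUIDs, for example to count the HCAs that the fabric currently sees. The following is a minimal sketch; list_hca_guids is a hypothetical name, and it assumes each HCA line has the form "Ca : <guid> ports ..." as in the sample output above.

```shell
# Hypothetical helper: read "ibnetdiscover -H" output on standard input
# and print the GUID of each channel adapter, one per line.
list_hca_guids() {
    awk '$1 == "Ca" && $2 == ":" { print $3 }'
}
```

For example, ibnetdiscover -H | list_hca_guids | wc -l counts the discovered HCAs.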
The ibdiagnet command is a useful diagnostic tool.
To see a usage statement for the ibdiagnet command, perform the following:
r1lead:/opt/sgi/sbin # ibdiagnet --help
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
NAME
  ibdiagnet
SYNOPSYS
  ibdiagnet [-c ] [-v] [-r] [-o ] [-t ] [-s ] [-i ] [-p ] [-pm] [-pc] [-P <>] [-lw <1x|4x|12x>] [-ls <2.5|5|10>]
DESCRIPTION
  ibdiagnet scans the fabric using directed route packets and extracts all the available information regarding its connectivity and devices. It then produces the following files in the output directory defined by the -o option (see below):
    ibdiagnet.lst - List of all the nodes, ports and links in the fabric
    ibdiagnet.fdbs - A dump of the unicast forwarding tables of the fabric switches
    ibdiagnet.mcfdbs - A dump of the multicast forwarding tables of the fabric switches
    ibdiagnet.masks - In case of duplicate port/node Guids, these file include the map between masked Guid and real Guids
    ibdiagnet.sm - A dump of all the SM (state and priority) in the fabric
    ibdiagnet.pm - In case -pm option was provided, this file contain a dump of all the nodes PM counters
  In addition to generating the files above, the discovery phase also checks for duplicate node/port GUIDs in the IB fabric. If such an error is detected, it is displayed on the standard output.
  After the discovery phase is completed, directed route packets are sent multiple times (according to the -c option) to detect possible problematic paths on which packets may be lost. Such paths are explored, and a report of the suspected bad links is displayed on the standard output.
  After scanning the fabric, if the -r option is provided, a full report of the fabric qualities is displayed. This report includes:
    SM report
    Number of nodes and systems
    Hop-count information: maximal hop-count, an example path, and a hop-count histogram
    All CA-to-CA paths traced
    Credit loop report
    mgid-mlid-HCAs matching table
  Note: In case the IB fabric includes only one CA, then CA-to-CA paths are not reported.
  Furthermore, if a topology file is provided, ibdiagnet uses the names defined in it for the output reports.
OPTIONS
  -c : The minimal number of packets to be sent across each link (default = 10)
  -v : Instructs the tool to run in verbose mode
  -r : Provides a report of the fabric qualities
  -o : Specifies the directory where the output files will be placed (default = /tmp)
  -t : Specifies the topology file name
  -s : Specifies the local system name. Meaningful only if a topology file is specified
  -i : Specifies the index of the device of the port used to connect to the IB fabric (in case of multiple devices on the local system)
  -p : Specifies the local device's port number used to connect to the IB fabric
  -pm : Dumps all pmCounters values into ibdiagnet.pm
  -pc : reset all the fabric links pmCounters
  -P <>: If any of the provided pm is greater then its provided value, print it to screen
  -lw <1x|4x|12x> : Specifies the expected link width
  -ls <2.5|5|10> : Specifies the expected link speed
  -h|--help : Prints this help information
  -V|--version : Prints the version of the tool
  --vars : Prints the tool's environment variables and their values
ERROR CODES
  1 - Failed to fully discover the fabric
  2 - Failed to parse command line options
  3 - Failed to interact with IB fabric
  4 - Failed to use local device or local port
  5 - Failed to use Topology File
  6 - Failed to load required Package
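In a script, the error codes listed in the help text can be turned back into readable messages. The following is a minimal sketch; ibdiagnet_error_text is a hypothetical name, and it assumes that ibdiagnet returns these error codes as its process exit status and that 0 means no error, neither of which the help text states explicitly.

```shell
# Hypothetical helper: map an ibdiagnet error code to the message
# given in the ERROR CODES section of the help text.
ibdiagnet_error_text() {
    case "$1" in
        0) echo "No error reported" ;;
        1) echo "Failed to fully discover the fabric" ;;
        2) echo "Failed to parse command line options" ;;
        3) echo "Failed to interact with IB fabric" ;;
        4) echo "Failed to use local device or local port" ;;
        5) echo "Failed to use Topology File" ;;
        6) echo "Failed to load required Package" ;;
        *) echo "Unknown error code $1" ;;
    esac
}
```

For example, after running ibdiagnet, calling ibdiagnet_error_text "$?" prints the matching message.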
Output which shows no errors means the system is operating correctly:
r1lead:/opt/sgi/sbin # ibdiagnet
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
Loading IBDM from: /usr/lib64/ibdm1.2
-W- Topology file is not specified. Reports regarding cluster links will use direct routes.
-W- A few ports of local device are up. Since port-num was not specified (-p option), port 1 of device 1 will be used as the local port.
-I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.
-I---------------------------------------------------
-I- Bad Guids Info
-I---------------------------------------------------
-I- No bad Guids were found
-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found
-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found
-I---------------------------------------------------
-I- Bad Links Info
-I---------------------------------------------------
-I- No bad link were found
-I- Done. Run time was 0 seconds.
You can use ibdiagnet to load the fabric to test it, as follows:
r1lead:/opt/sgi/sbin # ibdiagnet -c 5000
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
Loading IBDM from: /usr/lib64/ibdm1.2
-W- Topology file is not specified. Reports regarding cluster links will use direct routes.
-W- A few ports of local device are up. Since port-num was not specified (-p option), port 1 of device 1 will be used as the local port.
-I- Discovering the subnet ... 10 nodes (2 Switches & 8 CA-s) discovered.
-I---------------------------------------------------
-I- Bad Guids Info
-I---------------------------------------------------
-I- No bad Guids were found
-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found
-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found
-I---------------------------------------------------
-I- Bad Links Info
-I---------------------------------------------------
-I- No bad link were found
-I- Done. Run time was 8 seconds.