Chapter 3. SGI Performance Suite Products Overview

This chapter documents the product components that are supported on the SGI computer systems. (For a list of the products, see Table 3-1.)

Descriptions of the product components are grouped in this chapter as follows:

SGI Performance Suite 1.2 Products

Software provided by SGI for the SGI Performance Suite 1.2 release consists of kernel modules for SGI software built against the kernels in SUSE Linux Enterprise Server 11 SP1 or Red Hat Enterprise Server 6 (RHEL 6) and value-add software developed by SGI. For information on how these products are bundled, see “SGI Performance Suite Software Bundles” in Chapter 1.

Table 3-1. SGI Performance Suite 1.2 Products

Array Services

Array Services includes administrator commands, libraries, daemons, and kernel extensions that support the execution of parallel applications across a number of hosts in a cluster, or array. The Message Passing Interface (MPI) uses Array Services to launch parallel applications. For information on MPI, see the Message Passing Toolkit (MPT) User's Guide.

The secure version of Array Services is built to make use of Secure Sockets Layer (SSL) and Secure Shell (SSH).

For more information on standard Array Services or Secure Array Services (SAS), see the Array Services chapter in the Linux Resource Administration Guide.

Cpuset System

The Cpuset System is primarily a workload manager tool that permits a system administrator to restrict the processors and memory resources that a process or set of processes may use. A system administrator can use cpusets to create a division of CPUs and memory resources within a larger system. For more information, see the “Cpusets” chapter in the Linux Resource Administration Guide.

IOC4 serial driver

Driver that supports the Internal IDE CD-ROM, NVRAM, and Real-Time Clock.

Serial ports are supported on the IOC4 base I/O chipset, and the following device nodes are created:

/dev/ttyIOC4/0
/dev/ttyIOC4/1
/dev/ttyIOC4/2
/dev/ttyIOC4/3

Kernel partitioning support

Provides the software infrastructure necessary to support a partitioned system, including cross-partition communication support. Partitioning is supported on SGI Altix UV 100 and UV 1000 systems only. For more information on system partitioning, see the SGI Altix UV Linux Configuration and Operations Guide.

MPT

Provides industry-standard message passing libraries optimized for SGI computers. For more information on MPT, see the Message Passing Toolkit (MPT) User's Guide.

NUMA tools

Provides a collection of NUMA-related tools (dlook(1), dplace(1), and so on). For more information on NUMA tools, see the Linux Application Tuning Guide for SGI X86-64 Based Systems.

Performance Co-Pilot collector infrastructure

Provides performance monitoring and performance management services targeted at large, complex systems.

REACT real-time for Linux

Support for real-time programs. For more information, see the REACT Real-Time for Linux Programmer's Guide.

Utilities

udev_xsci is a udev helper for creating XSCSI device names; sgtools is a set of tools for SCSI disks that use the Linux SG driver; and lsiutil is the LSI Fusion-MPT host adapter management utility.

XVM

Provides software volume manager functionality such as disk striping and mirroring. For more information on XVM, see the XVM Volume Manager Administrator's Guide.

SGI does not support the following:

  • Base Linux software not released by Novell for SLES 11 SP1 or by Red Hat for RHEL 6, or other software not released by SGI.

  • The CentOS 6 operating system.

  • Other releases, updates, or patches not released by Novell for SLES 11 SP1, by Red Hat for RHEL 6, or by SGI for SGI Performance Suite software.

  • Software patches, drivers, or other changes obtained from the Linux community or vendors other than Novell, Red Hat, and SGI.

  • Kernels recompiled or reconfigured to run with parameter settings or modules not specified by Novell, Red Hat, or SGI.

  • Unsupported hardware configurations and devices.

Performance Suite Products

SGI Performance Suite provides application acceleration components for software developers and end users. SGI Accelerate, SGI MPI, SGI REACT, and SGI UPC contain libraries and tools that enable software developers to develop, profile, and tune applications for faster performance. End users benefit from running their applications with the runtime acceleration tools supplied in SGI Accelerate and SGI MPI. This section describes some key components.

Cpuset Support

The cpuset facility is primarily a workload manager tool permitting a system administrator to restrict the processors and memory resources that a process or set of processes may use. A cpuset defines a list of CPUs and memory nodes. A process contained in a cpuset may only execute on the CPUs in that cpuset and may only allocate memory on the memory nodes in that cpuset. Essentially, cpusets provide you with CPU and memory containers, or "soft partitions," within which you can run sets of related tasks. Using cpusets on an SGI system improves cache locality and memory access times and can substantially improve an application's performance and runtime repeatability.

Restraining all other jobs from using any of the CPUs or memory resources assigned to a critical job minimizes interference from other jobs on the system. For example, Message Passing Interface (MPI) jobs frequently consist of a number of threads that communicate using message passing interfaces. All threads need to be executing at the same time; if a single thread loses a CPU, all threads stop making forward progress and spin at a barrier. Cpusets can eliminate the need for a gang scheduler.

Cpusets are represented in a hierarchical virtual file system. Cpusets can be nested and they have file-like permissions.
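
The following minimal sketch shows how a cpuset can be created and populated through this file system interface. The mount point, cpuset name, and CPU and memory values are illustrative, and on some kernels the control files carry a cpuset. prefix:

    # Mount the cpuset virtual file system (once per boot)
    mkdir -p /dev/cpuset
    mount -t cpuset cpuset /dev/cpuset

    # Create a cpuset named "batch" restricted to CPUs 0-3 on memory node 0
    mkdir /dev/cpuset/batch
    echo 0-3 > /dev/cpuset/batch/cpus
    echo 0 > /dev/cpuset/batch/mems

    # Attach the current shell (and its future children) to the cpuset
    echo $$ > /dev/cpuset/batch/tasks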

In addition to their traditional use to control the placement of jobs on the CPUs and memory nodes of a system, cpusets also provide a convenient mechanism to control the use of Hyper-Threading Technology.

For detailed information on cpusets, see Chapter 6, “Cpusets on Linux” in the Linux Resource Administration Guide. Information about cpusets is also available in Chapter 5, “Data Placement Tools” in the Linux Application Tuning Guide for SGI X86-64 Based Systems.

Partitioning

SGI provides the ability to divide a single SGI Altix UV 100 or Altix UV 1000 system into a collection of smaller system partitions. Each partition runs its own copy of the operating system kernel and has its own system console, root filesystem, IP network address, and physical memory. All partitions in the system are connected via the SGI high-performance NUMAlink interconnect, just as they are when the system is not partitioned. Thus, a partitioned system can also be viewed as a cluster of nodes connected via NUMAlink.

Benefits of partitioning include fault containment and the ability to use the NUMAlink interconnect and global shared memory features of the SGI systems to provide high-performance clusters.

For further documentation and details on partitioning, see the SGI Altix UV Systems Linux Configuration and Operations Guide.

I/O Subsystems

Although some HPC workloads might be mostly CPU bound, others involve processing large amounts of data and require an I/O subsystem capable of moving data between memory and storage quickly and of managing large storage farms effectively. The XFS filesystem, XVM volume manager, and data migration facilities were leveraged from IRIX and ported to provide a robust, high-performance, and stable storage I/O subsystem on Linux. This section covers the XFS filesystem and the XVM Volume Manager.

XFS Filesystem

The SGI XFS filesystem provides a high-performance filesystem for Linux. XFS is an open-source, fast recovery, journaling filesystem that provides direct I/O support, space preallocation, access control lists, quotas, and other commercial file system features. Although other filesystems are available on Linux, performance tuning and improvements leveraged from IRIX make XFS particularly well suited for large data and I/O workloads commonly found in HPC environments.
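
As a brief illustration, an XFS filesystem is created and mounted with the standard tools; the device path and mount point below are illustrative:

    # Create an XFS filesystem on a block device
    mkfs.xfs /dev/sdb1

    # Mount it
    mkdir -p /mnt/data
    mount -t xfs /dev/sdb1 /mnt/data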

For more information on the XFS filesystem, see XFS for Linux Administration.

XVM Volume Manager

The SGI XVM Volume Manager provides a logical organization to disk storage that enables an administrator to combine underlying physical disk storage into a single logical unit, known as a logical volume. Logical volumes behave like standard disk partitions and can be used as arguments anywhere a partition can be specified.

A logical volume allows a filesystem or raw device to be larger than the size of a physical disk. Using logical volumes can also increase disk I/O performance because a volume can be striped across more than one disk. Logical volumes can also be used to mirror data on different disks.
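
For example, a filesystem built on a striped logical volume can be aligned to the stripe geometry at creation time. The volume path and geometry in this sketch are assumptions for illustration only:

    # Align XFS allocation to a 4-disk stripe with 64 KB stripe units
    # (volume path and geometry are illustrative)
    mkfs.xfs -d su=64k,sw=4 /dev/lxvm/stripevol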

This release adds a new XVM multi-host failover feature. For more information on this new feature and XVM Volume Manager in general, see the XVM Volume Manager Administrator's Guide.

HPC Application Tools and Support

SGI has ported HPC libraries, tools, and software packages from IRIX to Linux to provide a powerful, standards-based system using Linux and Xeon-based solutions for HPC environments. The following sections describe some of these tools, libraries, and software.

Message Passing Toolkit

The SGI Message Passing Toolkit (MPT) provides industry-standard message passing libraries optimized for SGI computers. On Linux, MPT contains MPI and SHMEM APIs, which transparently utilize and exploit the low-level capabilities within SGI hardware, such as memory mapping within and between partitions for fast memory-to-memory transfers and the hardware memory controller's fetch operation (fetchop) support. Fetchops and other shared memory techniques enable ultra fast communication and synchronization between MPI processes in a parallel application.
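
A minimal launch sketch follows; the rank count, CPU list, and program name are illustrative:

    # Launch a 16-rank MPI job with MPT's mpirun
    mpirun -np 16 ./mpi_app

    # Optionally pin ranks to specific CPUs with MPT's placement variable
    env MPI_DSM_CPULIST=0-15 mpirun -np 16 ./mpi_app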

MPI jobs can be launched, monitored, and controlled across a cluster or partitioned system using the SGI Array Services software. Array Services provides the notion of an array session, which is a set of processes that can be running on different cluster nodes or system partitions. Array Services is implemented using Process Aggregates (PAGG), a kernel module that provides process containers. SGI has released PAGG as open source for Linux.
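
As a quick check that Array Services is available, the ainfo(1) command can query the array configuration; the subcommand shown here is a sketch, and output varies by site:

    # List the machines that make up the array (assumes arrayd is configured)
    ainfo machines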

For more information on the Message Passing Toolkit, see the Message Passing Toolkit (MPT) User's Guide.

Performance Co-Pilot

The SGI Performance Co-Pilot software was ported from IRIX to Linux to provide a collection of performance monitoring and performance management services targeted at large, complex systems. Integrated with the low-level performance hardware counters and with MPT, Performance Co-Pilot provides such services as CPU, I/O, and networking statistics; visualization tools; and monitoring tools.
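
For instance, the standard Performance Co-Pilot command-line tools can browse and sample metrics; the metric name and sampling interval below are illustrative:

    # List the available performance metrics
    pminfo

    # Sample the system-wide idle CPU time once per second
    pmval -t 1sec kernel.all.cpu.idle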

NUMA Data Placement Tools

This section describes the commands provided with the collection of NUMA-related data placement tools that can help you tune applications on your system.


Note: Performance tuning information for single processor and multiprocessor programs resides in Linux Application Tuning Guide for SGI X86-64 Based Systems.


dlook Command

The dlook(1) command displays the memory map and CPU use for a specified process. The following information is printed for each page in the virtual address space of the process:

  • The object that owns the page (file, SYSV shared memory, device driver, and so on)

  • Type of page (RAM, FETCHOP, IOSPACE, and so on)

  • If RAM memory, the following information is supplied:

    • Memory attributes (SHARED, DIRTY, and so on)

    • Node on which the page is located

    • Physical address of page (optional)

Optionally, the amount of elapsed CPU time that the process has executed on each physical CPU in the system is also printed.
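
A usage sketch follows; the PID and command name are illustrative:

    # Report on a running process by PID
    dlook 12345

    # Or run a command under dlook and report when it exits
    dlook ./my_app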

dplace Command

The dplace(1) command binds a related set of processes to specific CPUs or nodes to prevent process migrations. In some cases, this tool improves performance because a higher percentage of memory accesses are made to the local node.
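
For example (the CPU numbers and program names are illustrative):

    # Bind a program's processes to CPUs 0-7
    dplace -c 0-7 ./my_app

    # With MPT, skip the mpirun shepherd process and place the MPI ranks
    mpirun -np 8 dplace -s1 ./mpi_app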

taskset Command

The taskset(1) command is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new command with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity; the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications.
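
For example (the PID and CPU lists are illustrative):

    # Launch a command restricted to CPUs 0-3
    taskset -c 0-3 ./my_app

    # Retrieve the affinity mask of a running process
    taskset -p 12345

    # Change the affinity of a running process to CPUs 4-7
    taskset -pc 4-7 12345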

For more information on NUMA tools, see Chapter 5, “Data Placement Tools” in the Linux Application Tuning Guide for SGI X86-64 Based Systems.