Chapter 5. Troubleshooting

This chapter describes how to resolve the problems listed under the headings below.

Forgotten Password or Corrupt Password File

If you forget the administrator password, or if the Alerts page reports that the /etc/appman/passwd file is corrupt (preventing administrator login), run the following command as root to set a new password of your choice (NEWPASSWORD):

# echo "appman_admin:`echo -n NEWPASSWORD | md5sum | cut -d' ' -f1`" > /etc/appman/passwd
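Because the command above overwrites /etc/appman/passwd in a single step, it can help to preview the entry first. The following is a minimal sketch, not part of Appliance Manager: the function name appman_passwd_line is hypothetical, and the sketch assumes that the file holds a single appman_admin:MD5-hash entry, as the command above implies.

```shell
#!/bin/sh
# Hypothetical helper: print the passwd entry for a given password
# without touching /etc/appman/passwd.
appman_passwd_line() {
    # printf '%s' emits the password with no trailing newline, like
    # "echo -n" in the command above, but portably across POSIX shells.
    hash=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)
    printf 'appman_admin:%s\n' "$hash"
}

# Example (as root), once the previewed line looks right:
#   appman_passwd_line 'NEWPASSWORD' > /etc/appman/passwd
```

The hash is computed over the password alone, so a trailing newline in the input would produce a different entry and a failed login.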

The archives Directory is Too Large

Appliance Manager stores historical information in the directory /var/lib/appman/archives. On a large machine, this directory may require too much disk space to fit in the / or /var filesystem. This directory can be moved to any other filesystem (assuming the new filesystem always remains mounted) using the following procedure:

  1. Stop Appliance Manager:

    # service appman stop

  2. Stop Performance Co-Pilot (PCP):

    # service pcp stop

  3. Change to the appman directory:

    # cd /var/lib/appman

  4. Move the archives directory to a different filesystem:

    # mv archives /some/other/filesystem/

  5. Create a symbolic link from the original archives location to the new location:

    # ln -s /some/other/filesystem/archives archives

  6. Restart PCP:

    # service pcp start

  7. Restart Appliance Manager:

    # service appman start
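The procedure above can be sketched as a single script. This is an illustration only: the function names run and move_archives and the DRY_RUN variable are hypothetical, the destination is whatever directory you choose on the other filesystem, and by default the script only prints the commands so that you can review them before running anything as root. It uses absolute paths instead of changing to /var/lib/appman first.

```shell
#!/bin/sh
# Hypothetical wrapper around the seven steps above.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

move_archives() {
    dest=$1    # directory on the other filesystem; it must stay mounted
    if [ -z "$dest" ]; then
        echo "usage: move_archives DEST_DIR" >&2
        return 1
    fi
    # Stop at the first failing step so that a failed move is not hidden.
    run service appman stop &&
    run service pcp stop &&
    run mv /var/lib/appman/archives "$dest/" &&
    run ln -s "$dest/archives" /var/lib/appman/archives &&
    run service pcp start &&
    run service appman start
}

# Example: preview the commands for /some/other/filesystem
#   move_archives /some/other/filesystem
```

If a step fails when DRY_RUN is set to 0, the services stay stopped; restart PCP and Appliance Manager manually after you resolve the problem.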

Unconfigured Storage Arrays are Discovered

Under no circumstances should you try to discover all the storage arrays available via the public network from the system running Appliance Manager. If you do this, Appliance Manager will assume that you want it to manage all discovered storage arrays, which may lead to undesired consequences.

If you have inadvertently auto-discovered storage arrays that you do not wish to manage via Appliance Manager, run the TPSSM, ISSM EE, ISSM WE, or SMI GUI and explicitly remove the undesired storage arrays.

Filesystem Creation Warning Messages

If you attempt to create a filesystem that will result in less than peak performance, you will get a warning message from Appliance Manager. This can occur if you attempt to create a filesystem that spans multiple arrays with different numbers or sizes of disks, or includes disks that are already in use on one array but not on another. For more information, see “Multi-Array Filesystems” in Appendix A.

Power Outage and iSCSI

Due to the nature of iSCSI as a block-level protocol (as distinct from file-level protocols such as NFS and CIFS), particular care must be taken in the event of a system crash, power failure, or extended network outage.

If power is lost to the server while an iSCSI initiator is performing a write to an iSCSI target, the write will not be completed and the filesystem created on that particular target may then be in an inconsistent state. The iSCSI initiator should be made to perform a filesystem check on the iSCSI target immediately after power is restored, and before trying to access that target for normal usage.

For example, on a Windows client:

  1. Use the iSCSI Initiator program to connect to the iSCSI target.

  2. Open My Computer.

  3. Right-click the iSCSI target drive and select Properties.

  4. In the Properties window, select the Tools tab and click the Check Now button.

  5. In the Check Disk window, select both Automatically fix file system errors and Scan for and attempt recovery of bad sectors.

  6. Click Start to verify the filesystem and attempt recovery of any errors.

Users and Groups Not Visible

If you ran Appliance Manager 4.0 and you added local users or groups to Appliance Manager, these users and groups may no longer be visible in the GUI due to changes in the minimum user ID number. User accounts in the range 100 through 999 will continue to work, but you cannot manipulate them with Appliance Manager.
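To see which local accounts fall in the affected range, you can inspect the passwd database directly. The following sketch assumes the standard colon-separated /etc/passwd format; the function name list_uids_100_999 is hypothetical, and the range 100 through 999 is taken from the paragraph above. The output includes every account in that range, not only those originally added through Appliance Manager.

```shell
#!/bin/sh
# Hypothetical helper: list accounts whose user ID is 100 through 999.
# Takes the passwd file as an optional argument (default /etc/passwd)
# and prints one "name:uid" pair per line.
list_uids_100_999() {
    awk -F: '$3 >= 100 && $3 <= 999 { print $1 ":" $3 }' "${1:-/etc/passwd}"
}

# Example:
#   list_uids_100_999
```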

CXFS Status is Incorrect

Appliance Manager might display incorrect status for the CXFS clients. To restore the correct status, see the information about cxfs_admin and restarting the fs2d quorum master in the CXFS general release notes.

CXFS Client Stuck on Filesystems Mount

If the client appears stuck on Mounted 0 of 1 filesystems for an extended period of time, there is a problem. In this case, do the following:

  1. Check the status of the metadata server and the other clients. If the other nodes are stable, the filesystem and RAID are operating correctly and have been mounted by those nodes.

  2. Check the CXFS log file on the client for mounting-related errors. For example:

    cis_fs_mount ERROR: Illegal logbsize: 64 (x == 16k or 32k)
    cis_fs_mount ERROR: logbsize must be multiple of BBSIZE: 64
    op_failed ERROR: Mount failed for data3 (data3 on /mnt/data3)

    In this example, the client is unable to mount the filesystem due to one of the filesystem's mount options. In this case, you must use cxfs_admin to adjust the filesystem's mount options appropriately.

  3. If no other nodes are stable (that is, all are trying to mount the filesystem and have been stuck in that state for an extended period), check the Appliance Manager Alerts page and the CXFS log files on the metadata server.

    See the following for more information about CXFS log files and tools:

    • CXFS 5 Administration Guide for SGI InfiniteStorage

    • CXFS 5 Client-Only Guide for SGI InfiniteStorage
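The log check in step 2 can be sketched as a small search. This is only an illustration: the function name cxfs_mount_errors is hypothetical, the log file path is deliberately left as an argument because it depends on the client platform (see the guides listed above), and the patterns come from the sample messages in step 2 rather than from a complete list of CXFS errors.

```shell
#!/bin/sh
# Hypothetical helper: print mount-related errors from a CXFS client log.
# $1 is the path to the log file (site- and platform-specific).
cxfs_mount_errors() {
    grep -E '(cis_fs_mount|op_failed) ERROR:' "$1"
}

# Example:
#   cxfs_mount_errors /path/to/cxfs/client/log
```

An empty result does not prove that the mount is healthy; it only means that none of the sample error patterns appear in that file.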

Appliance Manager is Inaccessible when the System Must Be Rebooted

If you must reboot the system but Appliance Manager is inaccessible, do the following:

  1. Log in via the system console as root, such as via the L2 on an Altix ia64 system or via IPMI or a monitor/keyboard on an Altix XE x86_64 system.

  2. Reboot the system:

    # reboot

Appliance Manager is Inaccessible due to Network Configuration Issues

If the network configuration is damaged or if the system running Appliance Manager becomes inaccessible via the network, do the following:

  1. Log in via the system console as root, such as via the L2 on an Altix ia64 system or via IPMI or a monitor/keyboard on an Altix XE x86_64 system.

  2. Reconfigure the management interface (eth0) by using the following commands, as appropriate for your site:

    • Static IP address:

      /usr/lib/appman/appman-cli -c "network if-enable-static eth0 IPaddress 255.255.255.0"

      For example, for a static IP address of 192.168.9.9:

      # /usr/lib/appman/appman-cli -c "network if-enable-static eth0 192.168.9.9 255.255.255.0"

    • DHCP:

      # /usr/lib/appman/appman-cli -c "network if-enable-dhcp eth0"

  3. To set the default gateway (such as if the system must communicate with other systems outside the local network or if the default gateway is not supplied by a DHCP server), enter the following:

    /usr/lib/appman/appman-cli -c "network default-gateway-set Default_Gateway_IPaddress"

    For example, for a default gateway of 192.168.9.254:

    # /usr/lib/appman/appman-cli -c "network default-gateway-set 192.168.9.254"

  4. Reset eth0:

    # /usr/lib/appman/appman-cli -c "network if-reset eth0"

  5. Restart the Appliance Manager service:

    # service appman restart
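The static-address commands in steps 2 through 5 can be collected into a sketch that prints the exact sequence for a given address before anything is run. The function names is_ipv4 and print_static_setup and the NETMASK variable are hypothetical; the appman-cli subcommands and the 255.255.255.0 default netmask are the ones shown above, and the address check is a loose format check only.

```shell
#!/bin/sh
# Hypothetical helper: print the console commands for a static eth0 setup.
# Nothing is executed; review the output, then run it as root.
APPMAN_CLI=/usr/lib/appman/appman-cli
NETMASK=${NETMASK:-255.255.255.0}

# Loose check: four dot-separated groups of 1 to 3 digits, each 0 to 255.
is_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++)
              if ($i !~ /^[0-9][0-9]?[0-9]?$/ || $i > 255) exit 1 }'
}

# Usage: print_static_setup IP_ADDRESS [DEFAULT_GATEWAY]
print_static_setup() {
    ip=$1
    gateway=$2
    is_ipv4 "$ip" || { echo "invalid IP address: $ip" >&2; return 1; }
    if [ -n "$gateway" ]; then
        is_ipv4 "$gateway" || { echo "invalid gateway: $gateway" >&2; return 1; }
    fi
    echo "$APPMAN_CLI -c \"network if-enable-static eth0 $ip $NETMASK\""
    if [ -n "$gateway" ]; then
        echo "$APPMAN_CLI -c \"network default-gateway-set $gateway\""
    fi
    echo "$APPMAN_CLI -c \"network if-reset eth0\""
    echo "service appman restart"
}

# Example, using the addresses from the steps above:
#   print_static_setup 192.168.9.9 192.168.9.254
```

Printing rather than executing keeps a mistyped address from making the network configuration worse while you are working at the console.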

Reporting Problems to SGI

See “Support Data” in Chapter 3 for information about gathering the information that SGI Support will require when diagnosing problems.