This chapter provides answers to some common problems users encounter when starting to use SGI MPI, as well as answers to other frequently asked questions. It covers the following topics:
“What are some things I can try to figure out why mpirun is failing?”
“My code runs correctly until it reaches MPI_Finalize() and then it hangs.”
“My hybrid code (using OpenMP) stalls on the mpirun command.”
“I keep getting error messages about MPI_REQUEST_MAX being too small.”
“I am not seeing stdout and/or stderr output from my MPI application.”
“Where can I find more information about the SHMEM programming model?”
“The ps(1) command says my memory use (SIZE) is higher than expected.”
“Why do I see “stack traceback” information when my MPI job aborts?”
Here are some things to investigate:
Look in /var/log/messages for any suspicious errors or warnings. For example, if your application tries to pull in a library that it cannot find, a message should appear here. Only the root user can view this file.
Be sure that you did not misspell the name of your application.
To find dynamic link errors, try to run your program without mpirun. You will get the “mpirun must be used to launch all MPI applications” message, along with any dynamic link errors that might not be displayed when the program is started with mpirun.
As a last resort, setting the environment variable LD_DEBUG to all will display a set of messages for each symbol that the dynamic linker resolves. This produces a lot of output, but it should help you find the cause of the link error.
Be sure that you are setting your remote directory properly. By default, mpirun attempts to place your processes on all machines into the directory that has the same name as $PWD. This should be the common case, but sometimes different functionality is required. For more information, see the section on $MPI_DIR and/or the -dir option in the mpirun man page.
If you are using a relative pathname for your application, be sure that it appears in $PATH. In particular, mpirun will not look in '.' for your application unless '.' appears in $PATH.
Run /usr/sbin/ascheck to verify that your array is configured correctly.
Use the mpirun -verbose option to verify that you are running the version of MPI that you think you are running.
Be very careful when setting MPI environment variables from within your .cshrc or .login files, because these will override any settings that you might later set from within your shell (MPI creates the equivalent of a fresh login session for every job). The safe approach is to test for the existence of $MPI_ENVIRONMENT in your scripts and set the other MPI environment variables only if it is undefined.
If you are running under a Kerberos environment, you may experience unpredictable results because mpirun is currently unable to pass tokens. For example, in some cases, if you use telnet to connect to a host and then try to run mpirun on that host, it fails. But if you instead use rsh to connect to the host, mpirun succeeds. (This might be because telnet is Kerberized but rsh is not.) If you are running under such conditions, talk to your local administrators about the proper way to launch MPI jobs.
Look in /tmp/.arraysvcs on all machines you are using. In some cases, you might find an errlog file that may be helpful.
You can increase the verbosity of the Array Services daemon (arrayd) using the -v option to generate more debugging information. For more information, see the arrayd(8) man page.
Check error messages in /var/run/arraysvcs.
This is almost always caused by send or recv requests that are either unmatched or not completed. An unmatched request is any blocking send for which a corresponding recv is never posted. An incomplete request is any nonblocking send or recv request that was never freed by a call to MPI_Test(), MPI_Wait(), or MPI_Request_free().
Common examples are applications that call MPI_Isend() and then use internal means to determine when it is safe to reuse the send buffer. These applications never call MPI_Wait(). You can fix such codes easily by inserting a call to MPI_Request_free() immediately after all such MPI_Isend() operations, or by adding a call to MPI_Wait() at a later point in the code, before the send buffer must be reused.
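The following C fragment is a minimal sketch of the fix just described; the function, buffer, and tag names are illustrative rather than taken from any particular application. It shows an MPI_Isend() whose request is either freed immediately with MPI_Request_free() or completed later with MPI_Wait(), so that no request is left outstanding when MPI_Finalize() is called.

#include <mpi.h>

/* Illustrative helper: post a nonblocking send and make sure its
 * request object is released before MPI_Finalize() is reached. */
void send_block(double *buf, int count, int dest, MPI_Comm comm)
{
    MPI_Request req;

    MPI_Isend(buf, count, MPI_DOUBLE, dest, 0 /* tag */, comm, &req);

    /* Option 1: the application uses its own means to know when buf
     * can be reused, so release the request object right away. */
    MPI_Request_free(&req);

    /* Option 2 (use instead of MPI_Request_free): complete the request
     * before the point at which buf must be reused:
     *
     *     MPI_Wait(&req, MPI_STATUS_IGNORE);
     */
}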
If your application was compiled with the Open64 compiler, make sure you follow the instructions about using the Open64 compiler in combination with MPI/OpenMP applications described in “Compiling and Linking MPI Programs” in Chapter 3.
There are two cases in which the MPI library reports an error concerning MPI_REQUEST_MAX. The error message reported by the MPI library distinguishes between them.
MPI has run out of unexpected request entries; the current allocation level is: XXXXXX
The program is sending so many unexpected large messages (greater than 64 bytes) to a process that internal limits in the MPI library have been exceeded. The options here are to increase the number of allowable requests via the MPI_REQUEST_MAX shell variable, or to modify the application.
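For this first case, one common way to modify the application is to post the receives before the burst of messages arrives, so that each incoming message matches an already-posted request instead of consuming an unexpected-message entry. The following C fragment is a minimal sketch under that assumption; the message count, message length, and names are illustrative, and in a real code you would also ensure (for example, with a barrier) that the receives are posted before the senders begin.

#include <mpi.h>

#define NMSGS  1000   /* illustrative number of incoming messages          */
#define MSGLEN 64     /* doubles per message, i.e. well over 64 bytes each */

void receive_batch(double (*bufs)[MSGLEN], int src, MPI_Comm comm)
{
    MPI_Request reqs[NMSGS];
    int i;

    /* Post every receive up front so arriving messages are expected. */
    for (i = 0; i < NMSGS; i++)
        MPI_Irecv(bufs[i], MSGLEN, MPI_DOUBLE, src, i /* tag */, comm, &reqs[i]);

    /* Complete (and thereby release) all of the request objects. */
    MPI_Waitall(NMSGS, reqs, MPI_STATUSES_IGNORE);
}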
MPI has run out of request entries; the current allocation level is: MPI_REQUEST_MAX = XXXXX
You might have an application problem. You almost certainly are calling MPI_Isend() or MPI_Irecv() and not completing or freeing your request objects. You need to use MPI_Request_free(), as described in the previous section.
All stdout and stderr output is line-buffered, which means that mpirun does not print any partial lines of output. This sometimes causes problems for codes that prompt the user for input parameters but do not end their prompts with a newline character. The simplest fix is to append a newline character to each prompt.
You can set the MPI_UNBUFFERED_STDIO environment variable to disable line-buffering. For more information, see the MPI(1) and mpirun(1) man pages.
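As a minimal illustration of the newline fix (the prompt text and variable names are made up for this sketch), a prompt that does not end with a newline may never appear under mpirun, while the same prompt terminated with \n is printed as a complete line:

#include <stdio.h>

int main(void)
{
    int n;

    /* Under mpirun's line buffering this prompt may never be shown,
     * because the output line is never completed:
     *
     *     printf("Enter the number of iterations: ");
     */
    printf("Enter the number of iterations:\n");  /* newline completes the line */

    if (scanf("%d", &n) != 1)
        return 1;
    printf("Running %d iterations\n", n);
    return 0;
}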
MPT RPMs are included in the SGI Performance Suite releases. In addition, you can obtain MPT RPMs from the SGI Support website at
http://support.sgi.com
At MPI job start-up, MPI calls the SHMEM library to cross-map all user static memory on all MPI processes to provide optimization opportunities. The result is large virtual memory usage. The ps(1) command's SIZE statistic is telling you the amount of virtual address space being used, not the amount of memory being consumed. Even if all of the pages that you could reference were faulted in, most of the virtual address regions point to multiply-mapped (shared) data regions, and even in that case, actual per-process memory usage would be far lower than that indicated by SIZE.
This message means that something happened while mpirun was trying to launch your application, which caused it to fail before all of the MPI processes were able to handshake with it.
The mpirun command directs arrayd to launch a master process on each host and listens on a socket for those masters to connect back to it. Since the masters are children of arrayd, arrayd traps SIGCHLD and passes that signal back to mpirun whenever one of the masters terminates. If mpirun receives a signal before it has established connections with every host in the job, it knows that something has gone wrong.
In general, the rule to follow is to run mpirun on your tool and then the tool on your application. Do not try to run the tool on mpirun. Also, because of the way that mpirun sets up stdio, seeing the output from your tool might require a bit of effort. The ideal case is when the tool directly supports an option to redirect its output to a file. In general, this is the recommended way to mix tools with mpirun. Of course, not all tools (for example, dplace) support such an option. However, it is usually possible to make it work by wrapping a shell script around the tool and having the script do the redirection, as in the following example:
> cat myscript
#!/bin/sh
MPI_DSM_OFF=1
export MPI_DSM_OFF
dplace -verbose a.out 2> outfile
> mpirun -np 4 myscript
hello world from process 0
hello world from process 1
hello world from process 2
hello world from process 3
> cat outfile
there are now 1 threads
Setting up policies and initial thread.
Migration is off.
Data placement policy is PlacementDefault.
Creating data PM.
Data pagesize is 16k.
Setting data PM.
Creating stack PM.
Stack pagesize is 16k.
Stack placement policy is PlacementDefault.
Setting stack PM.
there are now 2 threads
there are now 3 threads
there are now 4 threads
there are now 5 threads