LAM / MPI Parallel Computing
Installation

Please see the installation guide for instructions on installing LAM/MPI 6.3.

Supported Systems

LAM 6.3 has been tested on the following systems:
The LAM Team would greatly appreciate your time and effort in helping to verify LAM/MPI on a wide variety of systems. Please see the LAM test suite page to see how you can help.

New feature overview
Caveats about MPI_CANCEL

LAM is fully MPI-1 compliant with the exception of MPI_CANCEL. MPI_CANCEL works properly for receives, but will almost never work on sends. MPI_CANCEL is most frequently used with unmatched MPI_IRECVs that were posted "in case" a matching message arrived. Canceling such a receive simply entails removing the receive request from the local queue, and is fairly straightforward to implement.

Actually canceling a send operation is much more difficult because some meta information about a message is usually sent immediately. As such, the message is usually at least partially sent before an MPI_CANCEL is issued. Trying to chase down all the particular cases is a nightmare, to say the least. The LAM Team therefore decided not to implement MPI_CANCEL on sends, and instead to concentrate on other features.

Backward Compatibility

LAM provides full source code backward compatibility with previous versions of LAM. Old applications that compiled properly with older versions of LAM can simply be recompiled with this version of LAM.

Binary compatibility, however, is not provided -- applications that have been compiled with previous versions of LAM will need to be recompiled in order to run properly with this version of LAM. If applications are not recompiled with this version of LAM, their behavior will be unpredictable.

LAM and Linux

LAM is frequently used on Linux-based machines (iX86 and otherwise). It works correctly under 2.0.36 and 2.2.x. (We did not test under 2.0.37, but we have no reason to believe that it would not work under that version as well, since it contains only minor changes from 2.0.36.)

However, versions 2.2.0 through 2.2.9 had some TCP/IP performance problems. It seems that version 2.2.10 fixed these problems; if you are using a Linux version between 2.2.0 and 2.2.9, LAM may exhibit poor C2C performance due to the Linux TCP/IP kernel bugs. We recommend that you upgrade to 2.2.10 (or the latest version). See the LAM Linux page for a full discussion of the problem.
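The kernel version range mentioned above can be checked mechanically. The following is only a sketch in Python; it is not part of LAM, and the function names parse_kernel_version and has_tcp_performance_problem are made up for illustration. It assumes the kernel release string begins with three dot-separated numbers, as in "2.2.9".

```python
import re


def parse_kernel_version(release):
    """Return (major, minor, patch) from a release string such as '2.2.10-ac1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def has_tcp_performance_problem(release):
    """True for Linux 2.2.0 through 2.2.9, the range reported to hurt C2C performance."""
    major, minor, patch = parse_kernel_version(release)
    return (major, minor) == (2, 2) and 0 <= patch <= 9
```

On a Linux machine, the string returned by Python's platform.release() could be passed to has_tcp_performance_problem() to decide whether an upgrade to 2.2.10 or later is advisable.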
LAM help file

The following LAM binaries have had their help messages greatly expanded:
The messages should be much more helpful in trying to diagnose problems, especially for first-time users. The help messages generally try to identify the problem and suggest solutions. It is possible for multiple error messages to be printed; one failure may cause other failures. As such, the first error message is generally (but not always) the most relevant message -- solving that error may solve the rest. Additionally, much more information is now output when the "-d" switch (which enables debugging output) is used on any of these programs.

The help messages are all contained in a single ASCII file which is initially installed into the following file (where $prefix is the option supplied to --prefix in the ./configure script):
$prefix/share/lam/lam-6.3b-helpfile
The format of the file is simple; simple delimiter lines separate help topic messages. It should be very obvious which message corresponds to which program/topic name combination. This file allows system administrators to customize help messages for their users according to the local environment.

When LAM tries to find the helpfile to print out a help message, it actually searches for the file in the following order:

$LAMHELPFILE
$HOME/lam-helpfile
$HOME/lam-6.3b-helpfile
$HOME/share/lam/lam-helpfile
$HOME/share/lam/lam-6.3b-helpfile
$LAMHELPDIR/lam-helpfile
$LAMHELPDIR/lam-6.3b-helpfile
$LAMHOME/share/lam/lam-helpfile
$LAMHOME/share/lam/lam-6.3b-helpfile
$TROLLIUSHOME/share/lam/lam-helpfile
$TROLLIUSHOME/share/lam/lam-6.3b-helpfile
$prefix/share/lam/lam-helpfile
$prefix/share/lam/lam-6.3b-helpfile

This seemingly overcomplicated scheme allows maximum flexibility for system administrators and/or users to define the location of customized help files.

Zeroing out LAM buffers before use

LAM has several structures that are used in many situations. One example is the "struct nmsg", one of the underlying message constructs used to pass data between LAM entities. Because the "struct nmsg" is used in so many places, it is a generalized structure and contains fields that are not used in every situation.

By default, LAM only zeros out relevant struct members before using a structure. "Using" a structure may involve sending the entire structure (including uninitialized members) to a remote host. This is not a problem, because the remote host will also ignore irrelevant struct members (depending on the specific function being invoked). More to the point -- LAM was designed this way to avoid setting variables that will not be used; this is a slight optimization in run-time performance.
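As an aside on the helpfile lookup order listed earlier, the search can be sketched in a few lines of Python. This is an illustration only, not LAM's actual implementation: the function names are hypothetical, and the sketch assumes that locations based on unset environment variables are simply skipped, which the text above does not state.

```python
import os

# The two file names tried at each location, in the order listed above.
HELPFILE_NAMES = ("lam-helpfile", "lam-6.3b-helpfile")

# (environment variable, subdirectory) pairs, in search order.
SEARCH_LOCATIONS = (
    ("HOME", ""),
    ("HOME", "share/lam"),
    ("LAMHELPDIR", ""),
    ("LAMHOME", "share/lam"),
    ("TROLLIUSHOME", "share/lam"),
)


def _join(base, subdir, name):
    """Join path pieces with '/', leaving out an empty subdirectory."""
    parts = [base.rstrip("/")] + ([subdir] if subdir else []) + [name]
    return "/".join(parts)


def helpfile_candidates(env, prefix):
    """Return candidate helpfile paths in search order (unset variables are skipped)."""
    candidates = []
    if env.get("LAMHELPFILE"):
        candidates.append(env["LAMHELPFILE"])
    for variable, subdir in SEARCH_LOCATIONS:
        base = env.get(variable)
        if not base:
            continue  # assumption: an unset variable contributes no candidates
        for name in HELPFILE_NAMES:
            candidates.append(_join(base, subdir, name))
    for name in HELPFILE_NAMES:
        candidates.append(_join(prefix, "share/lam", name))
    return candidates


def find_helpfile(env, prefix, exists=os.path.isfile):
    """Return the first candidate for which exists() is true, or None."""
    for path in helpfile_candidates(env, prefix):
        if exists(path):
            return path
    return None
```

Calling find_helpfile(os.environ, "/usr/local") would then return the first existing file, where "/usr/local" stands in for whatever value was given to --prefix.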
Memory-checking debuggers (such as purify and the Solaris Workshop bcheck program) are quite popular, and quite useful for finding memory leaks, indexing past the end of arrays, and other types of Heisenbugs. Since LAM "uses" uninitialized memory, it tends to generate many warnings with these types of debuggers. The --with-purify option has been added to the ./configure script; it forces LAM to zero out all memory before it is used. This eliminates the "read before initialized" types of warnings that memory-checking debuggers will identify deep inside LAM. However, this option incurs a slight overhead penalty in the run-time performance of LAM, so it is not the default.

Mpirun

The default behavior of mpirun has changed. The default options now correspond to -w -c2c -nger. That is, wait for the application to terminate, use the fast client-to-client communication mode, and disable GER. To get the old behavior use the options -lamd -ger -nw.

Mpirun now recognizes command lines of the form

% mpirun -np <nprocs> {LAM specific mpirun args} \
         <program> {program args}

For example,

% mpirun -np 4 -lamd n0 n1 /bin/foobar 12 a b c

runs 4 copies of program /bin/foobar on nodes n0 and n1, passing the arguments, 12 a b c, to the program. The new syntax is equivalent to the following in the "-c" syntax, which is still supported.

% mpirun -c <nprocs> {LAM specific mpirun args} \
         <program> -- {program args}

Ability to pass environment variables.

All environment variables named LAM_MPI_* are now automatically passed to remote nodes (unless disabled via the "-nx" option to mpirun). The "-x" option enables exporting of specific environment variables to the remote nodes:

% LAM_MPI_FOO="green eggs and lam"
% export LAM_MPI_FOO
% mpirun N -x DISPLAY,ME=author lamIam

This will launch the "lamIam" application on all remote nodes. The LAM_MPI_FOO, DISPLAY, and ME variables will be created on all nodes before the user's program is invoked.
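The export rules in the example above can be modeled roughly as follows. This Python sketch is not taken from mpirun's source; the function name exported_environment is hypothetical, the "-x" value is split naively on commas with no quoting support, and the sketch assumes that "-nx" suppresses only the automatic LAM_MPI_* export while "-x" entries are still honored.

```python
def exported_environment(env, x_option=None, nx=False):
    """Return the variables that would be created on the remote nodes.

    env      -- mapping holding the local environment
    x_option -- value given to "-x", e.g. "DISPLAY,ME=author" (may be None)
    nx       -- True if "-nx" was given (assumed to disable only LAM_MPI_* export)
    """
    exported = {}
    if not nx:
        # Variables named LAM_MPI_* are passed along automatically.
        for name, value in env.items():
            if name.startswith("LAM_MPI_"):
                exported[name] = value
    if x_option:
        for item in x_option.split(","):  # naive split: quoted values are not handled
            if "=" in item:
                # NAME=value defines a new variable for the remote nodes.
                name, value = item.split("=", 1)
                exported[name] = value
            elif item in env:
                # A bare NAME exports the existing local value.
                exported[item] = env[item]
    return exported
```

With LAM_MPI_FOO and DISPLAY set locally, exported_environment(env, "DISPLAY,ME=author") yields the three variables named in the example: LAM_MPI_FOO, DISPLAY, and ME.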
Note that the parser for the "-x" option is currently not very sophisticated -- it cannot even handle quoted values when defining new environment variables. Users are advised to set variables in the environment prior to invoking mpirun, and only use "-x" to export the variables to the remote nodes (not to define new variables), if possible.

Pseudo-tty support.

The "-pty" option to mpirun enables pseudo-tty support. Among other things, this gives line-buffered output from the remote nodes (which is probably what you want). It is not currently a default option because it has not yet been tested on a wide variety of Unixes.

Ability to change to arbitrary directories.

The "-wd" option to mpirun allows the user to change to an arbitrary directory before their program is invoked. It can also be used in application schema files to specify working directories on specific nodes and/or for specific applications.

If the "-wd" option appears both in a schema file and on the command line, the schema file directory will override the command line value.

Ability to run shell scripts/debuggers/etc.

mpirun can now also run non-LAM/MPI programs. That is, one can mpirun a shell script, debugger, or any other program that will eventually either exec a LAM/MPI program or spawn a LAM/MPI program as a child.

This is extremely helpful for batch systems and debugging environments. For example:

% mpirun N gdb

lamexec

The lamexec command has been added to LAM/MPI's repertoire. It is an "mpirun clone", but is specifically for running non-MPI programs. That is, one can do the following:

% lamexec N ps

which will run "ps" on all nodes in the multicomputer. It can take most of the same command line arguments as mpirun; it does not support the flags that do not make sense for non-MPI programs (e.g., -c2c, -lamd, etc.). See lamexec(1) for more details.
hcc / hcp / hf77 / mpicc / mpiCC / mpif77

The hcc, hcp, and hf77 wrapper compilers have previously not automatically passed the "-lmpi" option to the underlying compiler. The rationale behind this decision was that the "mpicc" and "mpif77" wrapper compilers added this functionality; the "h" wrappers were intended as Trollius compilers, not LAM/MPI compilers.

But hcc, hcp, and hf77 have become the de facto wrapper compilers (vs. mpicc and mpif77). Indeed, some users have been confused about why -lmpi is not implicit in the "h" wrapper compilers. Hence, "-lmpi" is now automatically passed to the underlying compiler by the hcc, hcp, and hf77 wrapper compilers. The mpicc and mpif77 compilers are now symbolic links to hcc and hf77, respectively. For symmetry, mpiCC has been created as a symbolic link to hcp.

Root execution disallowed

It is a Very Bad Idea to run the LAM executables as root.

LAM was designed to be run by individual users; it was not designed to be run as a root-level service where multiple users use the same LAM daemons in a client-server fashion (see "Typical Usage" in the INSTALL file). LAM should be booted by each individual user who wishes to run MPI programs. There is a wide array of security issues when root runs a service-level daemon; LAM does not even attempt to address any of these issues.

Especially with today's propensity for hackers to scan for root-owned network daemons, it could be tragic to run this program as root. While LAM is known to be quite stable, and LAM does not leave network sockets open for random connections after the initial setup, several factors should strike fear into system administrators' hearts if LAM were to be constantly running for all users to utilize:
RPI transport layers

LAM 6.2 provides three client-to-client transport layers which implement the request progression interface (RPI). As in LAM 6.1 the LAM daemon RPI transport is always available. It is no longer the default transport and must be explicitly invoked via the -lamd option to mpirun.

The three client-to-client transports are:
Signal catching

LAM MPI now catches the signals SEGV, BUS, FPE and ILL. The signal handler terminates the application. This is useful in batch jobs to help ensure that mpirun returns if an application process dies. To disable the catching of signals use the -nsigs option to mpirun.

Internal signal

The signal used internally by LAM has been changed from SIGUSR1 to SIGUSR2 to reduce the chance of conflicts with the Linux pthreads library. The signal used is configurable.

New basic datatypes

Support has been added for the MPI_LONG_LONG_INT, MPI_UNSIGNED_LONG_LONG and MPI_WCHAR basic datatypes.

MPI-2 Support

C++ bindings

C++ bindings for MPI-1 are provided from the MPI-2 C++ bindings package from the University of Notre Dame (http://www.mpi.nd.edu/research/mpi2c++/), version 1.0.3. The MPI-1 C++ bindings are described in Chapter 10 and Appendix B of the MPI-2 standard, which can be found at http://www.mpi-forum.org/.

The C++ bindings package is compiled, by default, with LAM, and the LAM wrapper compilers (hcc/hcp/hf77) will automatically do "the right things" to compile/link user programs that use MPI C++ bindings function calls.

Note that the C++ bindings have requirements on the degree of conformance that your C++ compiler supports; see the file mpi2c++/README for more details. If your C++ compiler cannot support the requirements of the C++ bindings package, it is safest just to disable MPI C++ bindings support in LAM. MPI C++ bindings support can be disabled via the LAM ./configure script; see the INSTALL file for specific instructions.

Please see the "Contact Information" section of the mpi2c++/README file for how to submit questions and bug reports about the MPI-2 C++ bindings package (that do not specifically pertain to LAM).

MPI-IO / ROMIO

MPI-IO support has been added by including the ROMIO package from Argonne National Labs (http://www.mcs.anl.gov/romio/), version 1.0.1.
The MPI-IO functions are described in chapter 9 of the MPI-2 standard, which can be found at http://www.mpi-forum.org/.

The ROMIO package can be compiled with LAM, and the LAM wrapper compilers (hcc/hcp/hf77) will automatically do "the right things" to compile/link user programs that use ROMIO function calls.

Please note that this is the first version of ROMIO that has been configured to work with LAM. As such, some custom modifications were made to the initial ROMIO 1.0.1 distribution; a vanilla ROMIO 1.0.1 distribution will not compile correctly (conversely, versions of LAM prior to 6.3b will not compile with ROMIO either -- there were incompatibilities in both directions). The ROMIO modifications have been conveyed back to the ROMIO team; the next release will be able to natively compile with LAM 6.3b (and higher) with possible limitations, mentioned below.

ROMIO support can be enabled via the LAM ./configure script; see the INSTALL file for specific instructions.

There are some important limitations to ROMIO that are discussed in the romio/README file. One limitation that is not currently listed in the ROMIO README file is that atomic file access will not work with AFS. This is because of file locking problems with AFS. The ROMIO test program "atomicity" will fail if you specify an output file on AFS.

Additionally, ROMIO does not support the following LAM functionality:
Inter-language interoperability

Inter-language interoperability is supported. It is now possible to initialize LAM MPI from either C or Fortran and mix MPI calls from both languages.

One-sided communication

Support is provided for get/put/accumulate data transfer operations and for the post/wait/start/complete and fence synchronization operations. No support is provided for window locking.

The datatypes used in the get/put/accumulate operations are restricted to being basic datatypes or single level contigs/vectors of basic datatypes.

The implementation of the one-sided operations is layered on top of the point-to-point functions and will thus perform no better than them. Nevertheless it is hoped that providing this support will aid developers in developing and debugging codes using one-sided communication.

The following functions related to one-sided communication have been implemented.
MPI_Win_create
MPI_Win_free
MPI_Win_get_group
MPI_Get
MPI_Put
MPI_Accumulate
MPI_Win_fence
MPI_Win_post
MPI_Win_wait
MPI_Win_start
MPI_Win_complete

Dynamic processes

The dynamic process support provided in LAM 6.2 has been extended and the function names changed to conform to the final MPI 2.0 standard. The following functions related to dynamic process support are provided.
MPI_Comm_spawn
MPI_Comm_spawn_multiple
MPI_Comm_get_parent
MPI_Comm_accept
MPI_Comm_connect
MPI_Comm_disconnect
MPI_Comm_join
MPI_Lookup_name
MPI_Publish_name
MPI_Unpublish_name
MPI_Open_port
MPI_Close_port

Info

Full support for info objects is provided.
MPI_Info_create
MPI_Info_free
MPI_Info_delete
MPI_Info_dup
MPI_Info_get
MPI_Info_get_nkeys
MPI_Info_get_nthkey
MPI_Info_get_valuelen
MPI_Info_set

Communicator and window error handling

The new communicator error handler functions are supported and window error handlers are also supported.
MPI_Comm_create_errhandler
MPI_Comm_get_errhandler
MPI_Comm_set_errhandler
MPI_Win_create_errhandler
MPI_Win_get_errhandler
MPI_Win_set_errhandler

Handle conversions

Handle conversions for inter-language interoperability are supported.
MPI_Comm_f2c
MPI_Comm_c2f
MPI_Group_f2c
MPI_Group_c2f
MPI_Type_f2c
MPI_Type_c2f
MPI_Request_f2c
MPI_Request_c2f
MPI_Info_f2c
MPI_Info_c2f
MPI_Win_f2c
MPI_Win_c2f
MPI_Status_f2c
MPI_Status_c2f

Attributes on communicators, datatypes and windows

Attributes may now be set on and retrieved from datatypes and windows. The new communicator attribute handling functions are also supported.
MPI_Comm_create_keyval
MPI_Comm_free_keyval
MPI_Comm_delete_attr
MPI_Comm_get_attr
MPI_Comm_set_attr
MPI_Type_create_keyval
MPI_Type_free_keyval
MPI_Type_delete_attr
MPI_Type_get_attr
MPI_Type_set_attr
MPI_Win_create_keyval
MPI_Win_free_keyval
MPI_Win_delete_attr
MPI_Win_get_attr
MPI_Win_set_attr

New derived type constructors and type enquiry functions

Support has been added for the following new derived type constructors
MPI_Type_create_struct
MPI_Type_create_hindexed
MPI_Type_create_hvector
MPI_Type_dup
MPI_Type_create_resized
MPI_Type_create_subarray
MPI_Type_create_darray

and for the type enquiry functions
MPI_Type_get_contents
MPI_Type_get_envelope
MPI_Type_get_extent
MPI_Type_get_true_extent

Miscellaneous

Implementations of the following functions are provided. LAM 6.3 reports its MPI version as 1.2.
MPI_Get_version
MPI_Get_address
Copyright ©1996-1999
LAM Team / UND 16-Sep-1999 / 08:29:31 EST