DESCRIPTION
Parallel environments are parallel programming and runtime environments
that allow the execution of shared memory or distributed memory
parallelized applications. Parallel environments usually require some
kind of setup to be operational before parallel applications can be
started. Examples of common parallel environments are shared memory
parallel operating systems and the distributed memory environments
Parallel Virtual Machine (PVM) and Message Passing Interface (MPI).
sge_pe allows for the definition of interfaces to arbitrary parallel
environments. Once a parallel environment is defined or modified with
the -ap or -mp options to qconf(1) and linked with one or more queues
via pe_list in queue_conf(5), the environment can be requested for a
job via the -pe switch to qsub(1), together with a range for the number
of parallel processes to be allocated by the job. Additional -l options
may be used to specify the job requirements in further detail.
Note that Univa Grid Engine allows backslashes (\) to be used to escape
newline (\newline) characters. The backslash and the newline are
replaced with a space (" ") character before any interpretation.
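The continuation rule described above can be illustrated with a short
Python sketch. It only mirrors the replacement as documented here and is
not Univa Grid Engine's actual parser; the function name
join_continuations and the sample text are hypothetical.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space.

    Illustrative only: mirrors the documented rule, not the real parser.
    """
    return text.replace("\\\n", " ")


# Hypothetical configuration fragment with one continued line.
raw = "user_lists deptA,\\\ndeptB\n"
print(join_continuations(raw))
```

Because the replacement happens before any interpretation, a continued
line is seen by later processing as one logical line.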
FORMAT
The format of a sge_pe file is defined as follows:
pe_name
The name of the parallel environment as defined for pe_name in
sge_types(1). To be used in the qsub(1) -pe switch.
slots
The total number of parallel processes allowed to run concurrently
under the parallel environment. Type is number, valid values are 0 to
9999999.
user_lists
A comma-separated list of user access list names (see access_list(5)).
Each user contained in at least one of the listed access lists has
access to the parallel environment. If the user_lists parameter is set
to NONE (the default), any user who is not explicitly excluded via the
xuser_lists parameter described below has access. If a user is
contained both in an access list named in xuser_lists and in one named
in user_lists, the user is denied access to the parallel environment.
xuser_lists
The xuser_lists parameter contains a comma-separated list of user
access lists as described in access_list(5). Each user contained in at
least one of the listed access lists is not allowed to access the
parallel environment. If the xuser_lists parameter is set to NONE (the
default), any user has access. If a user is contained both in an access
list named in xuser_lists and in one named in user_lists, the user is
denied access to the parallel environment.
and stop procedures) to constitute a command line:
$pe_hostfile
The pathname of a file containing a detailed description of the
layout of the parallel environment to be set up by the start-up
procedure. Each line of the file refers to a host on which
parallel processes are to be run. The first entry of each line
denotes the hostname, the second entry the number of parallel
processes to be run on the host, the third entry the name of the
queue, and the fourth entry a processor range to be used in case
of a multiprocessor machine.
$host The name of the host on which the start-up or stop procedures
are started.
$job_owner
The user name of the job owner.
$job_id
Univa Grid Engine's unique job identification number.
$job_name
The name of the job.
$pe The name of the parallel environment in use.
$pe_slots
Number of slots granted for the job.
$processors
The processors string as contained in the queue configuration
(see queue_conf(5)) of the master queue (the queue in which the
start-up and stop procedures are started).
$queue The cluster queue of the master queue instance.
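As an illustration of the $pe_hostfile layout described above, the
following Python sketch reads such a file into structured records. It
assumes the four entries on each line are separated by whitespace, which
the text above does not state explicitly. The names HostEntry and
parse_pe_hostfile and the sample host, queue, and processor-range values
are hypothetical.

```python
from typing import List, NamedTuple


class HostEntry(NamedTuple):
    hostname: str         # first entry: host name
    slots: int            # second entry: number of parallel processes
    queue: str            # third entry: queue name
    processor_range: str  # fourth entry: processor range


def parse_pe_hostfile(text: str) -> List[HostEntry]:
    """Parse pe_hostfile content, assuming whitespace-separated entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 4:
            raise ValueError(f"expected 4 entries, got: {line!r}")
        host, slots, queue, procs = fields[:4]
        entries.append(HostEntry(host, int(slots), queue, procs))
    return entries


# Hypothetical file content, for illustration only.
sample = "node01 4 all.q UNDEFINED\nnode02 2 all.q UNDEFINED\n"
for entry in parse_pe_hostfile(sample):
    print(entry.hostname, entry.slots)
```

A start-up procedure could use such records to decide how many
processes to launch on each host.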
stop_proc_args
The invocation command line of a shutdown procedure for the parallel
environment. The shutdown procedure is invoked by sge_shepherd(8) after
the job script has finished. Its purpose is to stop the parallel
environment and to remove it from all participating systems. An
optional prefix "user@" specifies the user under which this procedure
is to be started. The standard output of the stop procedure is also
redirected to the file REQUEST.poJID in the job's working directory
(see qsub(1)), with REQUEST being the name of the job as displayed by
qstat(1) and JID being the job's identification number. Likewise, the
standard error output is redirected to REQUEST.peJID.
The same special variables as for start_proc_args can be used to
constitute a command line.
allocation_rule
The allocation rule is interpreted by the scheduler thread and helps
has to be allocated on a single host (no matter which value
belonging to the range is finally chosen for the job to be
allocated).
$fill_up: Starting from the best suitable host/queue, all available
slots are allocated. Further hosts and queues are "filled up"
as long as a job still requires slots for parallel tasks.
$round_robin:
From all suitable hosts a single slot is allocated until all
tasks requested by the parallel job are dispatched. If more
tasks are requested than suitable hosts are found, allocation
starts again from the first host. The allocation scheme
walks through suitable hosts in a best-suitable-first order.
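The difference between $fill_up and $round_robin can be sketched with a
small Python simulation. This is a simplified model of the two rules as
described above, not the scheduler's implementation: it assumes the
hosts are already given in best-suitable-first order, each with a number
of free slots. The function names and host names are hypothetical.

```python
from typing import Dict, List, Tuple

Hosts = List[Tuple[str, int]]  # (host name, free slots), best host first


def fill_up(hosts: Hosts, tasks: int) -> Dict[str, int]:
    """Take all free slots of each host in turn until tasks are placed."""
    alloc: Dict[str, int] = {}
    remaining = tasks
    for name, free in hosts:
        if remaining == 0:
            break
        take = min(free, remaining)
        if take:
            alloc[name] = take
        remaining -= take
    if remaining:
        raise ValueError("not enough free slots")
    return alloc


def round_robin(hosts: Hosts, tasks: int) -> Dict[str, int]:
    """Take one slot per host per pass, restarting at the first host."""
    free = dict(hosts)
    alloc = {name: 0 for name, _ in hosts}
    remaining = tasks
    while remaining > 0:
        progressed = False
        for name, _ in hosts:
            if remaining == 0:
                break
            if free[name] > 0:
                free[name] -= 1
                alloc[name] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("not enough free slots")
    return {name: count for name, count in alloc.items() if count}


hosts = [("node01", 4), ("node02", 4), ("node03", 2)]
print(fill_up(hosts, 5))      # {'node01': 4, 'node02': 1}
print(round_robin(hosts, 5))  # {'node01': 2, 'node02': 2, 'node03': 1}
```

In this model $fill_up concentrates tasks on as few hosts as possible,
while $round_robin spreads them across all suitable hosts.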
control_slaves
This parameter can be set to TRUE or FALSE (the default). It indicates
whether Univa Grid Engine is the creator of the slave tasks of a
parallel application via sge_execd(8) and sge_shepherd(8) and thus has
full control over all processes in a parallel application, which
enables capabilities such as resource limitation and correct
accounting. However, to gain control over the slave tasks of a parallel
application, a sophisticated PE interface is required, which works
closely together with Univa Grid Engine facilities. Such PE interfaces
are available through your local Univa Grid Engine support office.
Please set the control_slaves parameter to FALSE for all other PE
interfaces.
job_is_first_task
The job_is_first_task parameter can be set to TRUE or FALSE. A value of
TRUE indicates that the Univa Grid Engine job script already contains
one of the tasks of the parallel application (the number of slots
reserved for the job is the number of slots requested with the -pe
switch), while a value of FALSE indicates that the job script (and its
child processes) is not part of the parallel program (the number of
slots reserved for the job is the number of slots requested with the
-pe switch + 1).
If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE
and/or SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is set
to FALSE, the job_is_first_task parameter influences the accounting for
the job: a value of TRUE means that accounting for cpu and requested
memory gets multiplied by the number of slots requested with the -pe
switch; if job_is_first_task is set to FALSE, the accounting
information gets multiplied by the number of slots + 1.
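The slot arithmetic described above can be written out as a minimal
Python sketch. The function names are hypothetical, and the usage
scaling only applies under the conditions stated above (wallclock
accounting in use and control_slaves set to FALSE).

```python
def reserved_slots(requested: int, job_is_first_task: bool) -> int:
    """Slots reserved for a job that requested `requested` slots via -pe."""
    return requested if job_is_first_task else requested + 1


def scaled_usage(usage: float, requested: int, job_is_first_task: bool) -> float:
    """Scale a usage value as described for wallclock accounting with
    control_slaves set to FALSE (illustrative model only)."""
    return usage * reserved_slots(requested, job_is_first_task)


print(reserved_slots(8, True))   # 8
print(reserved_slots(8, False))  # 9
```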
urgency_slots
For pending jobs with a slot range PE request the number of slots is
not determined. This setting specifies the method to be used by Univa
Grid Engine to assess the number of slots such jobs might finally get.
amount. If no upper bound is specified with the range the
absolute maximum possible due to the PE's slots setting is
assumed.
avg: The average of all numbers occurring within the job's PE
range request is assumed.
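For the avg method, a minimal Python sketch of the arithmetic follows.
It assumes the job's PE request is a single contiguous range with both
bounds given; how Univa Grid Engine rounds a fractional result is not
stated here, so the sketch returns the unrounded value. The function
name avg_slots is hypothetical.

```python
def avg_slots(lower: int, upper: int) -> float:
    """Average of all integers in the inclusive range lower..upper."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    numbers = range(lower, upper + 1)
    return sum(numbers) / len(numbers)


print(avg_slots(2, 8))  # 5.0
```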
accounting_summary
This parameter is only checked if control_slaves (see above) is set to
TRUE and thus Univa Grid Engine is the creator of the slave tasks of a
parallel application via sge_execd(8) and sge_shepherd(8). In this
case, accounting information is available for every single slave task
started by Univa Grid Engine.
The accounting_summary parameter can be set to TRUE or FALSE. A value
of TRUE indicates that only a single accounting record is written to
the accounting(5) file, containing the accounting summary of the whole
job including all slave tasks, while a value of FALSE indicates an
individual accounting(5) record is written for every slave task, as
well as for the master task.
Note: When running tightly integrated jobs with
SHARETREE_RESERVED_USAGE set and accounting_summary enabled in the
parallel environment, reserved usage will only be reported by the
master task of the parallel job. No per-task usage records will be sent
from execd to qmaster, which can significantly reduce load on qmaster
when running large tightly integrated parallel jobs.
RESTRICTIONS
Note that the functionality of the start-up, shutdown and signaling
procedures remains the full responsibility of the administrator
configuring the parallel environment. Univa Grid Engine will just
invoke these procedures and evaluate their exit status. If the
procedures do not perform their tasks properly or if the parallel
environment or the parallel application behave unexpectedly, Univa Grid
Engine has no means to detect this.
SEE ALSO
sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qsub(1),
access_list(5), sge_qmaster(8), sge_shepherd(8).
COPYRIGHT
See sge_intro(1) for a full statement of rights and permissions.
UGE 8.0.0 $Date: 2009/04/06 15:31:32 $ SGE_PE(5)