r4 - 21 Oct 2013 - 07:59:53 - MartinC

Using the batch submission system on Arc2

When you first log in to Arc2, you will be directed to one of several login nodes. These allow regular command line access to the system, which is necessary for setting up runs, compiling code and some analysis work. Login nodes are shared amongst all who are logged in, so they will very quickly become overloaded if they are used for regular computation.

The compute power behind the system is accessible through a batch submission system. When a job executes through the batch system, processors on the back-end are made available exclusively for the purposes of running the job.

The batch queue system installed is Son of Grid Engine version 8.1.1, plus locally developed and implemented patches.

In order to interact with the batch system the user must give some indication of the resources they require. At a minimum these include:

how long the job needs to run for
on how many processors (assumed 1 unless otherwise told)

With this information, the scheduler is able to dispatch the jobs at some point in the future when the resources become available. A fair-share policy is in operation to guide the scheduler towards allocating resources fairly between different faculties.

1. Resource reservation and backfill

By default all jobs are eligible for resource reservation, in that the scheduler will ensure the highest priority jobs will have their start times booked in the future. The qsched -a command can be used to generate a list of the anticipated start times of these jobs. At the moment, only the top 128 jobs are considered for resource reservation. The system will backfill jobs if they will start and finish before the highest priority jobs are scheduled to start. Therefore indicating a realistic runtime for a job (rather than the queue maximum) will make short jobs eligible to be backfilled, potentially shortening their wait-time.
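The backfill rule described above amounts to a simple comparison. The following is an illustration only, not the scheduler's implementation; the function name can_backfill and its arguments (all in seconds) are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: the real decision is made inside the scheduler.
# can_backfill NOW RUNTIME RESERVED_START   (all in seconds; hypothetical names)
# Succeeds (exit status 0) if a job started now with the requested runtime
# would finish no later than the next reserved start time.
can_backfill() {
    now=$1
    runtime=$2
    reserved_start=$3
    [ $((now + runtime)) -le "$reserved_start" ]
}

# A 2-hour job (7200 s) fits in front of a reservation starting in 3 hours
# (10800 s); a job requesting the 48-hour queue maximum (172800 s) does not.
if can_backfill 0 7200 10800; then echo "2h job: eligible for backfill"; fi
if ! can_backfill 0 172800 10800; then echo "48h job: must wait"; fi
```

This is why requesting the queue maximum for a short job tends to lengthen its wait: the scheduler can only go by the runtime requested.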

There is also a facility to book an amount of HPC resource for some time in the future, through advance reservation. Jobs eligible to run in that reservation can then be submitted to run within it. Advance reservation is not enabled for users by default; however, reservations can be enabled upon request, provided there is a valid case for their use and the fairness policies allow it.

2. Queue configuration

Currently the facility is configured with a single general access queue, allowing submission to all available compute resources. Thus, there is no need to specify a queue name in job submissions.

2.1. Time limits

Jobs requesting a time up to the maximum runtime of the queue are eligible to be run. At the moment the maximum runtime is 48 hours.

Should a job run beyond the length of time requested, it will be killed by the queuing system. To change the time requested by a batch job, change the time specified in the -l h_rt flag e.g.:

$ qsub -l h_rt=6:00:00 script.sh

This requests six hours of runtime.
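As a rough sketch of how an h_rt value might be built from a runtime estimate, the helper below formats a number of minutes as H:MM:SS and refuses anything over the 48 hour maximum. The function name h_rt_from_minutes is hypothetical and is not part of Grid Engine.

```shell
#!/bin/sh
# h_rt_from_minutes MINUTES  (hypothetical helper, not part of Grid Engine)
# Prints an h_rt value in H:MM:SS form, or fails if the request exceeds
# the 48 hour queue maximum quoted above.
h_rt_from_minutes() {
    minutes=$1
    max_minutes=$((48 * 60))
    if [ "$minutes" -gt "$max_minutes" ]; then
        echo "requested runtime exceeds the 48 hour maximum" >&2
        return 1
    fi
    printf '%d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

# Print the command line rather than running it, so this is safe to try anywhere.
echo "qsub -l h_rt=$(h_rt_from_minutes 360) script.sh"
```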

3. Memory usage

So that programs do not compete for the available memory in a machine, memory is treated as a consumable resource. For example, if one job is consuming 60GB of memory on a node that has a total of 64GB, the maximum total size of all other jobs allowed to execute on that node is 4GB.

By default, a limit of 1GB per process (or 1GB per slot) is defined for all batch jobs. To override this behaviour use the -l h_vmem switch to qsub. E.g. to run a single-process code using 6GB of memory for 6 hours:

$ qsub -l h_vmem=6G -l h_rt=6:00:00 script.sh

As memory is specified per slot:

$ qsub -l h_vmem=2G -l h_rt=6:00:00 -pe smp 4 script.sh

Will request a total of 8GB of memory, shared between 4 processes.
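The per-slot arithmetic can be checked with a one-line calculation. This is a minimal sketch; the helper name total_vmem_gb is hypothetical.

```shell
#!/bin/sh
# total_vmem_gb PER_SLOT_GB SLOTS  (hypothetical helper)
# Total memory requested = h_vmem per slot multiplied by the number of slots.
total_vmem_gb() {
    echo $(( $1 * $2 ))
}

echo "h_vmem=2G with -pe smp 4 requests $(total_vmem_gb 2 4)G in total"
```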

Jobs will be run on nodes, provided that the total memory requested per node does not exceed the physical memory of that node. Please note that if a job requests more memory than is physically available the job will not run though it will still show up in the queue. If an executing program exceeds the memory it requested, it will be automatically terminated by the queuing system.

NB: we have modified the scheduler to make a better measurement of memory usage than on many other HPC clusters which are also running the Grid Engine batch scheduler. You may find that jobs require less h_vmem to run on Arc2 than on other machines you may have used.

Arc2 has 316 nodes (5,056 cores), each with a total of 64 GB of memory. There are an additional 16 nodes (256 cores), each with a total of 256 GB of memory. To access the latter you need to use the -l node_type=16core-256G flag. Please see Submitting Jobs for more details.
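A hedged sketch of how one might decide whether a single-node job needs the large-memory flag is given below. The helper name node_flag is hypothetical, and the sketch assumes the whole 64 GB or 256 GB of a node can be requested, which may not hold exactly in practice.

```shell
#!/bin/sh
# node_flag PER_SLOT_GB SLOTS  (hypothetical helper)
# Prints the extra qsub flag needed for a single-node job, assuming the whole
# 64 GB or 256 GB of a node can be requested (an assumption; check local limits).
node_flag() {
    total=$(( $1 * $2 ))
    if [ "$total" -le 64 ]; then
        echo ""                              # fits on a standard node: no extra flag
    elif [ "$total" -le 256 ]; then
        echo "-l node_type=16core-256G"      # needs a large-memory node
    else
        echo "request of ${total}G exceeds any single node" >&2
        return 1
    fi
}

echo "8 slots x 4G   -> '$(node_flag 4 8)'"
echo "16 slots x 12G -> '$(node_flag 12 16)'"
```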

4. Job submission

The general command to submit a job with the qsub command is as follows:

$ qsub [options] script_file_name [--script-args]

where script_file_name is a file containing commands to be executed by the batch request.
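As a minimal sketch of such a script file, options can also be embedded in the script on lines beginning with "#$", which qsub reads as if they had been given on the command line. The resource values here are examples only.

```shell
#!/bin/bash
# Run the job from the directory it was submitted from.
#$ -cwd
# Request six hours of runtime.
#$ -l h_rt=6:00:00
# Request 2GB of memory per slot.
#$ -l h_vmem=2G

# Grid Engine sets JOB_ID inside a running job; fall back to "unknown" so the
# script can also be tried directly on a login node.
msg="Job ${JOB_ID:-unknown} running on $(uname -n)"
echo "$msg"
```

Saved as script.sh, this would be submitted with "qsub script.sh"; any options given on the command line are combined with those embedded in the file.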

For commonly used options and more details about qsub, please see the Submitting Jobs page.

For example submission scripts please look at these script examples.

4.1. Submitting shared-memory parallel jobs

Shared memory parallel jobs are jobs that run multiple threads or processes on a single multi-core machine. For instance OpenMP programs are shared memory parallel jobs.

There is a shared memory parallel environment (pe) called smp that is set up to enable the submission of this type of job. The option needed to submit this type of job is:

-pe smp <np>

where <np> is the number of processes requested.

For example:

$ qsub -l h_rt=6:00:00 -pe smp 4 script.sh

will request 4 processes on a single shared-memory machine, running for 6 hours.
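For an OpenMP program, a job script along the following lines is one way to match the thread count to the slots requested. It is a sketch only: the executable name is hypothetical and is left commented out.

```shell
#!/bin/bash
# Run from the submission directory, for six hours, with 4 slots in the smp
# parallel environment.
#$ -cwd
#$ -l h_rt=6:00:00
#$ -pe smp 4

# Grid Engine sets NSLOTS to the number of slots granted; default to 1 so the
# script also runs outside the batch system.
export OMP_NUM_THREADS="${NSLOTS:-1}"
echo "Running with $OMP_NUM_THREADS OpenMP threads"

# ./my_openmp_program   # hypothetical executable name
```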

4.2. Submitting distributed parallel jobs

This type of parallel job runs multiple processes over multiple processors, either on the same machine or more commonly over multiple machines.

A significant change made to the batch system on Arc2 is that, in addition to the standard Grid Engine submission syntax, we have also implemented an alternative "nodes" syntax. This is designed to give jobs dedicated access to entire nodes. This should provide more predictable job performance, for instance due to placement and dedicated use of InfiniBand cards, as well as providing a more flexible specification of processes or threads for mixed-mode programming.

It can take either of the following forms:

-l nodes=<w>[,ppn=<y>][,tpp=<z>]
-l np=<x>[,ppn=<y>][,tpp=<z>]

w number of nodes requested
x number of processes requested
y number of processes per node (rewrites MPI hostfile to this)
z number of threads per process (sets OMP_NUM_THREADS to this)

If y and z are omitted, Grid Engine sets y = number of cores in each machine and z = 1.

If y is present and z omitted, Grid Engine sets z = int(num cores / y).

If z is present and y omitted, Grid Engine sets y = int(num cores / z).

If using this syntax, the amount of memory available to the job on each node is automatically set to the node_type specification (i.e. 64G by default).

These options also support mixed mode (MPI+OpenMP) programming.
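The defaulting rules for the processes per node (ppn) and threads per process (tpp) values can be sketched as follows. The function name layout is hypothetical and simply mirrors the rules stated above for a 16-core node; it is not how Grid Engine itself is implemented.

```shell
#!/bin/sh
# layout CORES [PPN] [TPP]  (hypothetical helper mirroring the rules above)
# Prints "ppn tpp" after applying the documented defaults.
# Pass "" for a value that is omitted.
layout() {
    cores=$1
    ppn=${2:-}
    tpp=${3:-}
    if [ -z "$ppn" ] && [ -z "$tpp" ]; then
        ppn=$cores                  # one process per core
        tpp=1                       # one thread per process
    elif [ -z "$tpp" ]; then
        tpp=$((cores / ppn))        # integer division, as in int(num cores / ppn)
    elif [ -z "$ppn" ]; then
        ppn=$((cores / tpp))        # integer division, as in int(num cores / tpp)
    fi
    echo "$ppn $tpp"
}

layout 16          # neither given
layout 16 4 ""     # ppn=4 given
layout 16 "" 8     # tpp=8 given
```

For a mixed-mode job, giving ppn=4 alone on 16-core nodes therefore yields 4 MPI processes per node with 4 threads each.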

In addition, the standard Grid Engine method for requesting a number of cores is available through a parallel environment, in this instance ib. The option needed is:

-pe ib <np>

5. Querying queues

The qstat command may be used to display information on the current status of Grid Engine jobs and queues. The basic format for this command is:

$ qstat [switches]

Important switches are as follows:

Switch        Action
-help         Prints a list of all options.
-f            Prints full display output of all queues.
-g c          Prints a 'cluster queue' summary; good for understanding what resources are free, across different queue types.
-g t          Prints 'traditional' output, i.e. a line per queue used, rather than a line per job.
-u username   Displays all jobs for a particular username.

The switches are documented in the man pages; for example, to check all options for the qstat command type:

$ man qstat

By default, users will only see their jobs in the qstat output. To see all jobs use a username wildcard:

$ qstat -u \*

6. Job deletion

To delete a job from the queues issue the following command:

$ qdel jobid

where jobid is a number referring to the specified job (available from qstat). To force action for running jobs issue the following command:

$ qdel -f jobid

A user can delete all their jobs from the batch queues through the command:

$ qdel -u username

7. Further Help

Commonly used qsub options. Script examples.

-- MartinC - 15 Oct 2013
