Particle Physics Linux Batch Farm
The batch farm is accessible from the interactive login nodes pplxintn (where n is the node number).
The systems are currently being upgraded from SL4 to SL5, so we have two similar setups: one running SL4, which is associated with pplxint1 and pplxint2, and a larger cluster running SL5, which is associated with pplxint5 and pplxint6.
The worker nodes are a mixture of dual Intel Clovertown quad-core E5345 2.33GHz CPUs,
dual Intel Harpertown quad-core E5420 2.5GHz CPUs and
the most recent AMD Opteron 6128 eight-core CPUs.
The nodes are configured with 2GB of RAM per CPU core and approximately 50GB of local scratch disk per CPU core.
Interactive logins to the worker nodes are disabled.
The batch queuing system is Torque. Jobs should be submitted from one of the pplxint nodes. If your jobs run well here, you may also like to use the grid cluster here at Oxford.
Getting Started
The basic commands are qsub and qstat; their syntax can be checked from the man pages.
The manual for Open PBS may also be useful for advanced users.
Your job needs to be started by a small script that first sets any environment variables and then runs the program. For example, a script file myjob could contain:
#!/bin/sh
sleep 30
echo Job Done
The command qsub myjob would submit it.
Typing qstat -ans shows the progress of the job.
The systems can see all the usual data disks.
If your job is likely to be I/O intensive, it may be better to copy data sets to a local scratch area and work on them from there, rather than accessing them directly over the network from the data disks.
Each worker node has a local scratch disk (200GB) which is mapped on to the environment variable TMPDIR for each job. This can be used while the job runs, but all contents are deleted when the job completes, so any results stored there must be copied to either the home or data disks at the end of the job.
For example a job script could start by copying the program and data to the scratch dir:
#PBS -l nodes=1
sleep 15
cd $TMPDIR
cp /userdisk/gronbech/myprogram .
cp /data/zeus/gronbech/mydata .
./myprogram
cp ./myresults /data/zeus/gronbech
echo Job Done
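One possible refinement of the script above, offered as a sketch only: wrap the run step in a small helper so that results are copied back from $TMPDIR even when the program fails, and the program's exit status is preserved. The helper name run_in_scratch and its argument order are hypothetical, and the fallback to /tmp is an assumption made so the helper can be tried outside the batch system.

```shell
# run_in_scratch DEST RESULT COMMAND [ARGS...]   (hypothetical helper)
# Run COMMAND inside the scratch area, then copy the file RESULT from the
# scratch area to the directory DEST. Returns COMMAND's exit status.
run_in_scratch() {
    dest=$1
    result=$2
    shift 2
    # Assumption: fall back to /tmp when TMPDIR is not set by the batch system.
    scratch=${TMPDIR:-/tmp}
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$scratch" && "$@" )
    status=$?
    # Copy the result back even after a failure, so partial output is kept.
    if [ -f "$scratch/$result" ]; then
        cp "$scratch/$result" "$dest"/
    fi
    return $status
}

# Example call using the paths from the script above (left commented out):
# cp /userdisk/gronbech/myprogram /data/zeus/gronbech/mydata "$TMPDIR"/
# run_in_scratch /data/zeus/gronbech myresults ./myprogram
```

Because the copy-back happens before the function returns, a job script can end with the helper call and still report the program's own exit status to the batch system.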
After completion, two files will be left in your login directory, myjob.exx and myjob.oxx (where xx is the PBS job number), which contain the standard error and standard output from the job. Problems getting the output back can be caused by your .login/.cshrc files trying to write to the screen; this breaks the rcp/scp operation used to copy your results back. You can avoid this by adding the following construct to your .login file:
if ( ! $?PBS_ENVIRONMENT ) then
    echo " "
    echo "Starting .login file"
    echo " "
    source /etc/group.login
    source cdf_local.csh
endif
This skips your normal setup when running as a PBS job. An alternative is to check for an interactive shell by looking for the prompt variable, and to exit if it is not set. This could be added to your .cshrc file, for example:
if (!($?prompt)) exit
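For users whose login shell is sh or bash rather than csh, a similar guard might look like the sketch below. It tests the same PBS_ENVIRONMENT variable as the .login example; the function name in_pbs_job and the message text are hypothetical.

```shell
# Succeeds when the shell was started by the batch system, i.e. when
# PBS_ENVIRONMENT is set to a non-empty value.
in_pbs_job() {
    [ -n "${PBS_ENVIRONMENT:-}" ]
}

# In a profile file, only write to the screen outside batch jobs.
if ! in_pbs_job; then
    echo "Starting profile"
fi
```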
Killing / Pausing a Job
To delete a job (either running or in the queue), use qdel nnn,
where nnn is the job number shown by the qstat command.
If you have several jobs queuing and you feel it would be fair to let some other
people's jobs get in before yours, you can hold a job in the queue with the command
qhold nnn and release it later with the command
qrls nnn
Other q commands may be useful, such as qalter, which changes the requirements of a job that has been submitted but is still queuing.
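Torque job identifiers have the form number.server, so when scripting the commands above it can be handy to extract just the number. The following is a minimal sketch; the helper name job_number and the identifier 12345.pplxtorque02 are made up for illustration.

```shell
# job_number ID   (hypothetical helper)
# Strip the server suffix from a job identifier, leaving the job number,
# e.g. "12345.pplxtorque02" becomes "12345".
job_number() {
    echo "${1%%.*}"
}

# Example use with the commands above (left commented out):
# qhold "$(job_number 12345.pplxtorque02)"
# qrls  "$(job_number 12345.pplxtorque02)"
```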
Scheduling on the Cluster
The cluster uses the Maui scheduler, which has a fair share mechanism that favours users who have not run jobs recently. In principle, if the system is busy with user A's jobs and user B submits jobs, user B's jobs should run as soon as a node becomes free.
Controlling which queue your job runs on:
- qstat -Q -f will print out details of all the queues.
- qstat -Q shows the total number of jobs running on all queues.
- qstat -B pplxtorque02 pplxtorque gives a summary of the status of pplxgen and the cdf systems.
- qstat -ans @pplxtorque02 gives details of jobs running on the SL5 cluster.
- pbsnodes -a gives a list of all the worker nodes and lists the jobs running there. Nodes not running jobs should be listed as free.
The default queue (the input queue) will route jobs to the shortjobs or normal queue depending on the CPU time requested. If you do not specify a time requirement, the input queue will assume 168 hours, which sends your job to the normal queue. CPU time limits are:
- short is up to 12 hours
- normal is between 12 hours and 168 hours (i.e. a week)
The queues also have different priorities, so jobs in the shorter queue will be started before longer jobs when CPUs become available.
You can specify how much time your job requires either in the job script or on the command line. Include the line
#PBS -l cput=11:50:00
in the script to ask for 11 hours 50 minutes. To ask for 100 hours on the command line use
qsub -l cput=100:00:00 myscript
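To see which queue a given cput request would be routed to under the limits listed above, the rule can be sketched as a small shell helper. This only illustrates the documented limits and is not the Torque server's own logic; the function names cput_seconds and queue_for_cput are hypothetical, and treating exactly 12 hours as a short job is an assumption.

```shell
# cput_seconds HH:MM:SS   (hypothetical helper)
# Convert a cput value to seconds. awk reads the fields as decimal numbers,
# so values with leading zeros such as "08" are handled correctly.
cput_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# queue_for_cput HH:MM:SS   (hypothetical helper)
# Print the queue the request would fall into under the documented limits:
# up to 12 hours is shortjobs, up to 168 hours is normal.
queue_for_cput() {
    secs=$(cput_seconds "$1")
    if [ "$secs" -le 43200 ]; then
        echo shortjobs
    elif [ "$secs" -le 604800 ]; then
        echo normal
    else
        echo "cput $1 exceeds the 168 hour limit" >&2
        return 1
    fi
}
```

With the two requests shown above, queue_for_cput 11:50:00 falls under the 12 hour limit and queue_for_cput 100:00:00 falls in the normal range.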
