CoEPP Research Computing Tier 3

Adelaide

Login node: aui.coepp.org.au
Usable cores: 20 (24 total)
Storage: 4TB NFS /home (RDSI)
Memory: 4GB/core

Melbourne

Login node: mui.coepp.org.au
Usable cores: 28 (32 total)
Storage: 60TB NFS /home (MrHat)
Memory: 2GB/core

Sydney

Login node: sui.coepp.org.au
Usable cores: 48
Storage: 48TB NFS /home (sydpp)
Memory: 4GB/core

Obtaining Access to the Tier 3 Computing Resources

To access the Tier 3 resources at any of the CoEPP nodes, you need to be registered in the CoEPP central authentication service.

You were most likely registered when you first joined CoEPP and received your first.last@coepp.org.au email address. If you do not remember your CoEPP username or password, please contact the Research Computing team at rc@coepp.org.au or any of the individuals listed below:

Adelaide

Since Adelaide University does not assign Unix-style usernames, you may choose your own username. We request that it be longer than three characters and consist only of alphabetic characters.

Please email rc@coepp.org.au and either Sean or Lucien will get in touch.

Melbourne

We endeavour to keep your CoEPP username the same as your central unimelb username (which is also your unimelb Physics username). Only in the rare case of a username clash with an existing user from another node will this not be possible.

Contact: Lucien Boland, lucien.boland@coepp.org.au, 03 8344 7994
Contact: Sean Crosby, sean.crosby@coepp.org.au, 03 8344 8093

Sydney

Your CoEPP username and UID (the unique number associated with your username) will be the same as your Physics IT username and UID. We do this to allow you continued NFS access to your home directories from Linux machines controlled by the USyd Physics IT department.

Please email rc@coepp.org.au and either Sean or Lucien will get in touch.

Login

Once you have your CoEPP central authentication account, you will immediately be able to access the UI (login node) of your home node.

Node        Login address        Actual server name
Adelaide    aui.coepp.org.au     adlui1.adl.coepp.org.au
Melbourne   mui.coepp.org.au     melui1.mel.coepp.org.au
Sydney      sui.coepp.org.au     sydui1.syd.coepp.org.au

Access to other nodes' UIs can also be requested from the Research Computing team, and additional resources are available to all CoEPP researchers on the CoEPP Tier 3 cloud facility.

Terminal access can be gained using one of these recommended terminal emulators:

Mac: Terminal, iTerm2
Windows: PuTTY
Linux: xterm, gnome-terminal

You must use the Secure Shell (SSH) protocol, as demonstrated below:

user@local$ ssh -X -l lucien mui.coepp.org.au
Last login: Tue Feb 12 06:37:15 2013 from locahost.unimelb.edu.au

+---+       +---+
|   |-------|   |                    Research Computing
+---+   |   +---+
  |   +---+   |   ARC Centre of Excellence for Particle Physics at the Terascale
  |---|   |---|
  |   +---+   |       *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
+---+   |   +---+     *            This is a restricted server.           *
|   |-------|   |     *        Only authorised users are permitted        *
+---+       +---+     *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*


To initialise ATLAS software, type

   setupATLAS

[agu1:~]$
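
If you connect often, you can optionally add an entry to the ~/.ssh/config file on your local machine so that the username and X-forwarding flag do not need to be typed each time. A minimal sketch is shown below; the "mui" alias and the username "lucien" are examples only, substitute your own details:

# ~/.ssh/config on your local machine (example entry only)
Host mui
    HostName mui.coepp.org.au
    User lucien
    ForwardX11 yes

You can then connect with simply "ssh mui".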

Using PBS

Submit a job

[tjdyce@ui ~]$ qsub test.sh
782.ui.atlas.unimelb.edu.au

Note: the job number is important; here it is 782. It is used to track the job and to name its output files.
Note: this submits to the long queue by default. See the Queues and Priorities/Fairshare section below for the other available queues.
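
For reference, test.sh could be as simple as the sketch below. The job name and resource requests are illustrative; adjust them to suit your job:

#!/bin/bash
#PBS -N test                      # job name
#PBS -l nodes=1                   # one core on one node
#PBS -l walltime=00:10:00         # illustrative walltime request

# a trivial payload so the job produces some output
echo "Running on $(hostname)"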

Check the current running jobs

[tjdyce@ui ~]$ qstat
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
782                   tjdyce     Running     1  2:16:46:30  Wed May 27 22:42:02
783                   fifieldt   Running     1  2:19:47:56  Thu May 28 01:43:28
784                   ulif       Running     1  2:19:50:28  Thu May 28 01:46:00
785                   tshao      Running     1  2:19:54:10  Thu May 28 01:49:42

     4 Active Jobs       4 of    8 Processors Active (50.00%)
                         1 of    1 Nodes Active      (100.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 4   Active Jobs: 4   Idle Jobs: 0   Blocked Jobs: 0
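
To list only your own jobs, or to remove a job you no longer need, the standard Torque commands below can be used (substitute your own username and job number):

[tjdyce@ui ~]$ qstat -u tjdyce     # show only your own jobs
[tjdyce@ui ~]$ qdel 782            # delete job 782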

Trace the status of a particular job

[tjdyce@ui ~]$ tracejob 782

Job: 782.ui.atlas.unimelb.edu.au

02/25/2009 06:52:41  S    enqueuing into default, state 1 hop 1
02/25/2009 06:52:41  S    Job Queued at request of tjdyce@ui.atlas.unimelb.edu.au, owner = tjdyce@ui.atlas.unimelb.edu.au, job name = test.sh, queue = default
02/25/2009 06:52:42  S    Job Modified at request of root@ui.atlas.unimelb.edu.au
02/25/2009 06:52:42  S    Job Run at request of root@ui.atlas.unimelb.edu.au
02/25/2009 06:52:42  S    Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
02/25/2009 06:52:42  S    dequeuing from default, state COMPLETE

Get the job output

Once the job is complete, the standard output and error are written to files in the directory you submitted from, named as below. Note again that the job number is used.

[tjdyce@ui ~]$ ls -lah *782
-rw-------  1 tjdyce epp    0 Feb 25 06:52 test.sh.e782
-rw-------  1 tjdyce epp 2.8K Feb 25 06:52 test.sh.o782
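
By default the standard output and standard error go to separate .o<jobid> and .e<jobid> files. If you prefer a single combined file, the standard PBS directive below can be added to the job script (optional; the home-directory example later on this page uses it):

#PBS -j oe        # merge stderr into the stdout file (e.g. test.sh.o782)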

Queues and Priorities/Fairshare

Queues

Melbourne Queues

mel_long
Priority: Low (jobs will run behind short jobs)
Maximum simultaneous jobs: No limit
Maximum walltime: 96 hours
Usage: Standard jobs which you have tested and are ready to set running
Submit command: qsub -q mel_long

mel_short
Priority: High (jobs take precedence over long jobs)
Maximum simultaneous jobs: No limit
Maximum walltime: 1 hour
Usage: Prototyping jobs
Submit command: qsub -q mel_short

mel_extralong
Priority: Low (jobs will run behind short jobs)
Maximum simultaneous jobs: No limit
Maximum walltime: 336 hours
Usage: Very long running jobs
Submit command: qsub -q mel_extralong

Note: The mel_long queue is the default; if you do not specify a queue, this is where jobs will go.

qmgr -c "print server" | grep default_queue
set server default_queue = mel_long
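
For example, to send a quick test to the Melbourne short queue with an explicit walltime request (both flags are standard qsub options; the values are illustrative):

[tjdyce@ui ~]$ qsub -q mel_short -l walltime=00:30:00 test.sh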

Adelaide Queues

adl_long
Priority: Low (jobs will run behind short jobs)
Maximum simultaneous jobs: No limit
Maximum walltime: 96 hours
Usage: Standard jobs which you have tested and are ready to set running
Submit command: qsub -q adl_long

adl_short
Priority: High (jobs take precedence over long jobs)
Maximum simultaneous jobs: No limit
Maximum walltime: 2 hours
Usage: Prototyping jobs
Submit command: qsub -q adl_short

Note: The adl_long queue is the default; if you do not specify a queue, this is where jobs will go.

qmgr -c "print server" | grep default_queue
set server default_queue = adl_long

Sydney Queues

syd_extralong
Priority: Low (jobs will run behind short jobs)
Maximum simultaneous jobs: No limit
Maximum walltime: 336 hours
Usage: Standard jobs which you have tested and are ready to set running
Submit command: qsub -q syd_extralong

syd_long
Priority: Low (jobs will run behind short jobs)
Maximum simultaneous jobs: No limit
Maximum walltime: 96 hours
Usage: Standard jobs which you have tested and are ready to set running
Submit command: qsub -q syd_long

syd_medium
Priority: Medium
Maximum simultaneous jobs: No limit
Maximum walltime: 10 hours
Submit command: qsub -q syd_medium

syd_short
Priority: High (jobs take precedence over long jobs)
Maximum simultaneous jobs: No limit
Maximum walltime: 2 hours
Usage: Prototyping jobs
Submit command: qsub -q syd_short

Note: The syd_medium queue is the default; if you do not specify a queue, this is where jobs will go.

qmgr -c "print server" | grep default_queue
set server default_queue = syd_medium

Fairshare

The T3 queues are set up with user-based fairshare. This means that your jobs will be given lower priority than those of other users if you have been running more jobs than them. The fairshare is calculated over a 5-day period, with daily windows and a 50% decay.

Adelaide Users

Home Directories

For Adelaide users, two sets of the execution queues shown above are relevant: the Melbourne queues and the Adelaide queues.

When your jobs run on a Melbourne queue, they have access to the home directory you had on the old Adelaide cloud. When you run on an Adelaide queue, you have a different (possibly empty) home directory. If you are not sure what this means, run a job like the one below to show what is in your home directory:

#!/bin/bash

#PBS -S /bin/bash
#PBS -j oe
#PBS -l nodes=1
#PBS -l mem=512MB,vmem=512MB
#PBS -l walltime=00:05:00
#PBS -N test

# show the queue we are running on and home directory contents
echo "PBS_O_QUEUE=$PBS_O_QUEUE"       # show the queue we are running on
echo "$ pwd"
pwd
echo "$ ls -l"
ls -l

Submit the job to a Melbourne and Adelaide queue:

$ qsub test.sh                    # submitted to the Melbourne long queue
270981.t3torque.atlas.unimelb.edu.au
$ qsub -q adl_short test.sh       # submitted to the Adelaide short queue
270982.t3torque.atlas.unimelb.edu.au
$ ls -l
total 12
-rw------- 1 rwilson people 1102 Mar  6 03:32 test.o270981
-rw------- 1 rwilson people  115 Mar  6 03:32 test.o270982
-rw-r--r-- 1 rwilson people  278 Mar  6 03:32 test.sh
$ more test.o270981
PBS_O_QUEUE=long
$ pwd
/imports/home/rwilson
$ ls -l
total 40
drwxr-xr-x 5 rwilson people  155 Mar  6 03:09 checkjobs
-rw-r--r-- 1 rwilson people  342 Dec  3 22:11 cloud_users
drwxrwxr-x 4 rwilson people 4096 Feb 19 04:33 combo
drwxr-xr-x 2 rwilson people 4096 Feb 14 00:43 example
drwxrwxrwx 2 rwilson people 4096 Jan  6 21:54 fib
drwxr-xr-x 2 rwilson people   84 Feb 25 23:40 ioctl
drwx------ 5 rwilson people  105 Jan 14 00:10 jdash
drwxr-xr-x 2 rwilson people   67 Feb 18 04:07 job_test
drwxr-xr-x 2 rwilson people 4096 Jan  9 21:52 jobdash
drwxr-xr-x 2 rwilson people   75 Jan  9 22:59 makejob
-rw-r--r-- 1 rwilson people  788 Sep 18 02:53 martin_cpu.job
drwxr-xr-x 2 rwilson people   76 Feb 25 04:14 noclean
-rw-r--r-- 1 rwilson people  315 Sep 18 01:38 one_hour_cpu.py
-rw-r--r-- 1 rwilson people 1645 Feb 20 23:56 results
drwxr-xr-x 4 rwilson people 4096 Mar  1 01:51 test
drwxr-xr-x 2 rwilson people   94 Feb 14 01:15 test2
drwxr-xr-x 2 rwilson people   50 Feb 20 03:28 test_cloud
drwxr-xr-x 2 rwilson people 4096 Feb 21 04:14 test_stage
drwxr-xr-x 3 rwilson people   26 Feb  3 22:58 workarea
$ more test.o270982
PBS_O_QUEUE=adl_short
$ pwd
/imports/home/rwilson
$ ls -l
total 0
drwxr-xr-x 2 rwilson people 39 Mar  6 03:32 test

Note that your home directory is in the same place in the filesystem in both runs (/imports/home/<username>) but the directory contents are different.

Default queue

As stated above, the default queue, used if you don't specify a queue in the job script or on the command line, is the long queue, i.e. the Melbourne long queue (mel_long).

If you want your job to run on an Adelaide queue, specify either the adl_short or adl_long queue in the job script:

#PBS -q adl_short

or on the command line:

$ qsub -q adl_short test.sh