To use a cloud batch system you login to the "interactive nodes".
For the NeCTAR tier 3 cloud the nodes are //cxin01//, //cxin02//, //cxin03// and //cxin04//.
You login this way:
<code>
ssh -Y <user_name>@cxin01.cloud.coepp.org.au
ssh -Y <user_name>@cxin02.cloud.coepp.org.au
ssh -Y <user_name>@cxin03.cloud.coepp.org.au
ssh -Y <user_name>@cxin04.cloud.coepp.org.au
</code>
The above nodes are at Melbourne.
  
The interactive nodes are used to submit jobs to the cloud and also for interactive use.
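
For example, a job script is submitted from an interactive node with **qsub** (a sketch, assuming the Torque/PBS commands used by this batch system; //fib.sh// is a hypothetical script name):
<code>
# submit the job script to the batch system
qsub /data/smith/fib.sh
</code>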
  
When your job runs in the cloud it does __not__ have access to your home directory.
A directory **/data/<user_name>** is available to you on both the interactive nodes and all the batch worker nodes.
We now recommend that you use CephFS at **/coepp/cephfs** instead.
You must place your executable files and any input data required under your **/data/<user_name>** or **/coepp/cephfs** directory before
submitting a batch job.  Similarly, any output files written by your batch job will be under your **/data/<user_name>** or **/coepp/cephfs** directory.
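
For example, you might stage a program and its input data like this before submitting (a sketch; the //smith// user name and //input.dat// file are hypothetical):
<code>
# copy the executable and input data to the shared data area
mkdir -p /data/smith
cp fib input.dat /data/smith/
</code>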
  
====== Software Management ======
There are three batch queues in the current system: **short**, **long** and **extralong**.  Each queue has a limit
on wall clock time (walltime).
The walltime limits for the batch queues are:
^  queue      ^  walltime         ^
|  short      |  maximum 1 hour   |
|  long       |  maximum 7 days   |
|  extralong  |  maximum 31 days  |
If your batch jobs exceed the walltime limit they will be terminated.
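
You can check the configured queue limits from an interactive node (a sketch, assuming the standard Torque/PBS **qstat** command):
<code>
# list all batch queues and their limits
qstat -q
</code>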
  
You can specify which queue to run on when you submit your job.
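For example (a sketch, assuming the standard PBS **-q** option; //fib.sh// is a hypothetical script name):
<code>
qsub -q long /data/smith/fib.sh
</code>
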
That didn't matter for our little //fib// example as the required CPU time is very short.
But for your longer running jobs you must consider your required times since if
you don't specify a queue, you run on the **short** queue and get a maximum of one hour of walltime.
If your job requires more than one hour of walltime you should submit to the **long** queue (shown above).
  
If you need more than five hours of walltime (for instance), there are ways to request extended limits using
**batch parameters**.  Suppose we had a very inefficient method to compute
Fibonacci 30 that is expected to run for ten hours.  We would need a batch file like this:
<code>
#PBS -l walltime=10:00:00
/data/smith/fib 30
</code>
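The walltime value is given as //hours:minutes:seconds//.  You then submit the batch file as usual (a sketch; //fib10h.sh// is a hypothetical file name):
<code>
qsub -q long fib10h.sh
</code>
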
========= Getting Email =========
  
We have disabled email from the batch system as it leads to our mail server being blocked for spam.
  
========= Selecting a Queue =========
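
You can also select the queue inside the batch file itself rather than on the **qsub** command line (a sketch, assuming the standard PBS **-q** directive):
<code>
#PBS -q long
/data/smith/fib 30
</code>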
  
========= Debugging =========
  