CoEPP RC
 

An Introduction to Combo

This is an introduction to using combo.

combo is a system that takes a template file and a parameter file. A template file is a simple text file containing any text you want with references to combo variables in its body. A parameter file is also a text file, but contains definitions of combo variables and the values those variables may take.

When combo is run, it reads the parameter file, notes each variable and the values it may take, and generates all possible combinations of those values. It then writes one output file for each combination, replacing the variables in the template text with the corresponding values from that combination.

The generated files produced by combo could be anything - PBS batch scripts or C/C++ source code that you compile and execute. You can run combo on your desktop machine, on an interactive node or even on a batch worker node. The limit is your imagination!

Installation

combo is a single executable file, so you can either copy it into the directory where you want to run it or put it in a directory on your PATH.

If combo becomes popular enough it will be installed into the system software on interactive and worker nodes.

Usage

$ ./combo -h

A program to generate text files for all combinations of values defined in an
input template text file and a parameters file.

Usage:  combo [<options>] <template> <parameters>

where <options>     may be:
                      -a           start sequence numbers after last used
                                   (default is 0)
                      -d <prefix>  combo doc filename starts with <prefix>
                                   (default is <template>)
                      -f           force, even if unsatisfied variables
                      -g <prefix>  generated files start with <prefix>
                                   (default is <template>)
                      -h           print this help and stop
                      -n <num>     start sequence numbers at <num>
                      -o <dir>     sets the base output directory
                      -q <queue>   submit jobs to <queue>
                                   (assumes -s option)
                      -s           submit generated files with 'qsub'
                                   (you must be logged into cxin01/02)
                      -V           print version and stop
                      -v           print debug information
                      -w <num>     set width of file 'sequence' number
                                   (default is 4)
      <template>    is the input template file that contains variable
                    substitutions like '<#name>'
      <parameters>  is a file containing information about possible values for
                    variables in <script>, ie lines like 'name = (5, 10, 1)'.
                    a line like 'beta = (1,100,(1,2,5,10))' is also valid and
                    specifies a logarithmic sequence of values where the (1,2,5,10)
                    tuple specifies values in a decade.

The output is a set of files in the output directory and a documentation file
that contains all the details on the generation process.

The parameter file

A parameter file is a simple text file that contains definitions of combo variables and the values they may take. An example file is:

# An example template file

alpha = (1, 4, 1)          # values are (start, stop, step)

beta = (0.0, 5.e-1, 25e-2) # a comment
  Gamma = (-2, 2, 1)

Comments start with a '#' character and extend to the end of the line. Blank lines are ignored.

The example above defines three variables: alpha, beta and gamma (variable names are not case-sensitive, so Gamma is the same as gamma). The values taken by each variable are set by a 3-tuple of the form (<start>, <stop>, <step>). The start, stop and step values may be integer or floating point.
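To make the file format concrete, here is a minimal sketch of how such a file could be read. It is not combo's actual parser: the function name parse_parameters is hypothetical, and using ast.literal_eval to read the tuples and lower-casing the names are assumptions made for illustration.

```python
import ast

def parse_parameters(text):
    """Return a list of (name, spec) pairs in definition order (hypothetical helper)."""
    params = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the comment and surrounding whitespace
        if not line:
            continue                            # blank lines are ignored
        name, _, value = line.partition("=")
        # Assumption: names are case-insensitive, so store them lower-cased.
        params.append((name.strip().lower(), ast.literal_eval(value.strip())))
    return params

example = """
# An example template file

alpha = (1, 4, 1)          # values are (start, stop, step)

beta = (0.0, 5.e-1, 25e-2) # a comment
  Gamma = (-2, 2, 1)
"""

print(parse_parameters(example))
# [('alpha', (1, 4, 1)), ('beta', (0.0, 0.5, 0.25)), ('gamma', (-2, 2, 1))]
```

Keeping the pairs in a list rather than a dictionary preserves the order in which the variables were defined.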

In the example above, alpha may take values 1, 2, 3 and 4. beta may take values 0.0, 0.25 and 0.50, and gamma takes values -2, -1, 0, 1 and 2.
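The expansion of a (start, stop, step) tuple can be sketched as follows. This is an illustration under assumptions, not combo's own code: the name linear_values and the small rounding tolerance are hypothetical choices.

```python
def linear_values(start, stop, step):
    """Expand (start, stop, step) into a list, including stop when it is reachable."""
    if step <= 0:
        raise ValueError("step must be positive")
    if stop < start:
        return []
    # Number of whole steps that fit between start and stop. The small
    # tolerance absorbs floating-point rounding in the division.
    count = int((stop - start) / step + 1e-9) + 1
    # Multiply instead of adding repeatedly, so rounding errors do not accumulate.
    return [start + i * step for i in range(count)]

print(linear_values(1, 4, 1))          # [1, 2, 3, 4]
print(linear_values(0.0, 5.e-1, 25e-2))  # [0.0, 0.25, 0.5]
print(linear_values(-2, 2, 1))         # [-2, -1, 0, 1, 2]
```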

Python uses IEEE-754 floating point representation with 53 bits of precision. If you use very small or very large numbers please check that your numbers are represented correctly in the generated files. If required, combo could be made to use a multiple-precision library.
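A short illustration of the kind of surprise to look out for. It uses only standard Python floats and says nothing about how combo itself computes its values.

```python
# Repeated addition of a step accumulates rounding error.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)      # False
print(repr(total))       # 0.9999999999999999

# Multiplying the step by a count is usually closer to what you expect.
print(10 * 0.1 == 1.0)   # True

# Integers larger than 2**53 cannot all be represented exactly as floats.
print(float(2**53) == float(2**53 + 1))   # True
```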

Log ranges in the parameter file

It is possible to specify logarithmic parameter ranges in the parameter file. Just replace the scalar step value with a tuple specifying a decade:

beta = (2.0e-2, 1.0e+1, (1, 2, 5, 10))

The decade tuple must start with 1 and end with 10. The initial start value must match one of the decade values (modulo a power of 10). The generated values following the start value will step through the decade values until the stop value is exceeded.

For instance, the above specification for beta would sweep the beta variable through these values:

2.0e-2, 5.0e-2, 1.0e-1, 2.0e-1, 5.0e-1, 1.0, 2.0, 5.0, 10.0

The following parameter values would give the shown value ranges:

parameter               range
(1, 100, (1,2,5,10))    1, 2, 5, 10, 20, 50, 100
(1, 100, (1,3,10))      1, 3, 10, 30, 100
(1, 1000, (1,10))       1, 10, 100, 1000
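The rules for logarithmic ranges can be sketched like this. It is a hedged illustration rather than combo's implementation: the name log_values, the tolerances and the choice to return floats are all assumptions.

```python
import math

def log_values(start, stop, decade):
    """Expand (start, stop, decade) into a logarithmic sequence of floats."""
    if decade[0] != 1 or decade[-1] != 10:
        raise ValueError("decade tuple must start with 1 and end with 10")
    steps = decade[:-1]      # the final 10 is the 1 of the next decade
    # Power of ten of the start value; the tolerance guards against rounding.
    exponent = math.floor(math.log10(start) + 1e-9)
    mantissa = start / 10.0 ** exponent
    matches = [i for i, d in enumerate(steps) if math.isclose(mantissa, d)]
    if not matches:
        raise ValueError("start must match a decade value times a power of 10")
    index = matches[0]
    values = []
    while True:
        value = steps[index] * 10.0 ** exponent
        if value > stop * (1 + 1e-12):   # stop once the stop value is exceeded
            break
        values.append(value)
        index += 1
        if index == len(steps):          # move on to the next decade
            index = 0
            exponent += 1
    return values

print(log_values(1, 100, (1, 2, 5, 10)))   # [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
print(log_values(1, 1000, (1, 10)))        # [1.0, 10.0, 100.0, 1000.0]
```

Each value is computed from the decade entry and the current power of ten, rather than by repeated multiplication, to limit rounding drift.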

You may mix linear and logarithmic parameters in your parameter file.

The template file

A template file is a simple text file containing anything you want. This could be a C/C++ file that will be compiled or a batch job script you will submit. combo variables in the body of the template file are replaced with a value for that variable. For instance, if we have the following template file:

A simple template file.
The variable alpha has value <#alpha>
and beta=<#BETA>

we see that there are two combo variables, <#alpha> and <#BETA>. Note that the case of the variable name doesn't matter and any surrounding whitespace inside the angle brackets is ignored. That means the following variables are equivalent:

<#alpha>
<#  alpha  >
<#  Alpha>
<#ALPHA  >
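The matching rule described above (case-insensitive names, whitespace inside the brackets ignored) can be sketched with a regular expression. The pattern and the name substitute are assumptions for illustration. Leaving unknown variables untouched is also an assumption, not documented combo behaviour.

```python
import re

# '<#', optional whitespace, a name, optional whitespace, '>'
VARIABLE = re.compile(r"<#\s*([A-Za-z_]\w*)\s*>")

def substitute(template, values):
    """Replace every <#name> in template with the matching entry of values."""
    lookup = {name.lower(): value for name, value in values.items()}

    def replace(match):
        name = match.group(1).lower()
        # In this sketch, variables without a value are left as they are.
        return str(lookup[name]) if name in lookup else match.group(0)

    return VARIABLE.sub(replace, template)

print(substitute("<#alpha> <#  alpha  > <#  Alpha> <#ALPHA  >", {"alpha": 3}))
# 3 3 3 3
```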

When we run combo on the template and parameter examples above we get 12 generated files (4 values of alpha times 3 values of beta; gamma is not used in the template). Each filename is made up of the template filename with a number suffix:

$ ./combo -o xyzzy template.example param.example
Output files written to directory xyzzy
12 files generated, template.example.0 to template.example.11
$ ls -l xyzzy
total 52
-rw-rw-r-- 1 smith smith  68 Nov 18 10:17 template.example.0
-rw-rw-r-- 1 smith smith  68 Nov 18 10:17 template.example.1
-rw-rw-r-- 1 smith smith  68 Nov 18 10:17 template.example.10
-rw-rw-r-- 1 smith smith  68 Nov 18 10:17 template.example.11
-rw-rw-r-- 1 smith smith  68 Nov 18 10:17 template.example.2
-rw-rw-r-- 1 smith smith 783 Nov 18 10:17 template.example_2013-11-17T23:47:43.433154
-rw-rw-r-- 1 smith smith  68 Nov 18 10:17 template.example.3
-rw-rw-r-- 1 smith smith  69 Nov 18 10:17 template.example.4
-rw-rw-r-- 1 smith smith  69 Nov 18 10:17 template.example.5
-rw-rw-r-- 1 smith smith  69 Nov 18 10:17 template.example.6
-rw-rw-r-- 1 smith smith  69 Nov 18 10:17 template.example.7
-rw-rw-r-- 1 smith smith  68 Nov 18 10:17 template.example.8
-rw-rw-r-- 1 smith smith  68 Nov 18 10:17 template.example.9

We also see a file with a datetime suffix, which is described later in the section on the documentation file.

If we look inside some of the generated files, we see:

xyzzy/template.example.0:

A simple template file.
The variable alpha has value 1
and beta=0.0

xyzzy/template.example.1:

A simple template file.
The variable alpha has value 2
and beta=0.0

xyzzy/template.example.2:

A simple template file.
The variable alpha has value 3
and beta=0.0

xyzzy/template.example.3:

A simple template file.
The variable alpha has value 4
and beta=0.0

xyzzy/template.example.4:

A simple template file.
The variable alpha has value 1
and beta=0.25

xyzzy/template.example.5:

A simple template file.
The variable alpha has value 2
and beta=0.25

Note that alpha cycles through all its values (1, 2, 3 and 4) while beta is 0.0, then starts to cycle again with beta at 0.25. The variable gamma is defined in the parameter file but not used in the template file, so it does not appear in the output.

The order of the variables is the order in which they are defined in your parameter file. Combinations are generated with the first-defined variable varying most quickly.
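This ordering can be reproduced with itertools.product, as in the sketch below. It is an illustration only: the function name combinations and the list-of-pairs input are assumptions, and combo may build its combinations differently.

```python
import itertools

def combinations(variables):
    """Yield one dict per combination; variables is an ordered list of (name, values)."""
    names = [name for name, _ in variables]
    # itertools.product varies its LAST argument fastest, so pass the value
    # lists in reverse order and flip each resulting tuple back.
    value_lists = [values for _, values in reversed(variables)]
    for combo in itertools.product(*value_lists):
        yield dict(zip(names, reversed(combo)))

variables = [("alpha", [1, 2, 3, 4]), ("beta", [0.0, 0.25, 0.5])]
for values in list(combinations(variables))[:5]:
    print(values)
# {'alpha': 1, 'beta': 0.0}
# {'alpha': 2, 'beta': 0.0}
# {'alpha': 3, 'beta': 0.0}
# {'alpha': 4, 'beta': 0.0}
# {'alpha': 1, 'beta': 0.25}
```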

The _id_ variable

combo provides one special variable for you to use. Notice in the directory listing of generated files above that the filenames have a numeric suffix starting at '.0'. The number suffixed to each file is a unique ID number. This number is available to you as a 'predefined' variable _id_.

Now if we change our template file to:

A simple template file, job <#_id_>.
The variable alpha has value <#alpha>
and beta=<#BETA>

then combo will generate output files that look like:

xyzzy/template.example.0:

A simple template file, job 0.
The variable alpha has value 1
and beta=0.0

xyzzy/template.example.1:

A simple template file, job 1.
The variable alpha has value 2
and beta=0.0

xyzzy/template.example.2:

A simple template file, job 2.
The variable alpha has value 3
and beta=0.0

xyzzy/template.example.3:

A simple template file, job 3.
The variable alpha has value 4
and beta=0.0

xyzzy/template.example.4:

A simple template file, job 4.
The variable alpha has value 1
and beta=0.25

xyzzy/template.example.5:

A simple template file, job 5.
The variable alpha has value 2
and beta=0.25

Note that the _id_ variable isn't part of the combinatorial process: it's just a sequential number matching the output file suffix.
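Putting the pieces together, the relationship between _id_ and the file suffix can be sketched as below. The function generate and its arguments are hypothetical, the filename pattern is taken from the directory listing above, and options such as -w are not modelled.

```python
import itertools
import re

def generate(template_name, template_text, variables, start_id=0):
    """Return {filename: text}; variables is an ordered list of (name, values)."""
    names = [name for name, _ in variables]
    # Reverse the lists so that the first variable varies most quickly.
    value_lists = [values for _, values in reversed(variables)]
    outputs = {}
    for seq, combo in enumerate(itertools.product(*value_lists), start=start_id):
        values = dict(zip(names, reversed(combo)))
        values["_id_"] = seq     # not part of the combinations, just a counter
        text = re.sub(
            r"<#\s*(\w+)\s*>",
            lambda m: str(values.get(m.group(1).lower(), m.group(0))),
            template_text,
        )
        outputs[f"{template_name}.{seq}"] = text
    return outputs

template = (
    "A simple template file, job <#_id_>.\n"
    "The variable alpha has value <#alpha>\n"
    "and beta=<#BETA>\n"
)
files = generate("template.example", template,
                 [("alpha", [1, 2, 3, 4]), ("beta", [0.0, 0.25, 0.5])])
print(len(files))                          # 12
print(files["template.example.4"], end="")
# A simple template file, job 4.
# The variable alpha has value 1
# and beta=0.25
```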

One possible use of the variable is to set a unique name in a PBS job script. For example, you might have these lines in a template file:

# Set the name of this batch job
#PBS -N my_job_<#_ID_>

If you generate 2000 PBS job scripts, each will have a unique PBS name.

The documentation file

In the directory listing showing generated files, we saw a file with a datetime suffix:

$ ls -l xyzzy
total 52
...
-rw-rw-r-- 1 smith smith 783 Nov 18 10:17 template.example_2013-11-17T23:47:43.433154
...

This is a special file created by combo to document the process that generated the files. It contains the date and time that files in the directory were generated and the name and contents of both the template and parameter files.

The documentation file contains something like this:

The template.example.(0-11) scripts here were created on 2013-11-17T23:47:43.433154
from a parameter file '/home/smith/git_repos/cxcode/combo/param.example':
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# An example template file

alpha = (1, 4, 1)          # values are (start, stop, step)

beta = (0.0, 5.e-1, 25e-2) # a comment
  Gamma = (-2, 2, 1)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and a template file '/home/smith/git_repos/cxcode/combo/template.example':
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A simple template file.
The variable alpha has value <#alpha>
and beta=<#BETA>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Any suggestions that improve the contents of this file are gladly received.

Submitting generated batch files

If you have generated a large number of PBS batch files, the question may arise: How do I submit all these files to the batch system?

Let's assume you are in the directory containing the files you want to submit:

$ ls -l
total 52
-rw-rw-r-- 1 smith smith  68 Nov 18 10:32 template.example.0
-rw-rw-r-- 1 smith smith  68 Nov 18 10:32 template.example.1
-rw-rw-r-- 1 smith smith  68 Nov 18 10:32 template.example.10
-rw-rw-r-- 1 smith smith  68 Nov 18 10:32 template.example.11
-rw-rw-r-- 1 smith smith  68 Nov 18 10:32 template.example.2
-rw-rw-r-- 1 smith smith 783 Nov 18 10:32 template.example_2013-11-18T00:02:16.275841
-rw-rw-r-- 1 smith smith  68 Nov 18 10:32 template.example.3
-rw-rw-r-- 1 smith smith  69 Nov 18 10:32 template.example.4
-rw-rw-r-- 1 smith smith  69 Nov 18 10:32 template.example.5
-rw-rw-r-- 1 smith smith  69 Nov 18 10:32 template.example.6
-rw-rw-r-- 1 smith smith  69 Nov 18 10:32 template.example.7
-rw-rw-r-- 1 smith smith  68 Nov 18 10:32 template.example.8
-rw-rw-r-- 1 smith smith  68 Nov 18 10:32 template.example.9

We can use this bit of shell to submit all the generated files:

$ for F in template.example.*; do
> qsub "$F"
> done
$
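The shell pattern above expands in lexical order, so template.example.10 comes before template.example.2. If submission order matters to you, the files can be sorted by their numeric suffix first. The sketch below only builds the list of qsub commands and does not run them. The helper name submission_order is hypothetical.

```python
def submission_order(filenames, prefix):
    """Return the generated files sorted by numeric suffix, skipping anything else."""
    numbered = []
    for name in filenames:
        head, dot, suffix = name.rpartition(".")
        # Keep only '<prefix>.<number>'; the documentation file has a
        # different stem and is skipped.
        if dot and head == prefix and suffix.isdigit():
            numbered.append((int(suffix), name))
    return [name for _, name in sorted(numbered)]

listing = [
    "template.example.0",
    "template.example.1",
    "template.example.10",
    "template.example.11",
    "template.example.2",
    "template.example_2013-11-18T00:02:16.275841",
    "template.example.3",
]
commands = [["qsub", name] for name in submission_order(listing, "template.example")]
print([cmd[1] for cmd in commands])
# ['template.example.0', 'template.example.1', 'template.example.2',
#  'template.example.3', 'template.example.10', 'template.example.11']
```

Each entry in commands could then be passed to subprocess.run on a machine where qsub is available.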

If you want to submit the generated files as soon as they are created, use the -s option:

$ ./combo -o xyzzy -s job.template param.example
Output files written to directory xyzzy
12 files generated and submitted, job.template.0 to job.template.11

Note that this will only submit the generated files. Any other previously-generated files will not be submitted.

The job.template file contains:

#!/bin/bash

#PBS -N combo_test
#PBS -j oe
#PBS -l walltime=0:01:00

cd /data/smith/combo
echo "alpha=<#alpha>" > job.output_<#_id_>
echo "beta=<#beta>" >> job.output_<#_id_>
echo "_id_=<#_id_>" >> job.output_<#_id_>
echo "Job <#_id_> done!"

The param.example file is the same as above.

Submitting to a particular queue

Finally, you can specify which batch queue you want jobs to be submitted to. The -q <queue> option allows you to specify the queue. If you use the -q option the -s option is assumed.

Note that submitting to the long queue (for instance) doesn't set the requested wall or CPU times for your job. Forcing a job to run from the long queue will give your job a minimum default wall and CPU time limit, which may not be enough.

If your batch jobs have a long run time it is always best to specify how much time those jobs require.

cloud/combo.txt · Last modified: 2014/02/05 10:29 by lucien
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International