CoEPP RC
 

Overview

This document is meant to provide a technical overview of the CoEPP Tier3 cloud, useful as an introduction to new developers.

What you see here is a 'brain dump' with little structure imposed. Others are encouraged to add to what is here and to organise it differently. In particular, large sections of this page should be moved to their own page.

What is it?

The CoEPP Tier3 cloud (the 'cloud') is a set of VMs and servers running on the NeCTAR infrastructure and Melbourne-based servers.

In its current incarnation, the cloud is:

  • 120 worker nodes (NeCTAR VMs)
  • 2 interactive nodes (NeCTAR VMs)
  • 1 submit node (offcloud server)
  • various infrastructure servers (offcloud servers)

Naming Conventions

NeCTAR VMs have two names:

  • the VM hostname
  • the OpenStack dashboard name

Most VMs have a hostname like vm-<ip-address>.rc.melbourne.nectar.org.au. The exceptions are DPM servers which have hostnames like cxdpm*.cloud.coepp.org.au (for some reason).

The dashboard name for a VM is set when the instance is started. This can be almost anything, but the convention is:

  • cxwnNNN - a worker node where NNN is a 3-digit numeric value (eg, cxwn045)
  • cxinNN - an interactive node where NN is a 2-digit numeric value (eg, cxin01)
  • cx????? - for anything else (like the DPM servers)

Note that cloud managers may decide to add extra metadata to the dashboard names to make management easier with herd. For instance, a dashboard name cxwn_mon_000 means this is a worker node VM numbered 000 in the monash-01 cell.

The offcloud servers are:

Name            login@hostname                           comments
c3torque        root@c3torque.cloud.coepp.org.au         PBS/Torque server
cxbuild         root@cxbuild.cloud.coepp.org.au          ?
cxcvmfs         root@cxcvmfs.cloud.coepp.org.au          CVMFS server
cxmon           root@cxmon.cloud.coepp.org.au            nagios server
cxperf          root@cxperf.cloud.coepp.org.au           ganglia server
cxpuppetmaster  root@cxpuppetmaster.cloud.coepp.org.au   puppet master server
cxui            root@cxui.cloud.coepp.org.au             user submit server (turned off around 1 Oct 2013)

Maintenance

Starting servers

There are two ways to start a server:

  • NeCTAR dashboard
  • herd

Dashboard

Navigate to http://nectar.org.au/, log in and go to the CoEPP Tier3 tenancy.

Select the Instances tab and click on the Launch Instance button at the top right of the page. You will see a dialog used to create a new instance.

Here you enter the details for the new VM. For a test worker node we enter values in this dialog and the other tabs.

Note that we enter the userdata script from git: cximage/userdata/c3_userdata.sh.

After you click on the Launch button, you will see the newly created VM in the dashboard.

To terminate the new instance, click on the drop-down menu at the right of the test001 entry and select the Terminate instance item.

You will now see the test001 entry marked for deletion and, after a short wait, the server will disappear.

A video tutorial on running the dashboard is at http://www.youtube.com/watch?v=dPptPWedOyY.

This is not a very useful way to start a VM. True, you can start multiple VMs this way but they will all have the same dashboard name.

herd

herd is a better way to start VMs because it can start multiple VMs, each with different names, and is more flexible in its handling of the VM userdata script. Plus, herd is scriptable.

The main herd page is here.

Here we show how herd is used to start and stop a single VM. The following command starts a single VM with attributes taken from the wn_config configuration file, overridden by the -p and -u options, which set the VM name prefix and the userdata source respectively:

herd start -c wn_config -p test -u cx_userdata.sh 1

The program waits until the started VMs are able to accept an SSH connection.

Once the VM is running you can stop it with:

herd stop -p test

A more complete example of using herd to tear down an existing cloud and restart all worker and interactive nodes is here.
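
As an illustration of what such a teardown and rebuild amounts to, here is a minimal sketch using only the herd subcommands and options shown above. The node counts, name prefixes and the in_config file name are assumptions for illustration, not the production values:

#!/bin/bash
# Sketch only: tear down the existing worker/interactive nodes and start new
# ones. Prefixes, counts and the in_config file are assumed for illustration.

# stop existing nodes, matched by dashboard-name prefix
herd stop -p cxwn
herd stop -p cxin

# start replacement nodes; herd waits until each VM accepts SSH connections
herd start -c wn_config -p cxwn -u c3_userdata.sh 120
herd start -c in_config -p cxin -u c3_userdata.sh 2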

git repositories

There are a few git repositories in use. One holds all the puppet files and manifests that are served from cxpuppetmaster. Another holds all other code used to manage and test the cloud and so on.

repository      clone
cxcode          git clone ssh://git@rcgit.atlas.unimelb.edu.au/cxcode
cxpuppetmaster  git clone ssh://git@rcgit.atlas.unimelb.edu.au/cxpuppetmaster
cximage         git clone ssh://git@rcgit.atlas.unimelb.edu.au/cximage
cxpuppetadmin   git clone ssh://git@rcgit.atlas.unimelb.edu.au/cxpuppetadmin
puppet          git clone ssh://git@rcgit.atlas.unimelb.edu.au/puppet
puppetmaster    git clone ssh://git@rcgit.atlas.unimelb.edu.au/puppetmaster

You can put the repositories wherever you want in your filesystem, but remember that herd/puppet_config.py reads a configuration file that contains the path to your cxpuppetmaster repository. Some people have put their repositories under ~/git_repos and the git copies of the config file reflect this.

Note that there is a periodic (5 minutes?) automatic update from the main git cxpuppetmaster repository to the cxpuppetmaster server. Any change you push to the main cxpuppetmaster repository will eventually appear on the cxpuppetmaster server. You can trigger this update by doing:

# /etc/puppet/scripts/update_puppet_directory

on cxpuppetmaster.
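
The exact wiring of this periodic update is not documented here, but if it is driven by cron, an entry on cxpuppetmaster along the following lines would produce the behaviour described (the 5-minute schedule and the cron.d placement are assumptions):

# /etc/cron.d/update_puppet_directory  (assumed wiring; the interval is only
# noted above as roughly 5 minutes)
*/5 * * * * root /etc/puppet/scripts/update_puppet_directory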

Logging into VMs and offcloud servers

You can log in to any VM that has started SSH by getting the IP address of the target VM and doing:

ssh -i ~/.ssh/nectarkey.pem root@<IP>

where <IP> is the IP address of the target VM and the key needed to log in is given with the -i option. The key you use to connect to the VM is the key given when the VM was started. You can get the name of the key you need from the NeCTAR dashboard by drilling down from http://www.nectar.org.au.

If you have herd installed, you could do this to log into any on-cloud VM:

herd ssh cxwn_sl6_melqh2_010

Logging in to the on- and off-cloud interactive and submission nodes is similar, except that authentication uses keys you generate yourself and have had placed on the appropriate machines:

cxin00   ssh <user>@cxin00.cloud.coepp.org.au   Note: SL5
cxin01   ssh <user>@cxin01.cloud.coepp.org.au
cxin02   ssh <user>@cxin02.cloud.coepp.org.au
cxui     ssh <user>@cxui.cloud.coepp.org.au     Note: turned off around 1 Oct 2013

where <user> is your username matching the credentials that were copied to the machines.
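
If you log in to these machines often, entries along the following lines in ~/.ssh/config save some typing (a sketch only; <user> is your own username and the wildcard host pattern is just a convenience):

# ~/.ssh/config
Host cxin01
    HostName cxin01.cloud.coepp.org.au
    User <user>

# on-cloud VMs: log in as root with the NeCTAR key
Host vm-*.rc.melbourne.nectar.org.au
    User root
    IdentityFile ~/.ssh/nectarkey.pem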

There are various other off-cloud machines that you can access:

cxbuild ssh root@cxbuild.cloud.coepp.org.au
cxcvmfs ssh root@cxcvmfs.cloud.coepp.org.au
cxmon ssh root@cxmon.cloud.coepp.org.au
cxperf ssh root@cxperf.cloud.coepp.org.au
cxpuppetmaster ssh root@cxpuppetmaster.cloud.coepp.org.au

Lastly, there are usually various ephemeral machines you may use, including (as at Aug 2013):

tg2 ssh root@tg2.tev.unimelb.edu.au
tg3 ssh root@tg3.tev.unimelb.edu.au
tg4 ssh root@tg4.tev.unimelb.edu.au
tg5 ssh root@tg5.tev.unimelb.edu.au
tg6 ssh root@tg6.tev.unimelb.edu.au

puppet

Puppet is used to manage the configuration of all on-cloud VMs and many off-cloud machines.

The git repository that holds all puppet code served by the cxpuppetmaster server is called cxpuppetmaster. This puppet code manages worker node VMs and the interactive node VMs as well as many off-cloud machines. We use the shorthand WN and IN for the two cloud node types.

Puppet manages everything on the VMs. Before puppet can run we need to configure a few things in the VM. We don't want to build this stuff into an image, so we set it up in a userdata script.

Userdata scripts

We decided that the images we start VMs from should be as vanilla as possible, containing only the software required for the subsequent phases of installation to succeed. We added code to the image that executes any supplied userdata string.

So the basic approach to bringing up a VM is:

  • start a VM from the basic image
  • supply that VM with a userdata script that installs the minimum software needed for puppet to run
  • puppet installs and configures everything else

So our userdata scripts are a sort of bootstrap.
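
Pieced together from the snippets later on this page, the bootstrap in a userdata script amounts to something like the following sketch (package names and ordering are simplified; the real scripts live in cximage/userdata/):

#!/bin/bash
# bootstrap sketch: install just enough for puppet to run, then let puppet
# manage everything else (simplified)

# install the puppet agent
yum install -y puppet

# point the agent at the puppet master, using the global worker-node certificate
# (the global certificate files are also installed here; see the section
# 'Copy certificates to new VMs' below)
cat << EOF > /etc/puppet/puppet.conf
[main]
    server = cxpuppetmaster.cloud.coepp.org.au

[agent]
    certname  = wn.coepp.org.au
    node_name = cert
EOF

# first puppet run; later runs happen on the normal schedule
puppet agent --test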

Since we occasionally reboot a VM there is an additional requirement: some operations performed in the userdata script must be performed during every boot and some operations should be performed on the first boot only.

More detail on how userdata scripts are used is here.

Do something on first boot only

The basic idea for ensuring that something executes only on the first boot is similar to a lock file. Before executing a bit of “run once” code we check whether a special file exists somewhere in the filesystem; if it doesn't, we run the code and create the lock file.

Since we wanted some flexibility we decided to have a lock for every small discrete operation in the userdata script. However this results in lots of lock files - not nice. So we decided that the actual lock mechanism would be the existence of a line of text in a single lock file. (In the helper below the lock entry is recorded as soon as the check passes, just before the protected code runs.)

An example may make things clearer. We created a helper bash function:

LOCKDIR=/root
LOCKFILE=$LOCKDIR/userdata_lock_sections
mkdir -p $LOCKDIR

DELIM1="############################################################################"
DELIM2="----------------------------------------------------------------------------"

# helper function to decide if a section should be run or not
function not_run()
{
    SECTION_KEY="$1"

    if grep "^${SECTION_KEY}$" $LOCKFILE >/dev/null 2>&1; then
        echo "$DELIM2 Already run: $SECTION_KEY"
        return 1
    else
        echo "$DELIM1 $SECTION_KEY"
        echo "$SECTION_KEY" >> $LOCKFILE
        return 0
    fi
}

We use the function this way:

KEY="set system clock"
if not_run "$KEY"; then
    yum install -y ntp    # -y so the install does not prompt in an unattended script
    /usr/sbin/ntpdate au.pool.ntp.org
fi

The only 'gotcha' is that the script creator must ensure that the KEY strings are unique.
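
A quick way to check a userdata script for accidentally duplicated KEY strings (assuming the KEY="..." convention shown above) is something like:

# report any KEY strings that appear more than once in the script
grep -o 'KEY="[^"]*"' c3_userdata.sh | sort | uniq -d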

Do something on every boot

This is trivial. Just place code in the userdata script and don't protect it with the not_run code:

echo "$DELIM1 setting txnqueuelen and readahead buffers"
ifconfig eth0 txqueuelen 10000
blockdev --setra 8096 /dev/vda
blockdev --setra 8096 /dev/vdb

Userdata logging

We found this code snippet handy, so it is presented here as an idea.

While debugging userdata scripts, one difficulty was that there was no easily accessible log of what actually happened during script execution. We could have written logging functions and used them to generate some persistent output somewhere, but a better way was found.

If you place this code at the top of a userdata script then all stdout and stderr output after the inserted code is redirected to the specified file:

# tee all output to a log file
LOGDATE=$(date +"%Y%m%d_%H%M%S")
LOGFILE=/tmp/userdata_$LOGDATE.log
exec > >(/usr/bin/tee $LOGFILE)
exec 2>&1

This idea was found on the 'net.

Pre-userdata text

With the introduction of the qld zone and its different VM setup, we needed a way to do something in a userdata script only when starting in the qld zone. We could have done something like looking at the eth0 IP address and deciding the zone from that, but a more generic solution was chosen.

When using 'herd start' to start a VM, the user can specify optional userdata and extra-userdata files. herd start will combine those files and present them to the VM as a single userdata text.

It was decided that herd start would automatically generate a pre-userdata text defining variables containing the VM zone, etc. This text is prefixed onto the user-specified userdata and extra-userdata text. This means the code in userdata scripts can use the herd variables to decide what to do.

At the moment, the herd start pre-userdata text is something like:

#!/bin/bash
# tee all output to a log file
HERD_LOGDATE=$(date +"%Y%m%d_%H%M%S")
HERD_LOGFILE=/tmp/userdata_$HERD_LOGDATE.log
exec > >(/usr/bin/tee $HERD_LOGFILE)
exec 2>&1

echo "############################################################################"
echo "#                    pre-userdata generated by 'herd start'                #"
echo "############################################################################"
echo "Pre-defined HERD variables:"
echo "HERD_LOGFILE=$HERD_LOGFILE"
HERD_KEYPAIR="nectarkey"
echo "HERD_KEYPAIR=$HERD_KEYPAIR"
HERD_ZONE="qld"
echo "HERD_ZONE=$HERD_ZONE"
echo "############################################################################"
echo "#                            end of pre-userdata                           #"
echo "############################################################################"

This means the generic userdata text in c3_userdata.sh can now set up eth1 if running in the qld zone:

# need to configure storage IP as eth1 for QLD only
if [ "$HERD_ZONE" == "qld" ]; then
    KEY="configuring eth1"
    if not_run "$KEY"; then
        /sbin/dhclient eth1
    fi
fi

This pre-userdata text insertion can be turned off by using the -g option to herd start. Note that inhibiting the insertion means the stdout+stderr redirection is not done, so we have this in c3_userdata.sh:

# tee all output to a log file if not done in pre-userdata
if [ -z "$HERD_LOGFILE" ]; then
    # var HERD_LOGFILE not defined, capture output here
    LOGDATE=$(date +"%Y%m%d_%H%M%S")
    LOGFILE=/tmp/userdata_$LOGDATE.log
    exec > >(/usr/bin/tee $LOGFILE)
    exec 2>&1
fi

Puppet and global certificates

Originally we used a separate certificate for every on-cloud VM. This quickly became unmanageable: VMs were unstable and frequently replaced, and every replacement VM required a new certificate. We didn't want to use automatic signing of certificates, so we decided to create just a few certificates and have all VMs of each type use the same certificate. We call this approach global certificates.

This approach was mixed with another idea about recognizing nodes in puppet. For the purposes of explanation, we will talk about only the global certificate idea here.

The original idea was seen here.

Generating the global certificate

We create a global certificate. First, we must decide on a name for the certificate; we chose names along the lines of wn.coepp.org.au. The wn prefix is explained more fully in the section Puppet and recognizing node types.

Once we have settled on the certificate name we can create the certificate. On the puppetmaster server do:

# puppet cert --generate wn.coepp.org.au

This generates public and private certificates:

/var/lib/puppet/ssl/ca/signed/wn.coepp.org.au.pem
/var/lib/puppet/ssl/private_keys/wn.coepp.org.au.pem

Note that the prefix for the paths above can be different, depending on how you installed puppet. You can ask puppet where its ssl directory is by doing:

$ puppet agent --configprint ssldir
/var/lib/puppet/ssl
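
To confirm that the certificate was generated and signed, something along these lines should work on the puppet versions of this era (output format varies between versions; a leading + in the listing indicates a signed certificate):

# puppet cert list --all | grep wn.coepp.org.au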

Copy certificates to new VMs

You need to copy the certificates to each VM that is going to use them. We can't use puppet to do this! Our approach was to use the userdata script automatically executed by our image when it starts. This script creates the appropriate certificate files in the VM's puppet filesystem:

mkdir -p /var/lib/puppet/ssl/certs/    # because puppet hasn't run yet
cat <<EOF > /var/lib/puppet/ssl/certs/wn.coepp.org.au.pem
-----BEGIN CERTIFICATE-----
MIICZDCCAc2gAQIBAgICAkYwDQYJKoZIhvcNAQEFBQAwNzE1MDMGA1UEAwwsUHVw
  <snip>
IbcKQtzFc5XkksTxpuTEQOIvxIR0zXcCStrcNwyu3kOt/A15FwOwwQ==
-----END CERTIFICATE-----
EOF

mkdir -p /var/lib/puppet/ssl/private_keys/    # because puppet hasn't run yet
cat <<EOF > /var/lib/puppet/ssl/private_keys/wn.coepp.org.au.pem
-----BEGIN RSA PRIVATE KEY-----
MIICXAIBAAKBgQC2EzLnlpb8wJB1snBTeU586eDzR7EzkkCqUoTxVV4ce5WhmvYD
  <snip>
qu0mpx6y3c46EcPPx0mwV7gkkMB8ZOCgyeIW/ZwamlW=
-----END RSA PRIVATE KEY-----
EOF
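
Rather than pasting the certificate contents into the userdata script by hand, the two heredoc sections above can be generated on the puppetmaster from the files created earlier. A small sketch (this generator script is not part of the production tooling):

#!/bin/bash
# sketch: emit the certificate-installation snippet for a userdata script,
# reading the global certificate files generated on the puppetmaster
NAME=wn.coepp.org.au
SSLDIR=$(puppet agent --configprint ssldir)

cat << SNIPPET
mkdir -p /var/lib/puppet/ssl/certs/
cat <<EOF > /var/lib/puppet/ssl/certs/$NAME.pem
$(cat $SSLDIR/ca/signed/$NAME.pem)
EOF

mkdir -p /var/lib/puppet/ssl/private_keys/
cat <<EOF > /var/lib/puppet/ssl/private_keys/$NAME.pem
$(cat $SSLDIR/private_keys/$NAME.pem)
EOF
SNIPPET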

Server configuration

You must change the puppet configuration on the server to allow use of the global certificate. On the puppetmaster server change the contents of /etc/puppet/auth.conf to include:

# allow nodes to retrieve their own catalog
path ~ ^/catalog/([^/]+)$
method find
allow $1
allow wn.coepp.org.au

The original document added lines to other sections in /etc/puppet/puppet.conf but we didn't do this. Also note that at one point the line in auth.conf was misconfigured but puppet still managed VMs correctly!?

You must now restart the server:

# service puppetmaster restart

VM configuration

Finally you change /etc/puppet/puppet.conf on each VM. Again, we did this in the userdata script that every VM automatically runs:

cat << EOF > /etc/puppet/puppet.conf
[main]
    <snip>

[agent]
    certname = wn.coepp.org.au
    <snip>
EOF

Test

Doing

# puppet agent --test

on one of the VMs should succeed. The initial lines of output should show the certificate you expect:

# puppet agent -t
info: Retrieving plugin
info: Loading facts in vm_type
info: Loading facts in vm_type
info: Caching catalog for wn.coepp.org.au          # WORKER NODE CERTIFICATE!

Puppet and git branches

We use git repositories to manage all code including puppet manifests.

Within our puppet repository the master branch is considered production code. We create other branches for development of new releases. For example, for our upcoming use of Scientific Linux 6 we created an sl6 branch within git.

Now the question arises “how do we create SL6 VMs and test them?”. We use the environment configuration parameter of puppet.

As mentioned elsewhere, we use userdata scripts to initially set up just enough in a new VM for puppet to run. This includes setting up the puppet configuration files on the new VM:

cat << EOF > /etc/puppet/puppet.conf
[main]
    server     = cxpuppetmaster.cloud.coepp.org.au
    masterport = 8140
    vardir     = /var/lib/puppet
    logdir     = /var/log/puppet
    rundir     = /var/run/puppet
    ssldir     = \$vardir/ssl
    pluginsync = true

[agent]
    certname    = wn.coepp.org.au        # global certificate for worker nodes
    node_name   = cert                   # recognize agents by certificate
    environment = sl6                    # this is an SL6 node
    classfile   = \$vardir/classes.txt
    localconfig = \$vardir/localconfig
    runinterval = 3600
    report      = true
EOF

Note the environment = sl6 entry above. If the automatic puppet run or an administrator does something like

# puppet agent --test

the VM notifies the server that its environment is sl6 (in this case).

On the server we have these lines in /etc/puppet/puppet.conf:

[main]
    <snip>

[agent]
    <snip>

[master]
    templatedir    = $confdir/environments/$environment/templates
    manifest       = $confdir/environments/$environment/manifests/site.pp
    modulepath     = $confdir/environments/$environment/modules
    <snip>

This tells the server that it should use the environment value from the VM to dynamically construct the paths to templatedir, manifest and modulepath.

We have scripts that run periodically and place every git branch into the $confdir/environments/ directory. If the environment names we use match the git branch names, the puppet update for a VM comes from the branch named in the environment value.
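
Those scripts are not reproduced here, but their effect is roughly the following sketch (the repository and environments paths are assumptions; the production scripts may differ):

#!/bin/bash
# sketch: materialise every branch of the cxpuppetmaster repository as a
# puppet environment directory (paths assumed for illustration)
REPO=/opt/cxpuppetmaster
ENVDIR=/etc/puppet/environments

cd $REPO
git fetch --all --prune

for branch in $(git branch -r | grep -v HEAD | sed 's|.*origin/||'); do
    mkdir -p $ENVDIR/$branch
    # export the branch contents into its environment directory
    git archive origin/$branch | tar -x -C $ENVDIR/$branch
done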

Puppet and recognizing node types

One difficulty we had was recognizing different types of VMs and configuring them differently in puppet. Our DNS setup requires that all on-cloud VMs have a hostname of a particular form with no distinction being made for different types of VMs. Not being able to use hostnames required vaguely hackish code in puppet to configure a VM differently depending on some incidental (and possibly ephemeral) difference between VM types, such as the processor count.

We wanted a better way.

The puppet documentation for the node_name configuration setting notes that cert is one allowable way for puppet to determine the identity of a VM. The happy idea occurred to us that if we create a global certificate for each VM type, we can recognize the different VM types in puppet manifests.

The idea is basically the one described in the Generating the global certificate section, with the changes shown below.

Create unique certificate names, one for each VM type. We have two VM types, worker nodes and interactive nodes, so we chose appropriate names. On the puppetmaster server do:

# puppet cert --generate wn.coepp.org.au
# puppet cert --generate in.coepp.org.au

Copy the public and private certificates to the appropriate VM type. As before, we use userdata scripts with different scripts for different VM types.

On the puppetmaster server change the contents of /etc/puppet/auth.conf to include:

# allow nodes to retrieve their own catalog
path ~ ^/catalog/([^/]+)$
method find
allow $1
allow wn.coepp.org.au
allow in.coepp.org.au

Restart the server after this change, of course.

You must ensure that a VM of each type has the appropriate line in /etc/puppet/puppet.conf:

[agent]
    certname = wn.coepp.org.au

or

[agent]
    certname = in.coepp.org.au

Now the VMs should be managed by puppet, as before. But if we want to distinguish between VM types we can now do this in puppet manifests:

node /^wn.coepp.org.au$/ inherits cloud_basenode
{
    notify {"Configuring WORKER NODE":}

    <snip>
}

node /^in.coepp.org.au$/ inherits cloud_basenode
{
    notify {"Configuring INTERACTIVE NODE":}

    <snip>
}

Userdata script management

We store the userdata scripts used to start VMs in the cximage git repository. If you look in the base directory of that repository you will see:

r-w@neptune:~/git_repos/cximage$ ls -l
total 12
drwxrwxr-x 2 r-w r-w 4096 Jun 25 17:23 providers
drwxrwxr-x 2 r-w r-w 4096 Aug  2 09:32 templates
drwxrwxr-x 2 r-w r-w 4096 Aug  1 15:25 userdata

The userdata sub-directory contains all the userdata files. Looking in that directory we see:

r-w@neptune:~/git_repos/cximage/userdata$ ls -l
total 32
-rw-rw-r-- 1 r-w r-w 6746 Aug  1 09:22 c3_userdata.sh
-rw-rw-r-- 1 r-w r-w 3898 Aug  1 09:22 in_extra_userdata.sh
-rw-rw-r-- 1 r-w r-w 6489 Aug  1 09:22 sl6_c3_userdata.sh
-rw-rw-r-- 1 r-w r-w 3895 Aug  1 09:22 sl6_in_extra_userdata.sh
-rw-rw-r-- 1 r-w r-w 3895 Aug  1 09:22 sl6_wn_extra_userdata.sh
-rw-rw-r-- 1 r-w r-w 3898 Aug  1 09:22 wn_extra_userdata.sh

This needs some explaining. Since we don't use master and sl6 branches here as we do in the cxpuppetmaster repository, we use a filename prefix to indicate which branch each file is for. Files without a prefix are assumed to be production (master) files.

First we must talk about extra userdata files. When starting a VM with 'herd start' you specify a userdata script to give to the VM and an optional extra userdata script. The extra script is simply joined to the bottom of the userdata script and the resultant combined script is given to the VM as the userdata script.

We do this to separate the code that is specific to a VM type from the code that is not. For instance, in the standard userdata script we set the VM hostname, configure yum repositories for installs, and so on - the sort of things we do for every VM. An extra userdata script contains the things that are specific to an interactive or worker node; for instance, configuring puppet with the appropriate certificate name and environment, and setting up the correct certificates.

So when we start a VM we consider the VM type and environment and supply the following userdata and extra userdata files to the VM:

            IN VM                                           WN VM
master      c3_userdata.sh + in_extra_userdata.sh           c3_userdata.sh + wn_extra_userdata.sh
sl6         sl6_c3_userdata.sh + sl6_in_extra_userdata.sh   sl6_c3_userdata.sh + sl6_wn_extra_userdata.sh
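
Since the extra script is simply appended to the userdata script (and herd start prefixes its generated pre-userdata text, as described earlier), you can reproduce or inspect the combined userdata a VM will receive by hand, for example:

# what an sl6 worker node receives as its userdata (herd does the equivalent,
# plus the generated pre-userdata text)
cat sl6_c3_userdata.sh sl6_wn_extra_userdata.sh > combined_userdata.sh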

Miscellaneous

  • nagios
  • ganglia
  • monthly reports
  • herd

Known problems & workarounds

The known problems (14 Jan 2014) are listed below.

“cannot find user” errors

This problem hasn't been seen for a while.

You may find that batch jobs hang in 'W' state with repeated emails containing cannot find user 'smith' in password file.

The cause of this problem appears to be sssd.

The solution is to clear the sssd cache on the affected VMs and restart sssd:

# /sbin/service sssd stop
# rm -f /var/lib/sss/db/cache_coepp.ldb
# /sbin/service sssd start

Of course, herd may be quicker if more than one VM is affected:

$ herd cmd -p cxwn_mon_011,cxwn_mon_026 "/sbin/service sssd stop; rm -f /var/lib/sss/db/cache_coepp.ldb; /sbin/service sssd start"

The incidence of this problem has decreased (Aug 2013) but still occurs.
