CoEPP RC
 

Torque and Cloud Scheduler Installation and Configuration Guide

This guide covers installing Torque 2.5.7 and Cloud Scheduler 1.4. It was tested on a Scientific Linux 6.4 machine on the OpenStack cloud platform.

Prepare a VM for the Torque server

Launch a VM

  • Image: NeCTAR Scientific Linux 6.4 x86_64
  • Name: select one of your choice
  • Keypairs: select one of your choice
  • Flavour: m1.small (4GB memory, 1 core CPU, 40GB ephemeral disk)
  • Security group: torque
ALLOW 22:22 from 0.0.0.0/0
ALLOW 15001:15004 from 0.0.0.0/0 (tcp)
ALLOW 15001:15004 from 0.0.0.0/0 (udp)
ALLOW 1:65535 from torque

Set up Firewall

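  • Run the following commands to enable the firewall at boot on the Torque server, then edit /etc/sysconfig/iptables so that it contains the rules listed after the vi command:
    $ chkconfig --list | grep iptables
    $ chkconfig iptables on
    $ vi /etc/sysconfig/iptables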
    # Firewall configuration written by system-config-securitylevel
    # Manual customization of this file is not recommended.
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :RH-Firewall-1-INPUT - [0:0]
    -A INPUT -j RH-Firewall-1-INPUT
    -A FORWARD -j RH-Firewall-1-INPUT
    -A RH-Firewall-1-INPUT -i lo -j ACCEPT
    -A RH-Firewall-1-INPUT -p icmp --icmp-type any -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 15001:15004 -j ACCEPT
    -A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 15001:15004 -j ACCEPT
    -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
    COMMIT
    $ service iptables restart
    $ /etc/init.d/iptables status
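
Before restarting iptables, it can be useful to confirm that the edited file really contains the ACCEPT rules listed above. The following is a minimal sketch, not part of the original procedure: the function name check_torque_rules is hypothetical, and it only searches the text of the rules file, it does not inspect the live firewall.

```shell
#!/bin/sh
# check_torque_rules FILE
# Hypothetical helper: succeeds only if FILE contains ACCEPT rules for
# SSH (tcp/22) and for ports 15001:15004 on both tcp and udp.
check_torque_rules() {
    rules_file=$1
    for pattern in \
        '-p tcp .*--dport 22 -j ACCEPT' \
        '-p tcp .*--dport 15001:15004 -j ACCEPT' \
        '-p udp .*--dport 15001:15004 -j ACCEPT'
    do
        # "--" stops grep from treating the pattern as an option.
        if ! grep -Eq -- "$pattern" "$rules_file"; then
            echo "missing rule matching: $pattern" >&2
            return 1
        fi
    done
}
```

For example, run as root: check_torque_rules /etc/sysconfig/iptables && service iptables restart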

Fix Hostname

  • The hostname of a Torque server deployed on the NeCTAR cloud must be fixed manually, because networking and metadata are misconfigured on the cloud.
  • Install nslookup, which is not included in the SL 6.4 image by default (it is part of the bind-utils package):
    $ yum -y install bind-utils
  • Run the following commands to fix hostname settings:
    $ EC2_METADATA=169.254.169.254
    $ IP_ADDRESS=`curl -m 10 -s http://$EC2_METADATA/latest/meta-data/local-ipv4`
    $ EXTHOSTNAME=`nslookup $IP_ADDRESS | grep 'name =' | awk '{print $4}'`
    $ EXTHOSTNAME=${EXTHOSTNAME%?}
    $ echo $IP_ADDRESS $EXTHOSTNAME >> /etc/hosts
    $ sed -i "s/^HOSTNAME=.*$/HOSTNAME=$EXTHOSTNAME/" /etc/sysconfig/network
    $ hostname $EXTHOSTNAME
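
The commands above strip the last character of the looked-up name with ${EXTHOSTNAME%?}, whatever that character is, and append to /etc/hosts even if the lookup returned nothing. A slightly more defensive variant is sketched below. The function names ptr_name_from_nslookup and add_hosts_entry are hypothetical, and the sketch assumes nslookup prints the reverse record as a line containing "name = <fqdn>.", as the grep in the original commands implies.

```shell
#!/bin/sh
# ptr_name_from_nslookup: read `nslookup <ip>` output on stdin and print
# the first reverse-lookup name, removing a trailing dot only if present.
ptr_name_from_nslookup() {
    awk '/name =/ { sub(/\.$/, "", $NF); print $NF; exit }'
}

# add_hosts_entry FILE IP NAME: append "IP NAME" to FILE unless NAME is
# empty or the exact line is already there.
add_hosts_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if [ -z "$name" ]; then
        echo "empty hostname, not updating $hosts_file" >&2
        return 1
    fi
    grep -qxF "$ip $name" "$hosts_file" || echo "$ip $name" >> "$hosts_file"
}
```

With these defined, the lookup step becomes EXTHOSTNAME=$(nslookup "$IP_ADDRESS" | ptr_name_from_nslookup), followed by add_hosts_entry /etc/hosts "$IP_ADDRESS" "$EXTHOSTNAME". Running it twice does not duplicate the /etc/hosts line.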

Torque

Enable the EPEL Repository

  • Install EPEL repo:
    $ rpm -Uvh http://mirror.aarnet.edu.au/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Enable the MAUI Repository

  • Add the file /etc/yum.repos.d/maui.repo with the following content:
[UMD_3_base_SL6]
name=UMD 3 base SL6
baseurl=http://repository.egi.eu/sw/production/umd/3/sl6/$basearch/base
skip_if_unavailable=1
enabled=1
sslverify=0
gpgcheck=0
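
The repo file can also be written from the shell instead of an editor. This is a sketch only: the function name write_maui_repo is hypothetical. The here-document delimiter is quoted so that $basearch reaches the file literally and is expanded by yum rather than by the shell.

```shell
#!/bin/sh
# write_maui_repo [PATH]: write the UMD repo definition shown above to PATH
# (default: /etc/yum.repos.d/maui.repo).
write_maui_repo() {
    repo_file=${1:-/etc/yum.repos.d/maui.repo}
    # Quoted 'EOF' disables shell expansion inside the here-document.
    cat > "$repo_file" <<'EOF'
[UMD_3_base_SL6]
name=UMD 3 base SL6
baseurl=http://repository.egi.eu/sw/production/umd/3/sl6/$basearch/base
skip_if_unavailable=1
enabled=1
sslverify=0
gpgcheck=0
EOF
}
```

Run write_maui_repo as root to create the file in its default location.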

Install Torque server, munge, maui

  • Run the following command to install the packages (this may take a while):
    $ yum -y install torque-server torque-client maui-server maui-client munge

Configure munge

  • Put 1024 random characters into /etc/munge/munge.key. The file's mode should be 600 and it should be owned by the munge user:
-rw------- 1 munge munge 1024 May  1 16:04 /etc/munge/munge.key
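
One possible way to create such a key is sketched below. It writes 1024 random bytes from /dev/urandom rather than printable characters, on the assumption that munge accepts arbitrary key bytes; the function name make_munge_key is hypothetical.

```shell
#!/bin/sh
# make_munge_key [PATH]: write 1024 random bytes to PATH with mode 600
# (default: /etc/munge/munge.key).
make_munge_key() {
    key_file=${1:-/etc/munge/munge.key}
    # The restrictive umask keeps a newly created key unreadable by others
    # even before chmod runs.
    ( umask 077 && dd if=/dev/urandom of="$key_file" bs=1 count=1024 2>/dev/null ) || return 1
    # chmod also covers the case where the file already existed.
    chmod 600 "$key_file"
}
```

As root, make_munge_key && chown munge:munge /etc/munge/munge.key should leave the file with the size, mode and owner shown in the listing above.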

Start munge

  service munge start

Configure torque server

  • Modify /etc/torque/server_name (or /var/lib/torque/server_name) so that it contains the real DNS hostname of the VM
  • Start pbs_server
  service pbs_server start
  • Use qmgr to create and enable the queue by entering the following directives:
create queue cloud
set queue cloud queue_type = Execution
set queue cloud resources_max.walltime = 24:00:00
set queue cloud resources_default.cput = 01:00:00
set queue cloud enabled = True
set queue cloud started = True
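
Rather than typing the directives at the interactive qmgr prompt, they can be piped to qmgr on standard input. The sketch below assumes qmgr is on the PATH and that the caller is a Torque manager (typically root on the server host); the function names are hypothetical.

```shell
#!/bin/sh
# cloud_queue_directives: print the qmgr directives listed above, one per line.
cloud_queue_directives() {
    cat <<'EOF'
create queue cloud
set queue cloud queue_type = Execution
set queue cloud resources_max.walltime = 24:00:00
set queue cloud resources_default.cput = 01:00:00
set queue cloud enabled = True
set queue cloud started = True
EOF
}

# apply_cloud_queue: feed the directives to qmgr on standard input.
apply_cloud_queue() {
    if ! command -v qmgr >/dev/null 2>&1; then
        echo "qmgr not found in PATH" >&2
        return 1
    fi
    cloud_queue_directives | qmgr
}
```

After calling apply_cloud_queue, the result can be inspected with qmgr -c "print queue cloud".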

Configure Maui

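  • Modify /var/spool/maui/maui.cfg so that it contains the following settings, replacing <DNS_HOSTNAME_OF_THE_VM> with the real DNS hostname: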
SERVERHOST        <DNS_HOSTNAME_OF_THE_VM>
ADMIN1            root
ADMINHOST        <DNS_HOSTNAME_OF_THE_VM>
RMTYPE[0]           PBS
RMHOST[0]        <DNS_HOSTNAME_OF_THE_VM>
RMSERVER[0]         <DNS_HOSTNAME_OF_THE_VM>

RMPOLLINTERVAL        00:00:30
DEFERCOUNT          10
DEFERTIME          00:10:00

ENABLEMULTIREQJOBS TRUE
JOBNODEMATCHPOLICY EXACTNODE
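
Because the same hostname appears four times, generating the settings from a single argument avoids typos. The following is a sketch under that idea; the function name render_maui_cfg is hypothetical and the output contains only the settings listed above.

```shell
#!/bin/sh
# render_maui_cfg DNS_HOSTNAME: print the maui.cfg settings shown above
# with the placeholder replaced by DNS_HOSTNAME.
render_maui_cfg() {
    host=$1
    if [ -z "$host" ]; then
        echo "usage: render_maui_cfg DNS_HOSTNAME" >&2
        return 1
    fi
    # Unquoted EOF so that $host is expanded inside the here-document.
    cat <<EOF
SERVERHOST          $host
ADMIN1              root
ADMINHOST           $host
RMTYPE[0]           PBS
RMHOST[0]           $host
RMSERVER[0]         $host

RMPOLLINTERVAL      00:00:30
DEFERCOUNT          10
DEFERTIME           00:10:00

ENABLEMULTIREQJOBS  TRUE
JOBNODEMATCHPOLICY  EXACTNODE
EOF
}
```

For example, render_maui_cfg "$(hostname)" > /var/spool/maui/maui.cfg, assuming hostname already returns the DNS name set in the "Fix Hostname" step. Note that this overwrites any other settings already present in the file.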

Start Maui

  service maui start

Cloud Scheduler

Install dependencies

  yum install gcc gdbm-devel readline-devel ncurses-devel zlib-devel \
    bzip2-devel sqlite-devel db4-devel openssl-devel tk-devel \
    bluez-libs-devel libxslt libxslt-devel libxml2-devel libxml2
  yum install python-devel
  yum install python-pip
  python-pip install suds
  python-pip install lxml
  python-pip install boto
  

Get the source code

  cd /opt
  git clone -b dev-shunde https://github.com/CoEPP/cloud-scheduler.git
  cd cloud-scheduler/
  
  # set PYTHONPATH
  export PYTHONPATH=/opt/cloud-scheduler/
  

Enable logging to stdout in cloud_scheduler.conf

  log_stdout: true
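
The setting can also be changed from the shell. The sketch below assumes the file already contains a log_stdout line (possibly commented out or set to another value) and that GNU sed is available, as it is on Scientific Linux; the function name enable_stdout_logging is hypothetical. It rewrites only the first such line and does not add the key if it is missing, since the correct section for it cannot be guessed.

```shell
#!/bin/sh
# enable_stdout_logging [FILE]: set the first log_stdout line in FILE to
# "log_stdout: true" (default FILE: cloud_scheduler.conf).
# Returns non-zero if no such line ends up in the file.
enable_stdout_logging() {
    conf_file=${1:-cloud_scheduler.conf}
    # GNU sed: "0,/re/" limits the substitution to the first matching line;
    # the empty pattern in "s//.../" reuses that same regular expression.
    sed -i -E '0,/^[#[:space:]]*log_stdout[[:space:]]*:.*$/ s//log_stdout: true/' "$conf_file" || return 1
    grep -q '^log_stdout: true$' "$conf_file"
}
```

Run it from /opt/cloud-scheduler as enable_stdout_logging cloud_scheduler.conf before starting Cloud Scheduler.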

Then run Cloud Scheduler from the command line

  python cloud_scheduler -f cloud_scheduler.conf 
cloud/install_torque_cs1.4_sl6.4.txt · Last modified: 2013/11/29 16:43 by rwilson
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International