CoEPP RC
 

Differences

This shows you the differences between two versions of the page.

cloud:technical [2014/01/14 11:52]
rwilson [Known problems & workarounds]
cloud:technical [2014/03/20 15:24]
rwilson [Known problems & workarounds]
Line 263: Line 263:
 </code>
  
-This idea was found [[http://stackoverflow.com/questions/3173131/redirect-copy-of-stdout-to-log-file-from-within-bash-script-itself|here]].
+This idea was found
+[[https://www.google.com.au/?gfe_rd=ctrl&ei=WmwqU7bhNOzC8gfjn4G4Cw&gws_rd=cr#q=redirect+COPY+of+stdout+to+log+file|on the 'net]].
  
 === Pre-userdata text ===
Line 621: Line 622:
 The known problems (14 Jan 2014) are listed below.
  
-=== dynamictorque '​runaway'​ === 
  
-For the last month or so, a '​runaway'​ behaviour has been noticed in dynamictorque (DT). 
-This is shown as a constant desire by DT to increase the number of dynamic cores in use 
-even though there appears to be no need for extra cores. 
- 
-This graph shows the problem on 8 January, 2014: 
- 
-{{cloud:​core_usage_20140108.png}} 
- 
-Note that after 0900 the number of dynamic cores keeps increasing, even though there are 
-no queued jobs and no batch jobs running. 
- 
-This may have been noted previously as the reluctance to release unused cores. ​ It may have been that 
-DT was unsuccessfully trying to increase the number of cores because it was asking for N core VMs and 
-OpenStack was unable to supply that many cores in one VM.  This state is defined by admin.py showing a 
-non-zero number of cores being deleted and at the same time trying to start more cores. ​ This is the 
-state after about 1300 hours in the above graph. 
- 
-This behaviour of trying to increase the number of cores when not required and a reluctance to release 
-unneeded cores can be seen in the graphs for Jan 2014 12/11/10 and Dec 2013 31/​23/​19/​14/​11/​9/​3. 
- 
-This '​runaway'​ state appears to be triggered by activity in the batch system as days with no activity 
-can pass without DT trying to increase the number of cores. 
 ===== “cannot find user” errors =====
  
cloud/technical.txt · Last modified: 2014/03/20 15:24 by rwilson
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International