This shows you the differences between two versions of the page.
cloud:technical [2014/03/20 15:21] rwilson [Userdata scripts] |
cloud:technical [2014/03/20 15:24] (current) rwilson [Known problems & workarounds] |
||
---|---|---|---|
Line 622: | Line 622: | ||
The known problems (14 Jan 2014) are listed below. | The known problems (14 Jan 2014) are listed below. | ||
- | === dynamictorque 'runaway' === | ||
- | For the last month or so, a 'runaway' behaviour has been noticed in dynamictorque (DT).
- | It shows up as DT constantly trying to increase the number of dynamic cores in use
- | even though there appears to be no need for extra cores.
- | |||
- | This graph shows the problem on 8 January, 2014: | ||
- | |||
- | {{cloud:core_usage_20140108.png}} | ||
- | |||
- | Note that after 0900 the number of dynamic cores keeps increasing, even though there are | ||
- | no queued jobs and no batch jobs running. | ||
- | |||
- | This may have been noted previously as a reluctance to release unused cores. DT may have been
- | trying unsuccessfully to increase the number of cores because it was asking for N-core VMs and
- | OpenStack was unable to supply that many cores in one VM. This state is identified by admin.py showing a
- | non-zero number of cores being deleted while, at the same time, DT is trying to start more cores. This is the
- | state after about 1300 hours in the above graph.
- | |||
- | This behaviour of trying to increase the number of cores when they are not required, together with a reluctance to release
- | unneeded cores, can be seen in the graphs for 12, 11 and 10 Jan 2014 and for 31, 23, 19, 14, 11, 9 and 3 Dec 2013.
- | |||
- | This 'runaway' state appears to be triggered by activity in the batch system, as days with no activity
- | can pass without DT trying to increase the number of cores.
===== “cannot find user” errors ===== | ===== “cannot find user” errors ===== | ||