I spent some time yesterday checking over the results of a survey carried out to gather background information for a piece of work UCISA have commissioned on the cost of IT downtime. The project aims to establish formulae that allow institutions to estimate the cost to the business of IT failure. The study is based on a number of scenarios, such as loss of systems during clearing, loss of the VLE, and so on.
Around two thirds of the respondents had suffered failures in the last five years: either the loss of a single core system for more than 24 hours, or the loss of multiple core systems. The failures had a number of different causes. Some losses were the result of external events such as fire or flood, but a significant proportion were due to loss of power. This suggests there is merit in considering outsourcing data centre provision to a managed service with guaranteed power. The other main causes of lost core systems were system failure and human or programming error. These sorts of failures generally led to the loss of a single core system for more than 24 hours. There are actions that can mitigate such failures, and no doubt post-incident investigations identified lessons to be learned.
Hopefully the study will allow institutions to put a cost on potential failures. This should then allow a judgement to be made on whether it makes business sense to build resilience into a given system or network, or whether to accept the risk of a failure that, whilst embarrassing, will not unduly damage or cost the business.
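As a rough illustration of the kind of judgement involved, the expected annual loss from an outage scenario can be compared with the yearly cost of building in resilience. This is a minimal sketch only: the UCISA formulae have not been published, so the function names, inputs, and every figure below are hypothetical assumptions for demonstration.

```python
# Illustrative sketch only: all names, parameters, and figures are
# hypothetical assumptions, not the UCISA project's actual formulae.

def downtime_cost(users_affected, value_per_user_hour, hours_down,
                  recovery_cost=0.0):
    """Direct cost of a single outage: lost user time plus recovery effort."""
    return users_affected * value_per_user_hour * hours_down + recovery_cost

def expected_annual_loss(annual_probability, cost_per_incident):
    """Expected yearly loss: likelihood of the failure times its cost."""
    return annual_probability * cost_per_incident

# Hypothetical scenario: VLE down for 24 hours during term.
incident = downtime_cost(users_affected=20_000,
                         value_per_user_hour=2.0,
                         hours_down=24,
                         recovery_cost=15_000)

# If such an outage is judged to have a 1-in-10 chance per year,
# compare the expected loss with the annual cost of extra resilience.
eal = expected_annual_loss(annual_probability=0.1,
                           cost_per_incident=incident)
resilience_cost_per_year = 40_000  # hypothetical figure
worth_building_resilience = eal > resilience_cost_per_year
```

The point of the comparison is the last line: if the expected annual loss exceeds the annual cost of resilience, the investment makes business sense; if not, accepting the risk may be the rational choice.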