When Things Get Hot

Anybody who works in IT today will know that when important IT services fail, IT staff feel the heat. When everything is working normally, you don't usually get thanked. Let's face it: when did you last write to the electricity company to thank them for doing such a wonderful job of delivering electricity? To some extent this goes with the job, but I'll use today's blog to say thanks to a number of RGU IT staff who did a great job over the weekend of 17th November.

We have two main server rooms from which we deliver our main IT services. Servers in each room are running all the time so that we can use the full capacity for all of our services. If one room fails, however, we can continue to run critical services from the other. 

Now, servers pump out lots of heat, and cooling problems are one of the most common causes of failure. That's exactly what happened in one of our rooms in November. One of the cooling units failed, and then the second unit, now under a heavier load, started to wobble. The temperature climbed rapidly, and it wasn't long before some of the servers began shutting themselves down automatically to avoid damage. That is a nightmare scenario for IT staff.

IT Services and Estates staff managed to get the temperature under control, but it was clear that the air conditioning had to be repaired quickly. That meant shutting down ALL the cooling for several hours. IT Services staff prepared a plan of action and worked over the weekend to move essential services into the other computer room ahead of the shutdown. When the time came, the cooling was shut down, as were most of the servers in the room. We were able to keep the most essential University IT services running throughout the day from the backup room, and once the air conditioning was repaired, things were back to normal fairly quickly.

We were able to do this because for several years we have been building resilience into our overall IT architecture. We have dual communications links and dual server rooms, and we use technologies that allow us to move services from one room to the other and to keep copies of critical data in both computer rooms.

It all paid off that weekend, and indeed it has paid off on a number of occasions. More than once we have had some kind of problem with our network links, servers or server rooms, but you would never have known, because we were able to keep essential services running.

So, my thanks to all the IT staff who make this possible!