How to move 160 servers without moving 160 servers

What are some of the challenges faced by IT Services staff? Here is a guest contribution from “Bobby G” – one of our senior IT technical staff:

“It may not be the type of question we ask ourselves every day, but we were recently required to move 160 of the University’s key servers, often in diverse locations, onto new computer hardware. We wondered how to carry out this work as quickly as possible and with as little impact on our customers as possible.

The servers are all real working servers providing many important roles for the University, including Library, Teaching, Financial, Research and Support services. The amount of data is also quite large: around 5TB (for those of us familiar with 1.4MB floppy disks, that’s around 3.5 million floppy disks’ worth of information).
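For anyone who wants to check the floppy-disk comparison, here is a quick back-of-the-envelope calculation in Python. It assumes decimal units (1TB = 1,000,000MB) and the 1.4MB per disk quoted above; using binary units instead would push the number a little higher.

```python
def floppies_needed(data_tb: float, floppy_mb: float = 1.4) -> float:
    """Number of floppy disks needed to hold data_tb terabytes.

    Assumes decimal units (1 TB = 1,000,000 MB), as disk vendors use.
    """
    data_mb = data_tb * 1_000_000
    return data_mb / floppy_mb


print(f"{floppies_needed(5):,.0f} floppy disks")  # about 3.57 million
```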

There is a trick, as I suppose there usually is with these types of question, and the answer is to move most of the servers in a “virtual” manner. This still moves where the server really is in terms of its intelligence (CPU and memory), but leaves the data, with all of its information and disks, exactly where it always was. A number of systems now allow us to carry out this type of work, and in this instance the University used a tool from VMware. This has allowed us to halve the number of physical servers used, from 20 to 10, while almost doubling the amount of computing power available.
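To make the idea concrete, here is a toy model in Python of what a “virtual” move changes and what it leaves alone. It is only an illustration, not how VMware implements it: the class, the host names (esx-01, esx-02) and the storage name are all hypothetical. The point is that a move updates which physical host supplies CPU and memory, while the reference to the shared storage holding the disks never changes.

```python
from dataclasses import dataclass


@dataclass
class VirtualServer:
    """Toy model of a virtual server (hypothetical, for illustration only)."""
    name: str
    host: str       # physical machine currently supplying CPU and memory
    datastore: str  # shared storage holding the server's virtual disks

    def migrate(self, new_host: str) -> None:
        """Move the running server to another physical host.

        Only the "intelligence" (the CPU/memory state) moves; the disks
        stay on the same shared datastore, so the data itself is untouched.
        """
        if new_host == self.host:
            raise ValueError(f"{self.name} is already on {new_host}")
        self.host = new_host


vm = VirtualServer("library-web", host="esx-01", datastore="san-volume-07")
vm.migrate("esx-02")
print(vm.host, vm.datastore)  # esx-02 san-volume-07
```

In a real live migration the hypervisor typically copies the memory contents across the network while the server keeps running and then switches over very briefly; the model above skips all of that and only shows which part moves.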

This makes the system much greener, with significant savings in electricity and room cooling costs, and makes it much easier to add further servers at very low cost.

With a little careful planning we were able to move all 160 servers in around 9 hours one weekend with most services being unaffected by the move and most of those that were affected only being shut down for around 10 minutes.

The new system has been automated so that, if a physical server gets busy or fails, the “virtual” server will move to a comparatively quiet working server, and no one will even know it has moved (unless they have access to the log files). So we currently know which room your server runs in, but not exactly which machine it is on, as it may have moved itself in the last 10 minutes. One of the next challenges we are setting ourselves is to set up the system so that we don’t even know which room the server is running in, allowing servers to move between buildings by themselves when a service is busy or there is a problem (e.g. a power outage) in a building.
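The behaviour described above, where a virtual server moves off a busy or failed physical server onto a quieter one, can be sketched as a simple rebalancing loop. This is a toy model under stated assumptions: loads are in arbitrary units, the 80% “busy” threshold is an assumed figure, the host and service names are hypothetical, and VMware’s own scheduling is considerably more sophisticated than this.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    """A physical server in the pool (toy model, hypothetical units)."""
    name: str
    capacity: float  # CPU capacity in arbitrary units
    healthy: bool = True
    vms: dict[str, float] = field(default_factory=dict)  # VM name -> load

    @property
    def utilisation(self) -> float:
        return sum(self.vms.values()) / self.capacity


def quietest_host(hosts: list[Host], exclude: Host) -> Optional[Host]:
    """Least-utilised healthy host other than `exclude`, or None if there is none."""
    candidates = [h for h in hosts if h.healthy and h is not exclude]
    return min(candidates, key=lambda h: h.utilisation, default=None)


def rebalance(hosts: list[Host], busy_threshold: float = 0.8) -> list[tuple[str, str, str]]:
    """Evacuate failed hosts and relieve busy ones; return (vm, from, to) moves."""
    moves = []
    for host in hosts:
        while host.vms and (not host.healthy or host.utilisation > busy_threshold):
            # Move the heaviest VM first: it frees the most capacity per move.
            vm, load = max(host.vms.items(), key=lambda item: item[1])
            target = quietest_host(hosts, exclude=host)
            if target is None:
                break  # no healthy host left to move to
            new_util = (sum(target.vms.values()) + load) / target.capacity
            if host.healthy and new_util > busy_threshold:
                break  # relieving this host would just overload another one
            del host.vms[vm]
            target.vms[vm] = load
            moves.append((vm, host.name, target.name))
    return moves


hosts = [
    Host("esx-01", 100, vms={"library": 50, "finance": 40}),  # 90% busy
    Host("esx-02", 100, vms={"teaching": 20}),
    Host("esx-03", 100, healthy=False, vms={"research": 30}),  # failed
]
print(rebalance(hosts))
# [('library', 'esx-01', 'esx-02'), ('research', 'esx-03', 'esx-01')]
```

A failed host is always emptied, even if that leaves another host busy, whereas a merely busy host only hands work on when the move would not overload its neighbour.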

So, in answer to the question “how do you move 160 servers without moving 160 servers?”: you only move the little bit of intelligence that runs the servers and leave the rest as it is (i.e. you move where the server thinks it is).”

Wireless Coverage at RGU

We’ve had a few issues with our wireless network over the past few weeks; apologies for that to our staff and students. Without going into the details, there were a number of unexpected issues with the central controllers, and we’ve been in regular contact with the supplier and manufacturer to get these sorted. We still have some background work to do to find the root cause, but we now have an interim solution in place and the service is much more stable for our users. As it happens, we are shortly going to install a new wireless network infrastructure anyway, to meet the requirements of the new Riverside East building at Garthdee and the growing demand for coverage across the whole of the Garthdee Campus.

Use of wireless networks, as we all know, has exploded over the last few years, to the point where we increasingly take it for granted that we will have wireless access in many public spaces and places of work. In designing our new “Riverside East” building at the Garthdee Campus, we were clear from the outset that we wanted wireless coverage throughout the entire building. Our current wireless infrastructure varies across the Campus. When it was first implemented, the intention was to make sure that the key “public” areas were wireless enabled, including the core committee rooms, library and teaching areas. But it was not designed to be a complete solution: especially in our older buildings, it did not make sense at the time to attempt to put wireless coverage into every single room.

Requirements change, however, and with the range of mobile devices in use today, complete wireless coverage is now a growing expectation; we are receiving requests for the wireless service to be extended to areas that currently have poor coverage. As mentioned above, we are starting to refresh our wireless network, with the new Riverside East building in the first instance. Getting that building commissioned is our first priority, but whatever solution we procure for it will be sized so that it can be expanded across the rest of the Campus.

Installing a wireless network in a large building with thousands of users is a complex task. Wireless signals are broadcast by what we call “wireless access points”, which you’ve probably seen on walls around the campus. Each access point can handle only a limited number of connections before performance suffers; it must not be too close to another access point using the same radio frequency; and the law allows us to use only a limited number of radio frequencies. So we have to position the access points across the building (vertically as well as horizontally) to match the demand we expect in each area, and configure them so that they don’t interfere with each other. To do that, we need to calculate how far each signal will carry, which depends on the construction materials used in the building. Early in the design of the building, we put the CAD drawings through a software program that calculates how the wireless signal should behave in theory, and we used that to estimate where the access points should go. That is only an approximate guide, however: once the building is complete, engineers have to walk through it and measure the actual signal loss in each area before they can finalise the access point positions.
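The three planning steps above (enough access points for the expected users, an estimate of how far signals carry through walls, and a channel plan in which neighbouring access points never share a frequency) can be sketched in a few lines of Python. This is a simplified illustration, not the planning software we actually used: the user numbers, the per-AP client limit and the wall losses are assumed figures, the access point names are hypothetical, and the channel step uses greedy graph colouring over the three non-overlapping 2.4GHz channels commonly used (1, 6 and 11).

```python
import math


def access_points_needed(expected_users: int, max_clients_per_ap: int) -> int:
    """Fewest access points that keep every AP within its client limit.

    Capacity only: coverage (signal reach) may require more APs than this.
    """
    return math.ceil(expected_users / max_clients_per_ap)


def free_space_path_loss_db(distance_m: float, freq_mhz: float) -> float:
    """Standard free-space path loss in dB (distance in metres, frequency in MHz)."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55


def estimated_loss_db(distance_m: float, freq_mhz: float, walls_db: list[float]) -> float:
    """Free-space loss plus an assumed attenuation for each wall in the path."""
    return free_space_path_loss_db(distance_m, freq_mhz) + sum(walls_db)


def assign_channels(neighbours: dict[str, set[str]], channels: list[int]) -> dict[str, int]:
    """Greedy graph colouring: no two neighbouring APs share a channel.

    `neighbours` must be symmetric (if A lists B, B lists A).
    """
    assignment: dict[str, int] = {}
    # Handle the most crowded APs first; they have the fewest options.
    for ap in sorted(neighbours, key=lambda a: len(neighbours[a]), reverse=True):
        used = {assignment[n] for n in neighbours[ap] if n in assignment}
        free = [c for c in channels if c not in used]
        if not free:
            raise ValueError(f"no free channel for {ap}: move APs or reduce power")
        assignment[ap] = free[0]
    return assignment


print(access_points_needed(250, 30))  # 9
# Hypothetical: 10 m from the AP through two walls assumed to cost 5 dB and 12 dB.
print(round(estimated_loss_db(10, 2400, [5, 12]), 1))  # 77.1
# Hypothetical floor: which APs are close enough to interfere with each other.
floor = {
    "AP-1": {"AP-2", "AP-3"},
    "AP-2": {"AP-1", "AP-3"},
    "AP-3": {"AP-1", "AP-2", "AP-4"},
    "AP-4": {"AP-3"},
}
print(assign_channels(floor, [1, 6, 11]))
```

If the greedy pass runs out of channels, that is the software equivalent of the engineers’ conclusion that some access points need to move or broadcast at lower power.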

Even having done all of that, wireless technology still has some limitations. If you are downloading large files, and particularly if a number of people are doing so in the same physical location, you may find that the wireless network slows down: that’s simply the laws of physics, as there is a limit to how much traffic one radio channel can carry in one location. That said, we are looking at newer wireless technologies that will improve performance even in areas of dense use. So while there may still be occasions when it is better to “plug in”, for most everyday tasks (email, web browsing, Facebook etc.) the new wireless network will be fine.
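Why do a few heavy downloaders slow things down while people on email barely notice? One idealised way to picture a shared radio channel is max-min fair sharing, sketched below in Python. Real Wi-Fi shares airtime rather than throughput and behaves less neatly than this, and the 100 Mbps channel capacity and per-user demands are assumed figures for illustration only.

```python
def max_min_fair_share(capacity_mbps: float, demands_mbps: dict[str, float]) -> dict[str, float]:
    """Split one radio channel's capacity between users by max-min fairness.

    Light users get everything they ask for; whatever is left is shared
    equally between the heavy users.
    """
    allocation: dict[str, float] = {}
    remaining = capacity_mbps
    pending = dict(demands_mbps)
    while pending:
        equal_share = remaining / len(pending)
        satisfied = {u: d for u, d in pending.items() if d <= equal_share}
        if not satisfied:
            # Everyone left wants more than an equal share: split it evenly.
            for user in pending:
                allocation[user] = equal_share
            break
        for user, demand in satisfied.items():
            allocation[user] = demand
            remaining -= demand
            del pending[user]
    return allocation


# Hypothetical: 100 Mbps usable on one channel, two people on email, three downloading.
demands = {"email-1": 1, "email-2": 1, "download-1": 100, "download-2": 100, "download-3": 100}
for user, mbps in max_min_fair_share(100, demands).items():
    print(f"{user}: {mbps:.1f} Mbps")
```

The email users get the little they need, and the three downloaders split what is left, so each extra downloader in the same spot shrinks everyone else’s share.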

Green ICT

Hopefully you will have read the recent edition of RGU’s “Green Times”; if not, you can read it here.

It includes an article showing the environmental impact of PCs being left switched on, and what you can do to help: turn your PC off when it is not in use. What is less obvious to our staff and students is the impact of “behind the scenes” ICT. Back in 2007, the Gartner Group estimated that information and communications technology contributes some 2% of total global carbon emissions.

That’s about the same as the aviation industry. A quarter of ICT-related emissions come from data centres running servers, which then require further energy to keep them cool.

We have many servers running in our server rooms on Campus, and the rooms themselves are nowhere near as efficient as a modern datacentre: we expend much more energy on cooling than we need to. Aberdeen University is in a similar situation, as are Aberdeen College and Banff and Buchan College (although they have recently upgraded their server room), and at the moment we all run our data centres independently of each other. Over the past 3 years we have been working hard to see how we could collaborate to reduce our costs and carbon emissions.

This culminated in an agreement earlier this year to move into a shared datacentre created by upgrading space at the University of Aberdeen. It is currently under construction and will be ready by spring 2013. Initially it will become the primary datacentre for Aberdeen University, Robert Gordon University and Aberdeen College, with Banff and Buchan College using the facility at a later date.

The environmental impact of this will be substantial. We estimate that the total power consumption of all the servers from the 3 institutions is 220kW. In our separate, older data centres we probably use the same amount of power again just to keep the servers cool (220kW is roughly 22 electric cookers, with everything switched on, running all the time; just picture it). The CO2 emissions from all that come to 2030 metric tonnes per annum.

By packing all these servers into one modern datacentre we will slash the energy required for cooling. We estimate that our total power consumption will drop from 441kW to about 264kW. As an added bonus, much of the electricity will be generated by Aberdeen University’s combined heat and power plant, which has lower associated carbon emissions. In total, we anticipate saving 1061 tonnes of CO2 per annum.
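The figures above can be put in terms of power usage effectiveness (PUE), a standard industry ratio of total facility power to IT equipment power, although the original estimates were not expressed that way. The Python sketch below uses only the numbers quoted in this post and assumes the loads run constantly all year. It does not try to reproduce the 1061-tonne saving, because that also depends on the lower carbon intensity of the combined heat and power plant, which isn’t given here.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

# Figures quoted in the text above.
IT_LOAD_KW = 220        # servers from the three institutions
CURRENT_TOTAL_KW = 441  # servers plus cooling in today's separate rooms
SHARED_TOTAL_KW = 264   # estimate for the shared datacentre
CURRENT_CO2_T = 2030    # tonnes of CO2 per annum today


def pue(total_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    return total_kw / it_kw


def annual_mwh(power_kw: float) -> float:
    """Energy used in a year by a constant load (kW x hours -> MWh)."""
    return power_kw * HOURS_PER_YEAR / 1000


current_mwh = annual_mwh(CURRENT_TOTAL_KW)
shared_mwh = annual_mwh(SHARED_TOTAL_KW)
# Carbon intensity implied by the quoted figures; t/MWh is the same as kg/kWh.
implied_kg_per_kwh = CURRENT_CO2_T / current_mwh

print(f"PUE today: {pue(CURRENT_TOTAL_KW, IT_LOAD_KW):.2f}")  # 2.00
print(f"PUE shared: {pue(SHARED_TOTAL_KW, IT_LOAD_KW):.2f}")  # 1.20
print(f"Energy today: {current_mwh:,.0f} MWh/yr, shared: {shared_mwh:,.0f} MWh/yr")
print(f"Implied carbon intensity: {implied_kg_per_kwh:.2f} kg CO2/kWh")
```

In other words, today roughly one extra watt goes on cooling and overheads for every watt the servers use; in the shared datacentre that overhead would fall to about a fifth of a watt.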

And into the bargain, the institutions will save £2.6m collectively over the next 10 years.

IT Strategy – our Infrastructure

Behind the scenes, we run a sizeable infrastructure to deliver the services available to our 12,000 registered users (staff and students, including distance learners). A campus-wide network (wired and wireless) connects around 3,000 workstations, hundreds of phones and printers, and an untold number of personal devices. This links to the core University services and also, of course, to the Internet via JANET, the national academic network that connects all universities and colleges. We have two main server rooms on Campus that house all our servers and storage: we manage over 300 servers and over 40 terabytes of storage. All this runs 24 hours a day, 7 days a week. The whole environment has to be constantly monitored, backed up and upgraded: old servers replaced, software updated, capacity expanded and new services brought on stream. It’s a bit like changing the engines on a jumbo jet one at a time without landing...

Most of this is unseen by our users, but keeping our infrastructure up to date, and planning in advance what capacity we will need over the next few years, has to be a key part of our strategy. This is particularly true when many of our users rely on external services (e.g. Apple iCloud, Google, Facebook, YouTube), whose use can change quickly and increase the traffic on our network.

We’re also looking at ways to reduce costs and lower our environmental footprint. We have recently signed up with other regional institutions to create a shared regional datacentre, and we will move half of our servers and storage into it in a year’s time. The new datacentre will be state of the art and highly efficient in its use of power. Our equipment consumes over 100kW of electrical power, and cooling it takes more energy on top of that, which is why an efficient datacentre is so important.

We also have to plan for unforeseen events: what happens if we lose a server room through fire, power loss or some other incident? What happens if one of our major network links is cut? We run dual systems for our critical services so that they can continue even if we lose one of the server rooms, and we have dual links and equipment at the core of our network. It is important to test our disaster recovery measures, so this year we will start running regular rehearsals, which will include shutting down one of our server rooms to prove that the critical services continue to run as expected.
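Before shutting a room down for real, it can help to check on paper that every critical service really does have a copy in both rooms. The Python sketch below does that for a hypothetical inventory; the service and room names are invented, and a paper check like this complements, rather than replaces, the live rehearsals described above.

```python
# Hypothetical inventory: the server room(s) in which each critical service runs.
DEPLOYMENT = {
    "email": {"room-A", "room-B"},
    "student-records": {"room-A", "room-B"},
    "library-catalogue": {"room-A"},  # not yet duplicated
}


def services_lost(deployment: dict[str, set[str]], failed_room: str) -> list[str]:
    """Services with no instance left once `failed_room` is shut down."""
    return sorted(s for s, rooms in deployment.items() if not rooms - {failed_room})


def rehearse(deployment: dict[str, set[str]]) -> dict[str, list[str]]:
    """Simulate losing each room in turn and report what would stop."""
    rooms = set().union(*deployment.values())
    return {room: services_lost(deployment, room) for room in sorted(rooms)}


for room, lost in rehearse(DEPLOYMENT).items():
    print(f"Lose {room}: {', '.join(lost) if lost else 'all critical services survive'}")
```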