Fuse has recently completed the migration of a customer's SharePoint 2007 farm from a "traditional" hosting environment to the Azure IaaS cloud. The farm runs four major websites, including the main website of our local county council (Northamptonshire), so it was critical that downtime was kept to a minimum. At the same time as migrating the farm, we were also introducing a new SharePoint 2013 farm to support a newly developed search-driven site and, ultimately, to host the existing 2007 sites once they are upgraded. In this post, I'll explain some of the challenges and successes we experienced.
Northamptonshire County Council is a customer we've worked with for many years, particularly around SharePoint. The 2007 farm hosting www.northamptonshire.gov.uk was originally built by Fuse as a combination of physical and virtual (VMware) servers in a three-tier farm, with a separate UAT environment. Active Directory services were provided by local AD servers replicating over a VPN link back to the customer's site. Shared load balancers and firewalls were managed by the datacentre provider as part of a managed service that included OS monitoring and patching. Over three years, the infrastructure has worked pretty much faultlessly, with the few incidents of unexpected downtime caused by factors outside anyone's control.
The sites themselves are developed by a combination of Fuse and NCC, with all support provided by Fuse. Over the years, the SharePoint sites have grown to include many custom solutions, forms-based authentication, custom data-driven applications, and data pulled from web services hosted in Northampton and elsewhere. A custom mobile solution and multiple SSL sites add to the overall complexity of the hosting environment. Even though we carry out every deployment and have overall responsibility for support, the prospect of moving the farm to a new infrastructure was a daunting one.
So Why Move?
Three main factors drove the move to a new infrastructure:
- SharePoint 2007 is an aging platform, and any new sites would need to be developed on 2013. That requires a new farm while the existing one is maintained until its sites are upgraded, and the cost of hosting both farms simultaneously with the current provider, who requires an annual commitment, was prohibitive.
- NCC is now part of LGSS (another site hosted by Fuse on Azure in this farm!), who will be building their own hosting environments in the medium to long term. Using Azure, we could build an environment that could easily be moved back on-premises at any time, or simply taken over by LGSS.
- We had worked with other customers to build large-scale SharePoint farms on Azure, and found the reliability, features and flexibility to surpass that of any hosting environment we'd ever come across. Combined with the seemingly ever decreasing costs and increasing features, Azure ticks many boxes.
Having convinced the customer that Azure was the right way to go, we engaged with Microsoft's UK Azure team to set up the subscription as part of NCC's EA agreement. This gave them improved pricing and access to the Azure super-portal that allows management of multiple subscriptions. That feature enabled NCC to delegate access to an entire subscription to us (something we and Microsoft guided NCC through setting up), and will allow further subscriptions to be used for other projects with separate budgets. Azure's pay-as-you-go model doesn't seem at first glance to fit with local government procurement but, done the right way, it's very simple to set up and is just an extension of an existing agreement. This also covers off any licensing issues, which can be complex with a mixed on-premises/cloud infrastructure.
We've built several SharePoint 2013 farms in Azure, and they've all been relatively straightforward. As SharePoint 2013 and SQL 2012 are built for Windows 2012, which sits very comfortably on Azure, there are no real issues to overcome. Essentially you build the farm as you would on-premises, with certain tweaks for optimising performance on Azure, particularly with the SQL servers. A SharePoint 2007 farm is a different matter:
- Supports Windows versions only up to 2008 R2, which is still OK on Azure
- No support for SQL 2012 or Availability Groups
- Requires sticky-session support in a load-balanced environment
- No SNI (Server Name Indication) in IIS 7.x
We also had our own challenges to overcome with this particular environment:
- VPN needed to replicate the Active Directory, for user authentication of content editors from NCC, using their corporate credentials
- Transferring the existing content and application data
- Ensuring all the various features of each site were operational prior to go live
- Co-ordinating all this at the same time as launching a brand-new site
Here's some detail on how we overcame each challenge to build a successful solution.
Two Farms in One
In order to minimise the costs of hosting two farms, we knew we needed to share as much as possible between the farms, while still ensuring high availability. The sites in the farm regularly exceed 15,000 visitors per day, so the infrastructure must be able to comfortably handle those traffic levels. During certain periods (bad weather, for example) traffic can quadruple. So we designed the following architecture for production:
- Two Windows 2012 Active Directory Servers, running DNS and replicating with the AD servers at NCC
- Two Windows 2012 servers, in a Windows cluster, running two instances of SQL 2012 – an Availability group for SharePoint 2013, and a mirror/witness set for 2007
- Azure configured to run a front end service for 2007, and a separate service for 2013 - effectively two distinct load balancers
- All servers connected to the same virtual network, divided into subnets, and connected to NCC's infrastructure through a VPN tunnel
- Two SharePoint 2007 front end servers running on Windows 2008 R2
- One SharePoint 2007 server running on Windows 2008 R2 as a search crawler/indexer etc.
- Four SharePoint 2013 servers running on Windows 2012, with two acting as the application/batch processing tier
- UAT environment consisting of two SharePoint 2013 servers and one SharePoint 2007 server, connected to a single SQL server running two instances of SQL. The servers are joined to the same AD as production.
All the SharePoint 2007 servers have more memory, disk space and processing power than the existing farm, but are still cheaper to host at Azure.
SQL Issues with 2007
Although SharePoint 2007 doesn't officially support SQL 2012, adding another pair of large SQL servers when we already had a decent set seemed worth avoiding, particularly as the intention was to wind down 2007 over the life of the solution. Before we put the solution together we looked into what it would take to make SharePoint 2007 work on SQL 2012 and, it turns out, not a lot. There are some good blog posts out there explaining the necessary steps, and we've found no issues so far. We have kept SharePoint 2007 on its own SQL instance and limited the memory available to each instance, so we don't make any SQL changes that could affect the supportability of our 2013 farm.
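As a rough illustration of limiting memory per instance, the sketch below splits a server's RAM between two instances and produces the matching `sp_configure` statements. The 56 GB total, the 8 GB operating system reserve, the 3:7 split and the instance names are all assumptions made for the example, not the values used on this farm.

```python
def memory_caps_mb(total_mb, os_reserve_mb, shares):
    """Split the RAM left after the OS reserve between instances, by share."""
    available = total_mb - os_reserve_mb
    if available <= 0:
        raise ValueError("OS reserve leaves no memory for SQL Server")
    total_share = sum(shares.values())
    # Integer division rounds down, so the caps never exceed what is available.
    return {name: available * share // total_share
            for name, share in shares.items()}


def sp_configure_script(cap_mb):
    """T-SQL that caps one instance; run it while connected to that instance."""
    return "\n".join([
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max server memory (MB)', {cap_mb};",
        "RECONFIGURE;",
    ])


# Hypothetical figures: a 56 GB server, 8 GB kept back for Windows,
# and the remainder split 3:7 between the 2007 and 2013 instances.
caps = memory_caps_mb(
    total_mb=56 * 1024,
    os_reserve_mb=8 * 1024,
    shares={"SP2007": 3, "SP2013": 7},
)
for instance, cap in caps.items():
    print(f"-- {instance}")
    print(sp_configure_script(cap))
```

Capping both instances explicitly stops either one from starving the other, which matters more when the smaller instance is the one being wound down.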
This has also meant we can take different high availability approaches for each farm. 2013 natively supports all the SQL 2012 HA features, so we've gone for the one that best suits Azure, availability groups on a Windows cluster. 2007 doesn't (though we did try hard to convince it!) so instead we've opted for a witness/mirror configuration.
In a previous blog post I mentioned how good Azure can be for load balancing SharePoint, but that was for SharePoint 2013, which handles sessions across the farm much better than 2007. Although anonymous users are fine without sticky sessions, a number of features within the sites require a sticky session, which Azure load balancing doesn't provide. The first issue this caused was with site editing for Windows-authenticated users: on postback, the request would generate an invalid view state if it went to the other server. This is easily fixed by using the same machine key in the application's web.config on every front end, which fixes a similar issue for forms-based authentication too.
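For illustration, a shared machine key can be generated once and pasted into the `<system.web>` section of each front end's web.config. The sketch below is an assumption about how you might produce one; the key lengths (a 64-byte validation key and a 32-byte AES decryption key) and the SHA1/AES algorithm choices are illustrative, and not necessarily the settings used on this farm.

```python
import secrets


def machine_key_element(validation_bytes=64, decryption_bytes=32):
    """Build a <machineKey> element with freshly generated random keys."""
    # token_hex(n) returns 2 * n hexadecimal characters.
    validation_key = secrets.token_hex(validation_bytes).upper()
    decryption_key = secrets.token_hex(decryption_bytes).upper()
    return (
        f'<machineKey validationKey="{validation_key}" '
        f'decryptionKey="{decryption_key}" '
        'validation="SHA1" decryption="AES" />'
    )


# Generate the element once, then copy the same text to every server.
print(machine_key_element())
```

The important point is that the element is generated once and copied to every server; generating it separately per server would recreate the original problem.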
All the other transaction-based custom applications within the farm seem to handle non-sticky sessions well, but this could just be luck, so we looked at some alternative solutions in case our luck doesn't hold. The first was a virtual load-balancing device from Kemp, which is available from the Azure store. This is essentially a full-featured load balancer that runs behind the Azure load balancer and in front of the SharePoint servers, and can handle sticky sessions. The second was the Application Request Routing (ARR) module on IIS 8, which turns any Windows server into a full-featured load balancer/cache server. As we also needed an SSL solution, this was what we went with, though currently it's only handling our SSL issue.
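To make the idea of a sticky session concrete, here is a minimal sketch of one way to get affinity: hash a stable client identifier so the same client always lands on the same server. This is a generic hash-based technique shown for illustration only, not how the Kemp device or ARR implements affinity, and the server names and session identifier are hypothetical.

```python
import hashlib


def pick_server(client_key, servers):
    """Map a client identifier (session cookie, IP, ...) to one server."""
    if not servers:
        raise ValueError("no servers available")
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    # The same key always produces the same digest, hence the same index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


front_ends = ["WFE1", "WFE2"]  # hypothetical front end names
first = pick_server("session-abc123", front_ends)
second = pick_server("session-abc123", front_ends)
print(first, second)  # both requests go to the same front end
```

A weakness of plain hashing is that changing the number of servers remaps many clients, which is one reason dedicated load balancers tend to use cookies for affinity instead.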
SSL and SNI
Because our 2007 farm hosts multiple SharePoint applications, and some of those use SSL, we needed a way of serving multiple SSL certificates on the same port. In our old hosting environment, our hosting provider gave us another IP address and we simply forwarded that to SharePoint on a different internal port. With Azure, things aren't quite so simple: another IP address requires another "service", and virtual machines can only exist in one service at a time. SNI is an option, but only for the 2013 farm, as it's a feature of IIS 8. For our 2007 farm we needed a similar solution to our existing hosting, and came up with using ARR to forward the requests from a new Azure service to our SharePoint 2007 front ends, again on an internal port. As the ARR servers are also Windows 2012, we can use SNI on these to identify the requested site and forward it correctly to the SharePoint 2007 servers, enabling us to host multiple SSL sites without requiring any more servers or services.
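The routing decision described above can be sketched as a simple lookup: the hostname sent in the TLS handshake selects a certificate and an internal port, and the request is forwarded to the front ends on that port. The hostnames, certificate labels, ports and IP addresses below are all hypothetical, and the real configuration lives in IIS and ARR rather than in code like this.

```python
from typing import NamedTuple


class Route(NamedTuple):
    certificate: str    # label of the certificate the proxy presents
    internal_port: int  # port the SharePoint web application listens on


# Hypothetical sites: one entry per SSL site sharing public port 443.
ROUTES = {
    "forms.example.gov.uk": Route("forms-cert", 8443),
    "pay.example.gov.uk": Route("pay-cert", 8444),
}
FRONT_ENDS = ["10.0.2.11", "10.0.2.12"]  # hypothetical 2007 front ends


def resolve(sni_hostname, routes=ROUTES, front_ends=FRONT_ENDS):
    """Return the certificate to present and the backend targets to use."""
    route = routes.get(sni_hostname.lower().rstrip("."))
    if route is None:
        raise LookupError(f"no SSL site configured for {sni_hostname!r}")
    targets = [f"{ip}:{route.internal_port}" for ip in front_ends]
    return route.certificate, targets


print(resolve("pay.example.gov.uk"))
```

Because the hostname is read before a certificate is chosen, every site can share one public IP address and port, which is what removes the need for an extra Azure service per SSL site.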
VPN and Active Directory Replication
Our existing hosting environment already used a VPN to enable AD traffic between sites. From this and other Azure implementations, we knew that the Azure site-to-site VPN and virtual networks would provide a workable solution, but we also knew that any VPN into a large or complex environment is never straightforward. We also had to ensure the replication topology could handle the two branch sites. This took a lot of work with the customer's networking and AD teams, and had we not had previous experience with Azure VPNs, we might not have persisted. I'm glad to say it does all work, and it enables other solutions, such as on-premises backup and monitoring, to be used seamlessly with Azure. Effectively, the Azure servers in Dublin are now as much a part of the NCC infrastructure as those in Northampton.
In order to perform the migration with minimal downtime, we had set the farms up and configured them with all the required settings and solutions over a number of weeks, testing against a copy of the content and application databases we had copied across as SQL backup files. The final step in making them live was to freeze content and migrate the data one more time, before switching DNS entries to make the new farm available to the public. However, we'd found that the initial data copy, where we'd used the two VPN tunnels to copy the live SQL data up to NCC and then back out to Azure, had taken much too long.
We found a much better way was to use the Azure/SQL backup tool, which can automatically transfer any SQL backup, as it's created, to Azure storage. By connecting this to storage available to our new farm servers, the backups were ready in minutes instead of hours, which meant we could commence the content restore that night, shortening the entire migration process by around a day. We simply scheduled the SQL backup to occur after the content freeze and waited for it to arrive at Azure.
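The "wait for it to arrive" step can be sketched as a small check on the receiving side: list the backup files that were written after the content freeze, so that only post-freeze backups are restored. This is an illustrative assumption about how such a check could look, with a hypothetical folder path and freeze time; it is not part of the backup tool itself.

```python
from datetime import datetime
from pathlib import Path


def backups_after(folder, freeze_timestamp):
    """Return .bak files in folder modified at or after the freeze, oldest first."""
    candidates = [
        path for path in Path(folder).glob("*.bak")
        if path.stat().st_mtime >= freeze_timestamp
    ]
    return sorted(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    # Hypothetical values: where the backups land and when content was frozen.
    landing_folder = Path("restore-landing")
    freeze = datetime(2014, 1, 1, 18, 0).timestamp()
    if landing_folder.is_dir():
        for backup in backups_after(landing_folder, freeze):
            print(f"ready to restore: {backup.name}")
```

Filtering on the freeze time guards against accidentally restoring an earlier test backup that is still sitting in the same folder.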
Planning, Testing and Project Management
Of course, we can't pretend the whole process went completely smoothly, and there were still some minor issues once the sites went live. However, these were quickly identified and fixed, thanks to the way the project was managed between NCC and Fuse, with frequent update meetings, shared project resources and effective communication. A well-developed test plan ensured that once the migration was completed and the new farm made live, all the features could be tested and the integration endpoints updated. This would not have been possible without both teams working together.
This was a major set of changes for NCC, with significant risks and a large number of people involved on both sides. Fuse overcame a number of technical challenges to deliver a solid platform that has saved a significant amount of public money and enabled NCC's digital team to continue delivering award-winning services to the public.