Things To Consider When Doing a Website Migration
SSIS or not to SSIS?
This was essentially our first question and it prompted us to spend half a week on prototyping a quick migration of one table from the java stack into an existing table on the Microsoft stack. The prototype worked great but left us with little confidence that this was the tool for us, especially when dealing with a database designed by a cryptic programmer hell bent on bad structure!
The team at the time was lacking skills in SSIS which left is with three options:
- Hire a contractor with some skills in SSIS
- Learn it ourselves
- Use the KISS (Keep It Simple Stupid) rule and make a custom console app
After doing some risk analysis of going down the SSIS route and getting stuck with a complex data structures we decided to play it safe and stick with what we know (at this point we had been given a date to achieve the migration that seemed almost laughable).
After making this decision we decided what tech we needed for the console app, this consisted of C# for the console app and Dapper (fastest ORM in the land). We did consider using bog standard ADO.net to handle the data as in terms of performance as it’s simply the fastest way of accessing data. However this was overruled as the time for development would suffer, especially considering there was something like 40 tables to map.
Understanding the database
While deciding the architecture we also started analysing the database tables and how they fit into the new database. We ended up in the following scenarios:
- Table exists and can be migrated directly - Yeay!
- Table exists but there is bad data - will require some cleaning from the console apps side to remove dodgy characters, inconsistent data etc. Analysing how bad the problem is beforehand is crucial, if it takes five minutes to clean is SQL and two hours to write in code then go with the SQL solution.
- Table exists but requires reformatting to fit into the new structure - will require some processing in the console app to assign the correct ID's, normalize the data etc.
- Table does not exist as it’s a new feature - one of the easiest problems as you’re not retrofitting the data into an existing table.
- Table column datatype differs - will require some conversion in the console app.
- Redundant data - We had a LOT of this!
To map the relationships we used Excel (and a lot of coloured cells) to provide a translation from one column to the other. Sometimes a single table in the old database would map to two or three tables in the new system, these relationships where also clearly defined. In the case of datatype conversions we listed each conversion type on a separate sheet and referenced each with the column name. This gave us a clear picture of what conversions were used multiple times which made it easier to code.
Having mapped the database and coded the console app to do the conversion we then added logging to ensure we had a record of who was getting migrated, at what time, the result of the migration, and even some of the important parts of the data relating to that user. The database containing the log was located on another server to ensure it didn’t hinder the live/test sites performance during the migration.
This can sound like a step you won’t need, but I can tell you it helped immensely when users are having issues (which you’re bound to get with a legacy database).
After having a working console app our next task was to calculate if the time to finish the migration was within an acceptable timeframe. To do this we firstly categorized our users by ordering based on new users with little data to the meaty power users. Along with the category we also counted the number of people within each group and then multiplied it by the time taken to migrate one person from each category.
As a firm believer of never pre-optimizing, our migration app turned out to be too slow. Our first approach speed up the process was to modify the application to run in parallel (as well as some obvious minor changes here and there). By running the applications asynchronously we now had the power of scaling the application by creating multiple instances on different servers till our estimation time was within an acceptable timeframe.
Great we just removed a bottleneck! But wait... the downside of doing this was the strain it would put of the database as we scaled up. So what did this mean? You guessed it, further benchmarking on the database to see what our limit was on concurrent migration apps.
As our customer base was about to increase by a huge margin load balancing was curtail in identifying if the current architecture would be able to cope with the extra load. To do this we used a tool called JMeter (http://jmeter.apache.org/) which is a free opensource java based loadbalacing software. What JMeter effectively does is simulate load on your defined areas to build up benchmarks. The key point to realize here was that accurately simulating load was almost impossible, so we would user this as a rough guide. After running the load tests only one area was deemed to be problematic which was rectified after some performance tweaks.
As a safety measure we also spun up some extra servers and added them to the live bank. After the migration if we felt we didn’t need them we could always tear them down.
Nginx and DNS
Redirecting correctly from the old site to the new was another major issue, especially considering the migration from old site to the new would take something in the region of four to five months and during that period users should not be accessing the wrong site.
To tackle this problem we firstly created a good old flow diagram to work out at what point users should be redirected while in different states (logged in/out, migrated etc..) and implemented the changes.
To handle some of the other redirect problems, we used Nginx (http://wiki.nginx.org/Main), which sits on top of the two applications. Nginx allowed us to do cool things like handle the SSL certificates between the two sites and redirect URL's with logic behind the redirects (using Lucene).
Nginx also came to the rescue when switching the DNS location which took something in the region of 4-5 days to propagate. On the day of the switch it allowed us to quickly set in place the redirection.
Impact on the users
During this migration I found that judging and minimizing the impact on the users was the most difficult part (perhaps because I'm a developer). It was an even more aggravating issue for us as our customer base was mainly composed of elderly users who were far less accepting of change in comparison to younger users.
In order to reduce the customer backlash we decided to take the following steps:
- Ensure the new site has feature parity
- initiate the migration in phases starting with the lowest impacting users on the business.
- Inform users of the pending changes.
- Gather feedback after each phase.
- Utilize social networking to aid and support users
- Provide informative messaging on the new site (as most people don’t read all their emails).
Handling a migration is a great project for a developer, there are certainly a number of tricky scenarios/puzzles to handle. If done incorrectly it can be an absolute nightmare and could damage the business irreversibly, however if the correct procedures are taken there shouldn’t be any cause for concern (and minimal hair loss).