Guaranteeing the integrity of a data migration
Fabian de Almeida Ramos
Read more by Fabian de Almeida Ramos
What should we verify?
We embarked on our integrity journey by first deciding which elements were critical to verify. Sure, we were migrating a lot of data, but not all of it needed to be migrated correctly. Data that is used for determining what type of identification a customer used isn’t as important, for example, because it can be reverified very easily whenever a customer comes by. But financial transactions are a lot more difficult to fix. This data is being passed along to bank accounts and such. Thus, we verified our data by asking ourselves the following questions:
- Is the correct amount of data migrated?
This can be quite an easy metric to verify. We check the amount of data in the old system and check how many records we have in our new system. By checking if these amounts are identical, we can confirm that the correct amount of data is migrated.
- Is the data migrated correctly?
This can be a bit more difficult. Some form of data transformation is almost always required; whether in the form of different formatting, or in data being migrated across multiple tables (through normalization, for example). This makes it more expensive to set up automated checks to see if all the data was migrated correctly.
Realizing the first way of data verification was very easy and could be applied to all the data we migrated. The second way, however, proved to be a bit more difficult. We tackled this problem using two methods:
- For the financial data, we summed all the amounts and checked if the total sums were identical.
- For basic data, such as client name, client number and pawn number, we decided to create a HASH. These three fields (client name, client number, pawn number) were hashed into a single record. By doing so, we could check for discrepancies between records a lot easier, all we had to do was look at the hash, rather than comparing all those fields.
Searching for the right meta
Deciding what data we wanted to verify was one thing but implementing these verifications was another. Fortunately for us, however, we had just the tool for this: Metabase. Stadsbank uses Metabase to create management reports, which is an important prerequisite for being able to do their jobs properly. In their own words, Metabase is “an open source business intelligence tool. It lets you ask questions about your data, and displays answers in formats that make sense, whether that’s a bar graph or a detailed table.”
One of Metabase’s best functionalities is the ability to link multiple databases, create questions for them, and group them together in a single dashboard. Even though both database management systems (DBMSs) were completely different, we could still showcase the data uniformly.
As Metabase allows these questions to be as specific as you want to, we could set up questions that would display the exact data we wanted and give us a clear overview of the differences between the source data set and the migrated data set through the use of SQL.
This helped us to determine what data was correct and what data wasn’t, it refined our migration, and made everything run smoothly. And run smoothly it did! And so, migrating all this data only caused a couple of very minor hiccups, nothing worth going into overtime over.
In conclusion, there are two tips I would like to give you regarding verifying whether or not your data is migrated correctly:
- Determine what data you actually want to verify. Migrating a huge set of data does not necessarily mean that all of it needs to be migrated correctly. It really boils down to risk management and seeing what type of problems arise if some of the data isn’t migrated correctly. Keep in mind, however, that with sensitive data like financial records and payments, you always need to verify if it’s migrated correctly.
- Use a business intelligence tool. Even if you’re using two of the same DBMSs, it could still prove very useful to import both databases in some sort of business intelligence tool, so you have a good oversight as to how your data was migrated. Metabase is a good example: they have a cloud version but you can also host it yourself on your own servers. That’s what we did and it proved to be a great asset.
If you’re interested in what we actually did for the Stadsbank van Lening, check out our case study.
The best stories about innovation?
Sign up for our newsletter!
Please leave your name and email below