Guaranteeing the integrity of a data migration

Guaranteeing the integrity of a data migration

In my previous post, I gave insights into how we approached the data migration of a fifteen-year-old application: the system used by the Stadsbank van Lening. As we are working with financial data, migrating the data was just half the battle, we needed to make sure that this data is migrated properly and securely, guaranteeing its integrity. However, as their old system consisted of a lot of personal data, we only had access to an anonymized version of the real data set during the development process. This posed another challenge for guaranteeing and maintaining integrity.

Blog

Migrating data from a 15-year-old application

 

What should we verify?

We embarked on our integrity journey by first deciding which elements were critical to verify. Sure, we were migrating a lot of data, but not all of it needed to be migrated correctly. Data that is used for determining what type of identification a customer used isn’t as important, for example, because it can be reverified very easily whenever a customer comes by. But financial transactions are a lot more difficult to fix. This data is being passed along to bank accounts and such. Thus, we verified our data by asking ourselves the following questions:

  1. Is the correct amount of data migrated?
    This can be quite an easy metric to verify. We check the amount of data in the old system and check how many records we have in our new system. By checking if these amounts are identical, we can confirm that the correct amount of data is migrated.
  2. Is the data migrated correctly?
    This can be a bit more difficult. Some form of data transformation is almost always required; whether in the form of different formatting, or in data being migrated across multiple tables (through normalization, for example). This makes it more expensive to set up automated checks to see if all the data was migrated correctly.

Realizing the first way of data verification was very easy and could be applied to all the data we migrated. The second way, however, proved to be a bit more difficult. We tackled this problem using two methods:

  • For the financial data, we summed all the amounts and checked if the total sums were identical.
  • For basic data, such as client name, client number and pawn number, we decided to create a HASH. These three fields (client name, client number, pawn number) were hashed into a single record. By doing so, we could check for discrepancies between records a lot easier, all we had to do was look at the hash, rather than comparing all those fields.

 

Searching for the right meta

Deciding what data we wanted to verify was one thing but implementing these verifications was another. Fortunately for us, however, we had just the tool for this: Metabase. Stadsbank uses Metabase to create management reports, which is an important prerequisite for being able to do their jobs properly. In their own words, Metabase is “an open source business intelligence tool. It lets you ask questions about your data, and displays answers in formats that make sense, whether that’s a bar graph or a detailed table.”

One of Metabase’s best functionalities is the ability to link multiple databases, create questions for them, and group them together in a single dashboard. Even though both database management systems (DBMSs) were completely different, we could still showcase the data uniformly.

Data-integriteit

As Metabase allows these questions to be as specific as you want to, we could set up questions that would display the exact data we wanted and give us a clear overview of the differences between the source data set and the migrated data set through the use of SQL.

This helped us to determine what data was correct and what data wasn’t, it refined our migration, and made everything run smoothly. And run smoothly it did! And so, migrating all this data only caused a couple of very minor hiccups, nothing worth going into overtime over.

In conclusion, there are two tips I would like to give you regarding verifying whether or not your data is migrated correctly:

  1. Determine what data you actually want to verify. Migrating a huge set of data does not necessarily mean that all of it needs to be migrated correctly. It really boils down to risk management and seeing what type of problems arise if some of the data isn’t migrated correctly. Keep in mind, however, that with sensitive data like financial records and payments, you always need to verify if it’s migrated correctly.
  2. Use a business intelligence tool. Even if you’re using two of the same DBMSs, it could still prove very useful to import both databases in some sort of business intelligence tool, so you have a good oversight as to how your data was migrated. Metabase is a good example: they have a cloud version but you can also host it yourself on your own servers. That’s what we did and it proved to be a great asset.

If you’re interested in what we actually did for the Stadsbank van Lening, check out our case study.

 

 

Case

A new digital pawning system for the Stadsbank van Lening

The best stories about innovation?
Sign up for our newsletter!

Please leave your name and email below

                 

Blijf op de hoogte

Meld je aan voor onze nieuwsbrief en ontvang elke maand artikelen en blogs over innovatie en onze podcast.

Stay up to date

Sign up to receive valuable insights to accelerate your business through innovation.