Migrating data from a 15-year-old application
Fabian de Almeida Ramos
The Stadsbank provides a service similar to that of a pawn shop: customers can loan one or more items (known as ‘pawns’) to the Stadsbank in exchange for a monetary sum. Thus starts a loan process, where the end goal is to have the customer repay this sum and collect his or her pawn. Being an institution of the municipality of Amsterdam, the Stadsbank is a non-commercial organization, meaning there is no profit motive. Over the years, more than 90,000 customers have loaned over 900,000 pawns to the Stadsbank. To keep track of all these customers and pawns, the Stadsbank needs a lot of supporting data for their core business, such as data about cash drawers, payments and other financial transactions.
All in all: a lot of data, all of which had to be accessible and usable within their new system. Migrating such massive amounts of data introduces some challenges:
- We are not just rebuilding their application, but also their data model as a whole. There are new requirements and regulations in place, making data transformation inevitable.
- As we are working with financial data, data integrity is of the utmost importance.
- The final migration would be a ‘big bang’, meaning that the transition from the old to the new system is instant rather than gradual. On top of that, the current system is actively being used, and we only have one opportunity to migrate everything at once, so that the new system can be up and running right away.
So, there were two questions we needed to answer:
- How will we approach migrating this data?
- How can we guarantee the integrity of this data?
Preparing our approach
Let’s focus on how we approached the migration first. One of the things that can be bothersome while building an application is a lack of available test data. Luckily for us, we had an entire dataset to migrate, meaning we could do more representative testing during development. We didn’t know yet what our entire data model would look like, so we migrated only the data that we needed while building our stories – that way we could be sure we wouldn’t pollute our database with data that wasn’t being used at all.
Every morning, the Stadsbank creates a data dump, which we imported into our own environment at 8AM. This way, we didn’t need to keep an open connection to their system, and we could freely and easily browse the old data ourselves – something that’s very useful when rebuilding a data model.
Furthermore, we concluded that the transformations we applied could be quite intricate. Because of this, we created a separate service specifically for migrating data from the copied database into the new database used by the new application. This resulted in the following architecture: a standalone migration service that reads from the nightly copy of the old database and writes to the new application’s database.
To identify possible problems within our data mapping as soon as possible, we decided to run a nightly migration, starting at 12AM. At first, we only migrated the small datasets (like client names), but later on we migrated more and more data to match the new functionality. As a result, the migration took longer as time went on, not shorter. Eventually, we ran into the problem that our migrations weren’t finishing fast enough: since we imported the dataset again at 8AM, we only had eight hours to complete a migration, otherwise the migration service would crash. This meant we had to spend some time making our migrations more performant (and less heavy). To achieve this, we did a couple of things:
Used native queries
Our migration service was a Spring Boot service, initially using JPA and Hibernate. Both libraries make development a lot easier, but at this scale they can hurt performance, as a lot happens in the background (entity mapping, dirty checking, flushing). Therefore, we decided to remove them, going for native queries as much as possible.
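As an illustration, the batching idea behind those native queries can be sketched in plain Java. This is a hypothetical sketch, not our actual service code: the table, columns and the `executeBatch` callback are made up, and the database round trip is abstracted away so the example runs standalone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: collect rows and flush them in fixed-size batches,
// the way a native INSERT with JDBC batching would, instead of saving
// entities one by one through Hibernate.
public class BatchMigrator {
    // Illustrative native statement; table and columns are made up.
    public static final String INSERT_SQL = "INSERT INTO client (id, name) VALUES (?, ?)";

    // Returns the number of flushes (database round trips) performed.
    public static int flushInBatches(List<String[]> rows, int batchSize,
                                     Consumer<List<String[]>> executeBatch) {
        int flushes = 0;
        List<String[]> batch = new ArrayList<>();
        for (String[] row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) { // full batch: one round trip
                executeBatch.accept(batch);
                batch = new ArrayList<>();
                flushes++;
            }
        }
        if (!batch.isEmpty()) { // flush the remainder
            executeBatch.accept(batch);
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(new String[] {String.valueOf(i), "client-" + i});
        }
        // A real callback would bind each row to a PreparedStatement for
        // INSERT_SQL and call executeBatch(); here it is a no-op.
        int flushes = flushInBatches(rows, 1000, batch -> {});
        System.out.println(flushes + " round trips for " + rows.size() + " rows"); // 3 round trips for 2500 rows
    }
}
```

With the callback bound to a `PreparedStatement`, thousands of rows cost a handful of round trips instead of one per entity – that is where the bulk of the time savings came from.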
Identified static data
Some of the data that we needed for the migration was very static: data like the different types of characteristics an article can have never changed, but made up quite a chunk nonetheless. By importing this data only once, instead of migrating it every single night, we shaved some time off our migration.
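The “import once” idea can be sketched as a small guard. This is a hypothetical example: the dataset name is illustrative, and the in-memory marker set stands in for whatever persistent bookkeeping (e.g. a marker table in the new database) you would use in practice.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "import once" guard: static datasets are loaded
// a single time and skipped on every following nightly run.
public class StaticDataGuard {
    private final Set<String> imported = new HashSet<>();

    // Runs the loader only the first time a dataset name is seen.
    // Returns true if the dataset was loaded, false if it was skipped.
    public boolean importOnce(String dataset, Runnable load) {
        if (imported.contains(dataset)) {
            return false; // static data: already there, skip tonight
        }
        load.run();
        imported.add(dataset);
        return true;
    }

    public static void main(String[] args) {
        StaticDataGuard guard = new StaticDataGuard();
        Runnable load = () -> System.out.println("loading article characteristics");
        System.out.println(guard.importOnce("article_characteristics", load)); // true: loads
        System.out.println(guard.importOnce("article_characteristics", load)); // false: skips
    }
}
```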
Migrated data concurrently
This would prove to be quite tricky, as most of the data is very codependent. However, as time went on, we identified more and more data that could be migrated concurrently and safely, which also helped save some time.
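As a rough sketch of what concurrent migration with dependencies can look like: the dataset names below are illustrative, the actual migration work is replaced by list appends so the example runs standalone, and the dependency (a pawn referencing its client) is an assumption made for the sake of the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: independent datasets migrate in parallel, while
// codependent ones wait on their prerequisites.
public class ConcurrentMigration {
    public static List<String> runMigration() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> done = Collections.synchronizedList(new ArrayList<>());

        // Clients and article characteristics have no dependency on each
        // other, so they can run concurrently.
        CompletableFuture<Void> clients =
                CompletableFuture.runAsync(() -> done.add("clients"), pool);
        CompletableFuture<Void> characteristics =
                CompletableFuture.runAsync(() -> done.add("characteristics"), pool);

        // A pawn references its client, so pawns must wait for clients.
        CompletableFuture<Void> pawns =
                clients.thenRunAsync(() -> done.add("pawns"), pool);

        CompletableFuture.allOf(characteristics, pawns).join();
        pool.shutdown();
        return done;
    }

    public static void main(String[] args) {
        System.out.println("completed in order: " + runMigration());
    }
}
```

Chaining the dependent step with `thenRunAsync` keeps the ordering guarantee while still letting everything without prerequisites run in parallel.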
By taking the steps mentioned above, we managed to bring our migration time down to about six hours!
Preparing for another migration
So, in the end, we managed to finish everything up quite nicely. There are a couple of key takeaways that we picked up from this whole process, which we would always consider in future migrations and would like to share with you:
- Be selective with what you migrate. While setting up a data migration, it’s paramount to know which data you do and don’t want to migrate. If you’re running your migration multiple times, we would advise you to identify data that is static enough to warrant a one-time import instead.
- Migrate often, if time permits it. Something that was extremely rewarding for us was migrating as early as our first sprint. By doing so, we knew what to expect from the old data, always had test data available and, most importantly, had confidence that our migration would go as smoothly as possible.

That being said, migrating the data is only half the work. In this blog, I only discussed how to approach migrating data, but haven’t really touched upon how we guaranteed that this migration ran correctly. But that’s a tale for another day (or rather, another blog post).
If you’re interested in our work for the Stadsbank van Lening, please take a look at our case study.