Change Data Capture (CDC) with AWS RDS (MySQL) and AWS DMS

Preetham Umarani
3 min read · Aug 8, 2023

We work with some of the top e-commerce platforms out there, and we were required to comply with their data management policies. The top requirement of all was:

  • Encryption at rest and in transit of personally identifiable information (PII).

Current architecture:

  • AWS RDS
  • API Layers on EC2 and Kubernetes
  • Redis
  • RabbitMQ

RDS was already KMS-enabled for encryption at rest and used TLS for encryption in transit; however, the compliance requirements asked us to manage the PII data itself.
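For context, the in-transit half can be enforced from the API layer at the connection level. Below is a minimal sketch using MySQL Connector/J 8; the hostname, credentials, and truststore are placeholders, not our actual setup.

```java
// Minimal sketch: enforcing TLS from the API layer with MySQL Connector/J 8.
// Host, credentials, and the CA truststore path are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;

public class TlsConnectionExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://my-rds-instance.abc123.us-east-1.rds.amazonaws.com:3306/appdb"
                + "?sslMode=VERIFY_CA"                            // refuse non-TLS connections and verify the server cert
                + "&trustCertificateKeyStoreUrl=file:rds-ca.jks"  // truststore holding the RDS CA bundle
                + "&trustCertificateKeyStorePassword=changeit";
        try (Connection conn = DriverManager.getConnection(url, "app_user", "app_password")) {
            System.out.println("Connected over TLS: " + conn.isValid(5));
        }
    }
}
```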

We had around 18 million rows in the customers table alone, and the PII data was not encrypted, with new data flowing in every second. The task at hand was to change the API layer (link for the Spring Boot app with the Java implementation) to do two things; a sketch of the crypto helper follows the list:

  • Insert encrypted data
  • Read encrypted data and send it in human-readable form to the user interface
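As a rough illustration of the API-layer change, here is a minimal sketch of a field-level encryption helper using AES-GCM. It is not our exact implementation; in production the key would come from a KMS data key rather than a constructor argument, and the class and method names are hypothetical.

```java
// Minimal sketch of application-level PII encryption with AES-GCM.
// In production the key would come from AWS KMS (e.g. a data key).
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class PiiCrypto {
    private static final int IV_LEN = 12;      // recommended IV size for GCM
    private static final int TAG_BITS = 128;   // authentication tag length
    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    public PiiCrypto(byte[] keyBytes) {        // 16/24/32-byte AES key
        this.key = new SecretKeySpec(keyBytes, "AES");
    }

    // Encrypt a PII field; the IV is prepended so each ciphertext is self-contained.
    public String encrypt(String plaintext) throws Exception {
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    // Decrypt for the UI: split off the IV, then verify the tag and decrypt.
    public String decrypt(String encoded) throws Exception {
        byte[] in = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_LEN));
        byte[] pt = cipher.doFinal(in, IV_LEN, in.length - IV_LEN);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

Prepending the IV keeps each ciphertext self-contained, and GCM's authentication tag catches tampering or a wrong key at decrypt time.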

Approaches we initially considered:

  • Create a copy of the database, encrypt it, and then move the API layer to production. This job takes a few hours.
  • The downside of this approach: while we moved the backend API layer to production, there would be many writes to the un-encrypted database, which we would miss in the new database.

We experimented with Kafka Connect for the CDC implementation. However, it needs some infrastructure and experience to get right. We did a PoC and got it working for a small dataset.
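For reference, a PoC along these lines might register a Debezium MySQL source connector against Kafka Connect's REST API, roughly as below. The hostnames, credentials, and topic names are placeholders, not our actual PoC configuration.

```java
// Sketch of a Kafka Connect CDC PoC: registering a Debezium MySQL source
// connector through Kafka Connect's REST API. All hosts, ports, and
// credentials below are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterDebeziumConnector {
    public static void main(String[] args) throws Exception {
        String connectorConfig = """
            {
              "name": "customers-cdc",
              "config": {
                "connector.class": "io.debezium.connector.mysql.MySqlConnector",
                "database.hostname": "my-rds-instance.abc123.us-east-1.rds.amazonaws.com",
                "database.port": "3306",
                "database.user": "cdc_user",
                "database.password": "cdc_password",
                "database.server.id": "184054",
                "topic.prefix": "appdb",
                "table.include.list": "appdb.customers",
                "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
                "schema.history.internal.kafka.topic": "schema-changes.appdb"
              }
            }""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorConfig))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```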

We then found an AWS service called Database Migration Service (DMS), which does the job quite well. It's an off-the-shelf service for managing data migration tasks.

Below is the architecture for creating the DMS infrastructure. (Make sure the RDS instance has binlogs enabled.)
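For CDC, DMS needs the source MySQL to have binary logging turned on in ROW format (set via the RDS parameter group) and enough binlog retention to read from. A quick sanity check over JDBC could look like this; the connection details are placeholders.

```java
// Quick sanity check that the RDS source is CDC-ready: binary logging on,
// ROW format, and enough binlog retention for DMS to read from.
// binlog_format itself is changed in the RDS parameter group, not via SQL.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BinlogCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://my-rds-instance.abc123.us-east-1.rds.amazonaws.com:3306/appdb";
        try (Connection conn = DriverManager.getConnection(url, "admin", "admin_password");
             Statement st = conn.createStatement()) {
            try (ResultSet rs = st.executeQuery(
                    "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format')")) {
                while (rs.next()) {
                    // Expect log_bin = ON and binlog_format = ROW for DMS CDC
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
            // RDS-specific procedure: keep binlogs around long enough for DMS (here: 24 hours)
            st.execute("CALL mysql.rds_set_configuration('binlog retention hours', 24)");
        }
    }
}
```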

The flow was something like this (a sketch using the AWS SDK for Java follows the list):

  • Create a replication instance
  • Create endpoints (source and destination)
  • Create database migration tasks
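We drove these steps through the console, but the same three calls exist in the AWS SDK for Java v2; a rough sketch is below. Every identifier, ARN, and instance class in it is a placeholder.

```java
// Sketch of the three DMS steps with the AWS SDK for Java v2. All identifiers,
// ARNs, and instance sizes below are placeholders.
import software.amazon.awssdk.services.databasemigration.DatabaseMigrationClient;
import software.amazon.awssdk.services.databasemigration.model.*;

public class DmsSetup {
    public static void main(String[] args) {
        try (DatabaseMigrationClient dms = DatabaseMigrationClient.create()) {
            // 1. Replication instance that runs the migration task
            dms.createReplicationInstance(CreateReplicationInstanceRequest.builder()
                    .replicationInstanceIdentifier("cdc-replication-instance")
                    .replicationInstanceClass("dms.t3.medium")
                    .allocatedStorage(50)
                    .build());

            // 2a. Source endpoint: the existing RDS MySQL database
            dms.createEndpoint(CreateEndpointRequest.builder()
                    .endpointIdentifier("mysql-source")
                    .endpointType(ReplicationEndpointTypeValue.SOURCE)
                    .engineName("mysql")
                    .serverName("my-rds-instance.abc123.us-east-1.rds.amazonaws.com")
                    .port(3306).username("cdc_user").password("cdc_password")
                    .build());

            // 2b. Target endpoint: the Kinesis stream the consumers read from
            dms.createEndpoint(CreateEndpointRequest.builder()
                    .endpointIdentifier("kinesis-target")
                    .endpointType(ReplicationEndpointTypeValue.TARGET)
                    .engineName("kinesis")
                    .kinesisSettings(KinesisSettings.builder()
                            .streamArn("arn:aws:kinesis:us-east-1:123456789012:stream/cdc-stream")
                            .messageFormat(MessageFormatValue.JSON)
                            .serviceAccessRoleArn("arn:aws:iam::123456789012:role/dms-kinesis-role")
                            .build())
                    .build());

            // 3. Migration task: full load of existing rows, then ongoing CDC
            dms.createReplicationTask(CreateReplicationTaskRequest.builder()
                    .replicationTaskIdentifier("customers-full-load-and-cdc")
                    .replicationInstanceArn("arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE")
                    .sourceEndpointArn("arn:aws:dms:us-east-1:123456789012:endpoint/SRC")
                    .targetEndpointArn("arn:aws:dms:us-east-1:123456789012:endpoint/TGT")
                    .migrationType(MigrationTypeValue.FULL_LOAD_AND_CDC)
                    .tableMappings("""
                        {"rules":[{"rule-type":"selection","rule-id":"1","rule-name":"1",
                          "object-locator":{"schema-name":"appdb","table-name":"customers"},
                          "rule-action":"include"}]}""")
                    .build());
        }
    }
}
```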

In this case the source was RDS MySQL, and the destination we chose was Kinesis. The Kinesis consumers ran on EC2 to process the incoming data, encrypt it, and store it in the new database.
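A stripped-down version of such a consumer, polling a single shard with the AWS SDK for Java v2, might look like the following. Our real consumers did more, and a production setup would likely use the Kinesis Client Library for checkpointing and multi-shard coordination; the stream name here is a placeholder.

```java
// Simplified polling consumer for the Kinesis stream DMS writes to.
// Single-shard demo only; production would use the Kinesis Client Library.
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.*;

public class CdcStreamConsumer {
    public static void main(String[] args) throws Exception {
        String stream = "cdc-stream";
        try (KinesisClient kinesis = KinesisClient.create()) {
            String shardId = kinesis.describeStream(DescribeStreamRequest.builder()
                            .streamName(stream).build())
                    .streamDescription().shards().get(0).shardId();
            String iterator = kinesis.getShardIterator(GetShardIteratorRequest.builder()
                            .streamName(stream).shardId(shardId)
                            .shardIteratorType(ShardIteratorType.TRIM_HORIZON)
                            .build())
                    .shardIterator();
            while (iterator != null) {
                GetRecordsResponse resp = kinesis.getRecords(
                        GetRecordsRequest.builder().shardIterator(iterator).limit(100).build());
                for (Record r : resp.records()) {
                    // Each record is a JSON change event from DMS; parse it,
                    // encrypt the PII fields, and upsert into the new database here.
                    System.out.println(r.data().asUtf8String());
                }
                iterator = resp.nextShardIterator();
                Thread.sleep(1000); // stay under the per-shard GetRecords limits
            }
        }
    }
}
```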

When a migration task is started, it processes a full copy of the database and pushes it to the destination, in this case Kinesis, before streaming ongoing changes. Every time the task is restarted (as opposed to resumed), it starts from the beginning and reprocesses the whole table.

The consumer for Kinesis was written with plain JDBC rather than JPA, since JPA had some issues updating the new table with IDs that had been created earlier. (Details on this will be shared in another blog post.)
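To illustrate the kind of write path plain JDBC made straightforward, here is a hypothetical upsert that preserves the IDs generated in the old database and updates rows in place when the full load is replayed. The table and column names are made up for the example.

```java
// Sketch of the plain-JDBC write path: inserting rows that keep the IDs
// generated in the old database, and updating in place on replay.
// Table and column names are illustrative.
import java.sql.Connection;
import java.sql.PreparedStatement;

public class CustomerWriter {
    private static final String UPSERT =
            "INSERT INTO customers (id, email_encrypted, phone_encrypted) " +
            "VALUES (?, ?, ?) " +
            "ON DUPLICATE KEY UPDATE email_encrypted = VALUES(email_encrypted), " +
            "phone_encrypted = VALUES(phone_encrypted)";

    public static void upsert(Connection conn, long id,
                              String emailEnc, String phoneEnc) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT)) {
            ps.setLong(1, id);        // pre-existing ID from the source database
            ps.setString(2, emailEnc);
            ps.setString(3, phoneEnc);
            ps.executeUpdate();
        }
    }
}
```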

Once we moved the API layer to production, we had only a little downtime, and the entire dataset was processed to match the old database.

AWS RDS with DMS turned out to work quite well here. Please share the approaches and tools you have tried for implementing CDC.

Don’t forget to leave a clap! Until the next article, BBYE!
