Zero-Downtime MySQL Schema Migration: A Fintech Case Study
Cybersecurity

Zero-Downtime MySQL Schema Migration: A Fintech Case Study

Modifying schemas on very large MySQL tables can disrupt live systems when traditional DDL operations are used.

Mafiree
Mafiree
5 min read

Modifying schemas on very large MySQL tables can disrupt live systems when traditional DDL operations are used, because such changes typically lock tables and block read/write activity for the duration of the operation. The case study from Mafiree explains how the team supported a fintech client in evolving a MySQL schema  through a carefully planned MySQL schema migration on a table containing more than 500 million rows without causing downtime, ensuring continuous application availability throughout the process.

The article outlines the key technical challenges, a structured three-phase migration strategy, the tools selected for each phase, configuration and performance considerations, and practical lessons that enabled a safe and reliable rollout.

Why Zero Downtime Was Critical

Direct schema changes on large MySQL tables—such as adding columns, altering indexes, or modifying storage attributes—often require rebuilding the entire table. In high-traffic production environments, a rebuild places locks on the table, potentially blocking operations for minutes or even hours depending on size and workload. In fintech systems, where uptime and transaction throughput are essential, even short interruptions are unacceptable. Standard ALTER TABLE commands without special handling can therefore severely affect user experience and disrupt ongoing transactions, especially when dealing with hundreds of millions of records.

The Three-Phase Migration Strategy

To prevent production disruption, the migration was divided into three carefully planned phases.

1. Native MySQL 8.0 Online DDL

The first step was to determine whether the required changes could be handled using MySQL 8.0’s INSTANT or INPLACE online DDL features. These operations modify metadata without rebuilding the table, allowing changes to complete quickly with minimal locking. Whenever possible, these native capabilities were used because they introduce the least overhead. However, not all schema changes—particularly those that alter row structure—are supported this way, requiring alternative methods.

2. gh-ost for Complex Changes

For schema modifications that could not be handled natively, the team used gh-ost. This tool creates a shadow table and gradually copies data from the original table while monitoring MySQL’s binary log to capture ongoing changes. These updates are applied to the shadow copy in real time. Once synchronization is complete, gh-ost performs a controlled table swap with only a brief lock, typically unnoticed by users. This approach is particularly effective for extremely large tables that must remain writable during migration.

3. pt-online-schema-change for Compatibility

The team also leveraged pt-online-schema-change from the Percona Toolkit. Like gh-ost, it builds a shadow table and copies data in the background, but it uses triggers to mirror insert, update, and delete operations. It supports a wider range of MySQL versions and works well with foreign keys, making it suitable for environments where gh-ost may not be ideal.

By combining native DDL, gh-ost, and pt-online-schema-change, the team selected the most appropriate tool for each change, balancing safety, performance, and compatibility.

Performance and Monitoring

During the actual migration, transforming the 500-million-row table with conservative gh-ost throttling (nice-ratio = 0.5) took approximately 4.5 hours. Without throttling on dedicated hardware, the process could have been completed in 1.5–2 hours, though aggressive settings are not typically safe in production.

Close monitoring was essential. Key metrics included replication lag, InnoDB buffer pool hit ratio, disk I/O, p95/p99 query latency, and tool progress indicators. If system stress exceeded acceptable thresholds, built-in pause and throttling mechanisms in gh-ost and pt-online-schema-change were used to protect workloads and maintain stability.

Best Practices and Conclusion

The case study highlights several best practices: choose the appropriate tool for each change, benchmark and throttle carefully, continuously monitor system health, and always prepare rollback plans. Ultimately, Mafiree’s experience shows that even massive MySQL schema changes can be executed without downtime when supported by proper planning, the right mix of tools, and vigilant performance management.

Discussion (0 comments)

0 comments

No comments yet. Be the first!