Demystifying MySQL Replication Crash Safety
Databases and storage systems
Jean-François is a System/Infrastructure Engineer and MySQL Expert. He recently joined MessageBird, an IT telco startup in Amsterdam, with the mission of scaling the MySQL infrastructure. Before that, J-F worked on growing the Booking.com MySQL and MariaDB installations, including dealing with replication bottlenecks (he also works on many other non-MySQL-related projects that are less relevant here). Some of his latest projects are making Parallel Replication run faster and promoting Binlog Servers. He also has a good understanding of replication in general and a respectable understanding of InnoDB, MySQL, Linux and TCP/IP. Before Booking.com, he worked as a System/Network/Storage Administrator in a Linux/VMware environment, as an Architect for a Mobile Services Provider, and as a C and Java Programmer in an IT Service Company. Even before that, when he was learning computer science, Jeff studied cache consistency in distributed systems and network group communication protocols.
Up to MySQL 5.5, replication was not crash safe: after a crash, it could fail with a “duplicate key” or “not found” error (or corrupt data). So 5.6 is better, right? Maybe: crash-safe replication is possible in 5.6, but it is not the default. MySQL 5.7 is not much better, and 8.0 has safer defaults, but it is still easy to get things wrong.
Crash safety depends on the positioning method (File+Pos or GTID), the replication type (single- or multi-threaded), the MTS settings (Database or Logical Clock, and whether commit order is preserved), the syncing of relay logs, and whether binlogs and log-slave-updates are enabled and synced. This is complicated, and even the manual is confused about it.
In this talk, I will explain all of the above, with details on replication internals, so you might learn a thing or two.
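To make the "possible, but not the default" point concrete, here is a minimal sketch in Python covering only the simplest case: a single-threaded replica using File+Pos positioning. For that case the MySQL manual asks for `relay_log_info_repository=TABLE` and `relay_log_recovery=ON`, neither of which is the default in 5.6 or 5.7. The function name, the `REQUIRED` table and the input dictionary are hypothetical and only illustrate the idea. The sketch deliberately ignores GTID, MTS and binlog syncing, so passing it does not prove a replica is crash safe.

```python
# Illustrative only: this is not a MySQL API, and the rule set is an
# assumption limited to single-threaded File+Pos replication.
REQUIRED = {
    "relay_log_info_repository": "TABLE",  # position stored in an InnoDB table
    "relay_log_recovery": "ON",            # discard relay logs after a crash
}


def crash_safety_warnings(variables):
    """Return one warning per setting that differs from REQUIRED.

    `variables` maps variable names to values, for example as copied
    from the output of SHOW GLOBAL VARIABLES.
    """
    warnings = []
    for name, wanted in REQUIRED.items():
        actual = str(variables.get(name, "")).upper()
        if actual != wanted:
            warnings.append(f"{name} is {actual or 'unset'}, expected {wanted}")
    return warnings


# MySQL 5.6 defaults for the two variables checked above.
defaults_56 = {
    "relay_log_info_repository": "FILE",
    "relay_log_recovery": "OFF",
}

for warning in crash_safety_warnings(defaults_56):
    print(warning)
```

With the 5.6 defaults, both settings are reported. An empty list only means these two settings match, not that every other combination of settings is safe.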