Recover from a table crash using MySQL's binlog

It has been said before, but it can't be stressed enough: The binary log really is your friend. It just saved my life this week after MySQL 5.0 completely crashed a table on an ALTER statement.

The table crash was pretty nasty: A simple ALTER TABLE table_name COMMENT='…' on a MyISAM table rendered all the data useless. While table checks didn't show any problems and the PRIMARY KEY seemed to be okay, the content of all other data columns was just scrambled garbage. It looked like MySQL completely mixed up the pointers to the data fields.

As this table was in place (and probably unaltered) for a pretty long time (dating back to 4.0 or maybe even 3.23), this could have been a compatibility issue (we're on 5.0.18 right now) or maybe a hidden table corruption that didn't show up, even in table checks. So I posted a bug report, that's currently still in state analyzing.

The data I used to restore the table was a mysqlhotcopy backup from the replication slave (see mysqlhotcopy - A Database Backup Program). The slave keeps it's own binlog position and that of the master at the exact time of the backup in a separate table. So the point-in-time restore operation consists of only two steps:

  1. Copy the table files from the backup directory to the live database on the master and the slave
  2. Run all the data manipulating statements for the affected table again from the binlog (starting with the time of the backup, ending with the last statement just before the table crash)

To be able to copy the files while the servers were running, I just LOCKed and FLUSHed that table first, so it was in a clean state and nobody could write to it while the copy operation was in progress. Before UNLOCKing the table again after copying I did another FLUSH to really make sure the server would access the new table files from the backup.

The second step was to restore the data from the time of the backup (which was about 10 hours ago) up to the point of the crash. The binary log was already spread across two files at that time. So I had to extract all the data manipulating statements for the database holding the crashed table from those two binlog files to a text file.

mysqlbinlog --database=db_name --start-position=102655866 mysql1-bin.000312 > restore.sql
mysqlbinlog --database=db_name mysql1-bin.000313 >> restore.sql

The start-position is of course the position of the binlog at the time of the backup. Now I could do a search for all statements affecting the crashed table and feed them to mysql again.

grep -B3 table_name restore.sql | egrep -v '^--$' > restore_table.sql
emacs restore_table.sql
mysql db_name < restore_table.sql

As I knew that all those statements didn't contain any newlines I used a simple approach with grep (the -B3 giving me the lines with the meta information just before the actual statement), quickly checked the resulting file in a text editor (where I deleted the ALTER TABLE statement, too, to not have the crash happen again) and ran the queries.

That's it. The table was now in exactly the same state as it was before the crash.

As we run a replication setup, the binary log has to be activated anyway. But this example shows you once more why it's just best practice to never ever run a MySQL server without the binary log, even if you don't need it for replication.