Using MySQL's ARCHIVE storage engine: Backups and partitions (Part II)

The ARCHIVE engine is a great tool to store large amounts of historical data in a compact way. While I found that it's not yet easy to combine it with another engine used to keep the current data (see the first part of this article on how to achieve something similar to a MERGE or PARTITION with an ARCHIVE table), I was positively surprised on how convenient it is to backup ARCHIVE tables.

Different tables in MySQL need different backup strategies: For some of the engines you can basically just copy the table files when LOCKed and FLUSHed (this can be done with the mysqlhotcopy script for MyISAM), for others (e.g. InnoDB) this won't work at all. The lock-and-copy method would be very convenient to backup large ARCHIVE tables.

To see if this method is safe, we need to have a look at some of the internals of the ARCHIVE storage engine first. So let's start with the different types of files that are associated with every ARCHIVE table in your database directory:

.frm
As for every storage engine (and for VIEWs) this file holds the table description (e.g. column names and types). It's actually not even handled by the storage engine but from within the main server process.
.ARZ
This file holds the actual data for an ARCHIVE table (gzip compressed).
.ARM
A very small file used by the ARCHIVE engine to store some additional meta information, mainly to check if a table has been closed properly and to keep a count of the current number of rows in the table.
.ARN
This file is only created during optimize operations. We don't have to deal with it in backups.

This said, we know that we have to copy three files for every table: .frm, .ARZ and .ARM. But we need to make sure we get clean copies.

As ARCHIVE doesn't support UPDATE, REPLACE and DELETE operations we only have to consider what happens on INSERTs: Every inserted row is first kept in a buffer in RAM. The engine will compress and flush them to disk as soon as a read operation on the table occurs. That way it can handle bulks of records instead of just single rows, what results in much better performance.

The restriction to allow only INSERTs also means that all write operations occur at the very end of the file. This should therefore be the only place where corruption could happen. As said above, the ARCHIVE engine detects tables that haven't been closed properly from a flag in the meta file. It will then do a recovery on the table when it's opened the next time (but be sure to use the latest version of MySQL, probably 5.0.19 that should be released next week, as there might have been a bug regarding this in earlier versions).

This is great news: It means you can just copy an ARCHIVE table without any special precautions. Of course you'll lose all rows currently in the compression buffer. But it's easy to prevent this: Just do a FLUSH TABLES before the copy and the rows in the buffer will be written to disk. So when copying you'll get a snapshot of the table at the time of the FLUSH.

I'd still suggest to get an additional LOCK before the FLUSH on the table as well. This will make sure that no other rows from the compression buffer can be flushed while the file is still being copied. You'll also need this to get consistency across multiple tables backed up at the same time anyway.

So a nice backup strategy for ARCHIVE tables is LOCK-FLUSH-copy, exactly the same as for MyISAM. The most excellent news is, that mysqlhotcopy already copies files of all extensions anyway. So just use it for your ARCHIVE tables as well!