Safe transition to MySQL 5.0: Mind the reserved words
I'm sure you always quote all your table and column identifiers as I do. And so do all of your co-workers. Even in that tiny little script that was only meant to be a short-time hack two years ago. Then of course this article doesn't apply to you. For sure.
The update from MySQL 4.1 to 5.0 is pretty straightforward. Of course you should study the list of changes and especially the notes on how to upgrade carefully. There are some incompatible changes noted in that documentation. The different handling of some JOINs is just one example. But to be honest, for most of the part upgrading from 4.1 to 5.0 is as easy as switching the binaries (which means only changing a symbolic link if your installation is properly set up), restarting the server and running mysql_fix_privilege_tables.
There is one pitfall however, not being noted prominently in the documentation. MySQL 5.0 gives us a lot of new functionality. This comes of course with new reserved words and functions in the SQL syntax, 81 to be exact, as I found out playing around with the GNU text utilities and the lex.h files from the MySQL source code distribution for 4.1 and 5.0 (that's where the strings for the reserved words and functions are defined):
ALGORITHM ASENSITIVE CALL CASCADED CHAIN COMPACT CONDITION CONNECTION CONTINUE CURSOR DECLARE DEFINER DETERMINISTIC EACH ELSEIF EXIT FETCH FOUND FRAC_SECOND GOTO INOUT INSENSITIVE INVOKER ITERATE LABEL LANGUAGE LEAVE LOOP MAX_USER_CONNECTIONS MERGE MIGRATE MODIFIES MUTEX NAME NAME_CONST ONE OUT PHASE READS RECOVER REDUNDANT RELEASE RESUME RETURN ROUTINE ROW_COUNT SCHEMA SCHEMAS SECURITY SENSITIVE SLEEP SPECIFIC SQL SQLEXCEPTION SQLSTATE SQLWARNING SQL_TSI_DAY SQL_TSI_FRAC_SECOND SQL_TSI_HOUR SQL_TSI_MINUTE SQL_TSI_MONTH SQL_TSI_QUARTER SQL_TSI_SECOND SQL_TSI_WEEK SQL_TSI_YEAR STDDEV_POP STDDEV_SAMP SUSPEND TEMPTABLE TIMESTAMPADD TIMESTAMPDIFF TRIGGER TRIGGERS UNDEFINED UNDO UNKNOWN VAR_POP VAR_SAMP VIEW WHILE XA
MySQL is quite forgiving about reserved words and function names used as identifiers (see Treatment of Reserved Words in MySQL): If you run your server without the IGNORE_SPACE SQL mode (and this is still the default in 5.0), function names aren't even treated as reserved words, as they can only be functions if they're directly followed by something enclosed in parentheses. Additionally MySQL knows that anything following a period in a qualified name (e.g. database.column) must be an identifier, so no quotes needed here. The query parser even treats some commonly used words (such as BIT, DATE, TIME or TEXT) specially and allows them unquoted in places where they can unambiguously be recognized as identifiers.
This sounds pretty safe actually, but still, if you use any of the above strings as identifiers in one of your databases, you should at least be careful while upgrading. The following simple question by a data extraction tool to retrieve a pre-configured SQL query stored in the database itself worked in 4.1 but fails in 5.0 due to SQL being a new keyword:
SELECT sql FROM data_sources WHERE data_source_id = 1 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'sql FROM data_sources WHERE data_source_id = 1' at line 1
The query should instead have been properly quoted to still work in 5.0 (What of course everybody does. Anyway. All the time. As we agreed on at the beginning of the article.):
SELECT `sql` FROM `data_sources` WHERE `data_source_id` = 1
A missing space could have a similar effect. What was accepted by the 4.1 parser without a hassle will be rejected by 5.0:
CREATE TABLE var_pop(i INT) ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'var_pop(i INT)' at line 1
Here the parentheses directly following the identifier var_pop irritate the parser as VAR_POP is a new function in 5.0 (the standard SQL alias for MySQL's VARIANCE function).
Therefore, despite the very straightforward upgrade path to 5.0, I'd recommend the following steps in a bigger production setup, just for safety:
- Study (of course) the list of changes and the notes on how to upgrade as pointed out above and on the download page.
- Read Treatment of Reserved Words in MySQL and this article and check for additional identifiers in your schemata and application code that possibly need to be quoted.
- Activate the general query log and look for anything that needs special attention (using
grep,sed,awkor simple Perl scripts, as the general query log grows pretty big in a very short amount of time on a busy server). - After all the issues found while working through the steps above have been fixed: Check your key applications separately in a 5.0 testing environment.
- In a master/slave environment, upgrade your slaves well before the master and direct as many of the read-only queries as possible to your new 5.0 slaves. This way you can get a good impression on how your applications work with 5.0 in production and still have a safe path back in case anything breaks (just direct the affected application back to the 4.1 master until fixed). That's actually what we did one month ago, just after the first release candidate was available.
- Finally upgrade your master from 4.1 to 5.0.
This procedure probably saved us from breaking any application while upgrading. Because, guess what... the first example above, the one with the sql identifier, was not just made up. It was in one of our own programs. Embarrassingly coded by... myself.