The lost art of the join: Join a table on itself
Returning from my vacation and catching up with all that happened on Planet MySQL in the meantime I just saw Eric Bergen lamenting about The lost art of the join. I think he's right with his statement as I see lots of beginners (and not only those) struggling with problems that could easily be solved with a simple JOIN. So my invitation goes to all the geeks at Planet MySQL: Let's just give a few nice examples to everybody out there to save the art of the join.
I'll start with a simple but effective one: JOINs in a relational database are typically used to connect different tables together. That's why a lot of people often overlook the fact that many problems can be solved with a JOIN of a table on itself.
Let's pretend we're in charge of a timing system at a car racing circuit. In our system we have a table that stores the elapsed time since the start of the race whenever a car crosses the finish line after a lap. What we want to have in the end is a list of lap only times for every car and every lap. But let's first create the original table.
CREATE TABLE lap_times (
car_id TINYINT,
lap TINYINT,
time TIME,
PRIMARY KEY (car_id, lap)
);
We'll populate the table with some dummy data for two cars and two laps.
INSERT INTO lap_times (car_id, lap, time)
VALUES (1, 1, '00:01:34'),
(1, 2, '00:03:01'),
(2, 1, '00:01:49'),
(2, 2, '00:03:04');
So car number one crossed the finish line the first time after 1'34" and the second time after 3'01" into the race. Car number two is a little bit behind.
But how do we get the timing for each lap now? That's a typical question where I see a lot of PHP and .NET developers starting to hack unnecessary loops into their editors.
What we have to do of course is to subtract the start time for every lap from its end time. We can use the TIMEDIFF function for that. But it needs the start and the end time for a lap on one and the same row: Enter the JOIN.
SELECT e.car_id,
e.lap,
TIMEDIFF(e.time, COALESCE(s.time, '00:00:00')) AS lap_time
FROM lap_times AS e
LEFT JOIN lap_times AS s
ON s.car_id = e.car_id AND s.lap = e.lap - 1
So we just JOIN our table on itself, once for the start (table alias s) and once for the end times (alias e). The first JOIN condition in the ON clause makes sure that only lines for the same car get joined together. The second condition does all the magic: The lap number of the start time always has to be one less than the lap number of the end time for any given lap.
One additional trick lies in the COALESCE: There exists of course no previous lap time for the first lap. To not have to add an additional lap zero with a time of 00:00:00 for every car, we just make sure a nonexistent (NULL) previous lap gets replaced by 00:00:00. (If your system has to deal with missing laps throughout the race you should probably use something like IF(e.lap = 1, '00:00:00', s.time) instead of the COALESCE, if you don't want the following lap to be timed since the beginning but set to NULL instead.)
So here we are with a simple list of lap times from our JOIN of the lap_time table on itself.
+--------+-----+----------+ | car_id | lap | lap_time | +--------+-----+----------+ | 1 | 1 | 00:01:34 | | 1 | 2 | 00:01:27 | | 2 | 1 | 00:01:49 | | 2 | 2 | 00:01:15 | +--------+-----+----------+
This was only a first, simple example on how to use JOINs for other things than the usual connecting of relational tables. But I think it still shows the beauty of declarative programming languages that struck me the first time more than ten years ago when I started to use Prolog (long before SQL).