For example, with this table:

CREATE TABLE daily_events (
  created_on DATE NOT NULL,
  last_event_id INT(11) UNSIGNED NOT NULL,
  last_event_created_at DATETIME NOT NULL,
  PRIMARY KEY (created_on)
);

The purpose of this table is to provide a daily summary of events storing the id and created_at timestamp of the most recent event for the day. When a new event is processed the daily summary table should also be updated, an operation which is ideally suited to an INSERT ON DUPLICATE KEY UPDATE query.

Simple conditional updates

Unfortunately there’s a catch: the app can process events out-of-order meaning updates to the summary table won’t necessarily occur in the order that the events are created. This means that the last_event_id and last_event_created_at fields should only be updated if the event is newer than the last one for the day. In SQL terms it should look like this:

INSERT INTO daily_events (created_on, last_event_id, last_event_created_at)
VALUES ('2010-01-19', 23, '2010-01-19 10:23:11')
ON DUPLICATE KEY UPDATE
last_event_id = VALUES(last_event_id),
last_event_created_at = VALUES(last_event_created_at)
WHERE last_event_created_at < VALUES(last_event_created_at);

The bad news is that this won’t work as MySQL doesn’t allow WHERE clauses in the update portion of the query. In a simple case like this the easiest workaround is to use the IF function:

INSERT INTO daily_events (created_on, last_event_id, last_event_created_at)
VALUES ('2010-01-19', 23, '2010-01-19 10:23:11')
ON DUPLICATE KEY UPDATE
last_event_id = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_id), last_event_id),
last_event_created_at = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_created_at), last_event_created_at);

This works by checking if the last_event_created_at timestamp of the event being updated is newer than the current timestamp, if it is then the new value is assigned to the field in the update, otherwise the current value is used.

An important thing to keep in mind when using this approach is that the order in which you update your fields is very important. I was wrongly under the impression that the updates took place in one mass-assignment after the entire query had been interpreted by MySQL. But they’re not: the assignments happen in the order they appear in the query. To give you an example, this query won’t produce the expected result:

INSERT INTO daily_events (created_on, last_event_id, last_event_created_at)
VALUES ('2010-01-19', 23, '2010-01-19 10:23:11')
ON DUPLICATE KEY UPDATE
last_event_created_at = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_created_at), last_event_created_at),
last_event_id = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_id), last_event_id);

When the update is executed with a more recent event, the last_event_created_at field will be updated, but the last_event_id field won’t. This is because when the second IF is evaluated last_event_created_at has already been updated so that last_event_created_at is equal to VALUES(last_event_created_at). Crazy huh?!

Conditions using multiple fields

Keeping in mind the importance of the order in which fields are updated, conditions that involve multiple fields are a little trickier. For example, a slightly different summary table could look like this:

CREATE TABLE last_session_user_events (
  user_id INT(11) UNSIGNED NOT NULL,
  last_session_id INT(11) UNSIGNED NOT NULL,
  last_session_created_at DATETIME NOT NULL,
  last_event_id INT(11) UNSIGNED NOT NULL,
  last_event_created_at DATETIME NOT NULL,
  PRIMARY KEY (user_id)
);

This summary table tracks the most recent event for the most recent session for a user. Updates to this table should only be made if the session is newer, or if the session is the same and the event is newer. Again events can be processed out of order and it is possible for multiple sessions to be active at the same time. The query to insert/update looks like this:

INSERT INTO last_session_user_events (user_id, last_session_id, last_session_created_at, last_event_id, last_event_created_at)
VALUES (6, 45, '2010-01-18 23:31:44', 453, '2010-01-19 10:23:11')
ON DUPLICATE KEY UPDATE
last_session_id = IF((last_session_created_at < VALUES(last_session_created_at) OR (last_session_id = VALUES(last_session_id) AND last_event_created_at < VALUES(last_event_created_at))), VALUES(last_session_id), last_session_id),
last_session_created_at = IF((last_session_created_at < VALUES(last_session_created_at) OR (last_session_id = VALUES(last_session_id) AND
last_event_created_at < VALUES(last_event_created_at))), VALUES(last_session_created_at), last_session_created_at),
last_event_id = IF((last_session_created_at < VALUES(last_session_created_at) OR (last_session_id = VALUES(last_session_id) AND last_event_created_at < VALUES(last_event_created_at))), VALUES(last_event_id), last_event_id),
last_event_created_at = IF((last_session_created_at < VALUES(last_session_created_at) OR (last_session_id = VALUES(last_session_id) AND last_event_created_at < VALUES(last_event_created_at))), VALUES(last_event_created_at), last_event_created_at);

Aside from the fact this looks a bit crazy (people who obsess about DRY code will be weeping) it also suffers from the order of evaluation problem I described earlier. In certain situations the result of the IF statement will change within a single query as the updates are made, leading to inconsistent data. To make this work it is necessary evaluate the IF condition once, before any updates occur, and to store the result in a user defined variable that can be referenced in the rest of the query. Here’s the updated query:

INSERT INTO last_session_user_events (user_id, last_session_id, last_session_created_at, last_event_id, last_event_created_at)
VALUES (6, 45, '2010-01-18 23:31:44', 453, '2010-01-19 10:23:11')
ON DUPLICATE KEY UPDATE
last_session_id = IF((@update_record := (last_session_created_at < VALUES(last_session_created_at) OR (last_session_id = VALUES(last_session_id) AND last_event_created_at < VALUES(last_event_created_at)))), VALUES(last_session_id), last_session_id),
last_session_created_at = IF(@update_record, VALUES(last_session_created_at), last_session_created_at),
last_event_id = IF(@update_record, VALUES(last_event_id), last_event_id),
last_event_created_at = IF(@update_record, VALUES(last_event_created_at), last_event_created_at);

As you can see, not only does this ensure the IF statement remains consistent throughout the update, it also cleans up the query making it a little easier on the eyes.

This is just one of many MySQLisms we’ve had to deal with over the last 12 months, no doubt we’ll blog about more of them in the future. Our lawyers have asked us to point out that no DBAs were harmed in the making of the post!