December 5, 2024

Dec 5: MySQL InnoDB innodb_trx is cached

Open technologies

Technical

Rob Mueller

Founder & CTO

This is the fifth post in the Fastmail Advent 2024 series. The previous post was Dec 4: Meet the team — Bek. The next post is Dec 6: Twoskip and more.

This is a technical post about an aspect of MySQL InnoDB and transaction ids.

One of the features of JMAP is that it allows a client to fetch changes that have occurred on the server since the client last synced with the server. This is done with a sinceState string. Although no specific implementation is required, we’ve found that using a system based on “modification sequences” (aka MODSEQs) as originally described in the IMAP CONDSTORE extension works well.

I have been doing some work internally to allow us to calculate MODSEQs for all JMAP objects stored in our MySQL database. The basic idea was to setup triggers on any INSERT/UPDATE/DELETE actions to update the appropriate MODSEQ on the corresponding table row. The exact structures required for implementing the JMAP /changes method on all database tables is something for another blog post, but this post is more about an unexpected oddity I found while trying to do this.

Conceptually what I need is reasonably straightforward. There is a table to store all the current MODSEQs for each User owned table tuple.

CREATE TABLE UserModSeqs (
  UserId INT NOT NULL,
  TableName VARCHAR(255) NOT NULL,
  CurrentModSeq BIGINT NOT NULL DEFAULT 0,
  HighestPurgedModseq BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (UserId, TableName),
  CONSTRAINT UserModSeqsFK FOREIGN KEY (UserId)
    REFERENCES Users (UserId) ON DELETE CASCADE
);

There is a generic “bump” function to increment the current MODSEQ for a particular User/table

CREATE FUNCTION BumpUserModSeq (ModSeqOwnerValue INT, DataTableName VARCHAR(255))
RETURNS BIGINT
BEGIN
  DECLARE NewModSeq BIGINT;

  INSERT INTO UserModSeqs
    (UserId, TableName, CurrentModSeq, HighestPurgedModseq)
  VALUES
    (ModSeqOwnerValue, DataTableName, 1, 0)
  ON DUPLICATE KEY UPDATE
    CurrentModSeq = CurrentModSeq + 1;

  SELECT CurrentModSeq
  FROM UserModSeqs
  WHERE UserId = ModSeqOwnerValue
    AND TableName = DataTableName
  INTO NewModSeq;

  RETURN NewModSeq;
END

And then the actual trigger which looks something like:

CREATE TRIGGER ${table}UpdateModSeq
BEFORE UPDATE ON $table
FOR EACH ROW
BEGIN
  SET NEW.UpdatedModSeq = BumpUserModSeq("UserId", "$table");
END

That you’d create for each $table that’s “owned” by a User (i.e. has foreign key UserId to the Users table).

One of the issues with this is that every single row updated on a table generates a new MODSEQ for each updated row.

I had an idea to make it so we only bump the modseq number once for each transaction rather than each row. Searching around you can find that the information_schema.innodb_trx table has a trx_id field, so a query like:

SELECT trx_id
FROM information_schema.innodb_trx
WHERE trx_mysql_thread_id = connection_id()

Will get the transaction id of your current session, great. So we can create a function like:

CREATE FUNCTION GetCurrentTrxId ()
RETURNS BIGINT UNSIGNED
BEGIN
  DECLARE CurrentTrxId BIGINT UNSIGNED;

  SELECT trx_id
  FROM information_schema.innodb_trx
  WHERE trx_mysql_thread_id = connection_id()
  INTO CurrentTrxId;

  RETURN CurrentTrxId;
END

And then change the “bump” function to:

CREATE FUNCTION BumpUserModSeq (ModSeqOwnerValue INT, DataTableName VARCHAR(255))
RETURNS BIGINT
BEGIN
  DECLARE NewModSeq BIGINT;
  DECLARE CurrentTrxId BIGINT UNSIGNED DEFAULT GetCurrentTrxId();

  IF CurrentTrxId IS NULL OR
     @LastBumpUserModSeqTrxId IS NULL OR
     @LastBumpModSeqOwnerValue IS NULL OR
     @LastBumpDataTableName IS NULL OR
     @LastBumpUserModSeqTrxId <> CurrentTrxId OR
     @LastBumpModSeqOwnerValue <> ModSeqOwnerValue OR
     @LastBumpDataTableName <> DataTableName
    THEN

    INSERT INTO UserModSeqs
      (UserId, TableName, CurrentModSeq, HighestPurgedModseq)
    VALUES
      (ModSeqOwnerValue, DataTableName, 1, 0)
    ON DUPLICATE KEY UPDATE
      CurrentModSeq = CurrentModSeq + 1;

    # InnoDB may only create trx_id on first write (which we just did)
    SET @LastBumpUserModSeqTrxId = IFNULL(CurrentTrxId, GetCurrentTrxId());
    SET @LastBumpModSeqOwnerValue = ModSeqOwnerValue;
    SET @LastBumpDataTableName = DataTableName;
  END IF;

  SELECT CurrentModSeq
  FROM UserModSeqs
  WHERE UserId = ModSeqOwnerValue
    AND TableName = DataTableName
  INTO NewModSeq;

  RETURN NewModSeq;
END

That way we bump the MODSEQ for the given object and user if it’s a new transaction, but use the existing value if we’re in a transaction where we already bumped the MODSEQ.

Now if you try this by hand a bit, it all seems to work great.

But, if you start trying to write tests for this, you start noticing it doesn’t always work as expected. Multiple updates in quick succession don’t bump the modseq correctly.

And so down the rabbit hole you go.

The first thing you learn is that InnoDB has an optimisation where it won’t generate a transaction id until you actually perform either a write statement or a SELECT ... FOR UPDATE (that’s the “InnoDB may only create trx_id on first write (which we just did)” comment in the above code).

mysql (root@127.0.0.1) [fastmail]> begin; select GetCurrentTrxId(); commit;
Query OK, 0 rows affected (0.00 sec)

+-------------------+
| GetCurrentTrxId() |
+-------------------+
|              NULL |
+-------------------+
1 row in set (0.00 sec)

mysql (root@127.0.0.1) [fastmail]> begin; select count(*) from Users where UserId=7 for update into @foo; select GetCurrentTrxId(); commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

+-------------------+
| GetCurrentTrxId() |
+-------------------+
|           4769402 |
+-------------------+
1 row in set (0.00 sec)

Fine, but then you notice strange behavior like.

mysql (root@127.0.0.1) [fastmail]> begin; select GetCurrentTrxId(); select count(*) from Users where UserId=7 for update into @foo; select GetCurrentTrxId(); commit;
Query OK, 0 rows affected (0.00 sec)

+-------------------+
| GetCurrentTrxId() |
+-------------------+
|              NULL |
+-------------------+
1 row in set (0.00 sec)

Query OK, 1 row affected (0.00 sec)

+-------------------+
| GetCurrentTrxId() |
+-------------------+
|              NULL |
+-------------------+
1 row in set (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

So even though a SELECT ... FOR UPDATE should create a transaction id, the subsequent call to GetCurrentTrxId() still didn’t return one.

But then if you try.

mysql (root@127.0.0.1) [fastmail]> begin; select GetCurrentTrxId(); select sleep(1) into @foo; select count(*) from Users where UserId=7 for update into @foo; select GetCurrentTrxId(); commit;
Query OK, 0 rows affected (0.00 sec)

+-------------------+
| GetCurrentTrxId() |
+-------------------+
|              NULL |
+-------------------+
1 row in set (0.00 sec)

Query OK, 1 row affected (1.00 sec)

Query OK, 1 row affected (0.00 sec)

+-------------------+
| GetCurrentTrxId() |
+-------------------+
|           4770035 |
+-------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

So putting a sleep between them, suddenly you do get a new transaction id.

Digging into MySQL code you end up finding

i_s.cc#L751

/** Common function to fill any of the dynamic tables:
 INFORMATION_SCHEMA.innodb_trx
 @return 0 on success */
static int trx_i_s_common_fill_table(
...
  /* update the cache */
  trx_i_s_cache_start_write(cache);
  trx_i_s_possibly_fetch_data_into_cache(cache);

trx0i_s.cc#L817

int trx_i_s_possibly_fetch_data_into_cache(
    trx_i_s_cache_t *cache) /*!< in/out: cache */
{
  if (!can_cache_be_updated(cache)) {
    return (1);
  }

trx0i_s.cc#L681

static bool can_cache_be_updated(trx_i_s_cache_t *cache) /*!< in: cache */
{
...
  /** The minimum time that a cache must not be updated after it has been
  read for the last time. We use this technique to ensure that SELECTs which
  join several INFORMATION SCHEMA tables read the same version of the cache. */
  constexpr std::chrono::milliseconds cache_min_idle_time{100};

  return std::chrono::steady_clock::now() - cache->last_read.load() >
         cache_min_idle_time;

So the information_schema.innodb_trx table is cached internally for 100ms, so trying to use it to fetch the transaction for the current connection is not guaranteed to be up to date. Ouch! This does not appear to be documented anywhere obvious that I could find.

Unfortunately I can’t see any way to reliably find out if you’re in a transaction or way to identify the current transaction id.

In the end, I had to resort to something at the application level. Internally our DB library encourages you to use a guard object to start and end transactions.

my $committer = $dbh->begin_work_auto();
$dbh->do(...);
$committer->commit();

The ->begin_work_auto() call starts a transaction and returns a guard object. If the $committer guard object is destroyed before the ->commit() call, then it will automatically rollback the transaction in progress.

I added a small bit of code in begin_work_auto() to do $self->do('SET @CurrentTrxId=?', {}, ++$count);, and then set it back to NULL when we commit/rollback. That simulates a guaranteed changing transaction id within each MySQL session as long as you use the begin_work_auto() guard method which most code does when using transactions. Anything that doesn’t use begin_work_auto() leaves @CurrentTrxId as NULL.

The BumpUserModSeq function used in the trigger checks the @CurrentTrxId session variable to see if it’s NULL or has been incremented from it’s last value to decide whether to bump the MODSEQ or not.

So transactions that use begin_work_auto() get optimised MODSEQ bumping (a single update to the modseq table for all rows added/updated/deleted for a user within the transaction), and everything else falls back to still working correctly but with a bit more overhead (an update to the modseq table for every single row added/updated/deleted).