Dec 18: Building offline: syncing changes back to the server
Chief Product Officer
This is the eighteenth post in the Fastmail Advent 2024 series. The previous post was Dec 17: Building offline: general architecture. The next post is Dec 19: Building offline: mail storage.
Yesterday, we looked at how our offline caching layer fits into our app, and the way it stores data to efficiently respond to JMAP requests. Today, we’ll dive into how it keeps track of changes the user makes while offline, so it can reconcile them with the server.
Keeping track of changes
When a client makes a change offline, we update our local cache and have to keep track of it so we can sync that change back to the server when we come online. There are two main approaches you could take:
- You keep a time-ordered log of every change, then replay the log against the server. One record may appear multiple times in the log if it has multiple modifications applied.
- You keep a set of created/updated/destroyed records, along with the current server value. Each record can only appear once, in at most one of these categories. You calculate the difference between the server state and the current state to update the server.
The benefit of the first approach is that it preserves any ordering dependencies between changes. The benefit of the second approach is that it’s more efficient in both storage and synchronisation speed when multiple changes are made to the same record.
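As a rough illustration of that trade-off, here is a minimal Python sketch of both approaches. The ReplayLog and DirtySet classes are hypothetical, records are assumed to be flat dictionaries, and only updates are modelled (no creates or destroys). Three edits to the same email leave three entries to replay in the log, but only a single one-property diff in the set.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayLog:
    """Approach 1 (hypothetical): every change appended in time order."""
    entries: list = field(default_factory=list)

    def record(self, record_id, patch):
        # A record that is modified several times appears several times.
        self.entries.append((record_id, dict(patch)))


@dataclass
class DirtySet:
    """Approach 2 (hypothetical): one entry per record, diffed at sync time."""
    server: dict = field(default_factory=dict)   # record id -> server value
    current: dict = field(default_factory=dict)  # record id -> local value

    def record(self, record_id, patch):
        base = self.current.get(record_id, self.server.get(record_id, {}))
        self.current[record_id] = {**base, **patch}

    def diff(self, record_id):
        # Only properties that differ from the server value need sending.
        old = self.server.get(record_id, {})
        new = self.current[record_id]
        return {prop: value for prop, value in new.items() if old.get(prop) != value}


log = ReplayLog()
dirty = DirtySet(server={"A": {"subject": "Hi", "isRead": False}})
for patch in ({"isRead": True}, {"subject": "Hello"}, {"isRead": False}):
    log.record("A", patch)
    dirty.record("A", patch)

print(len(log.entries))  # 3 changes to replay
print(dirty.diff("A"))   # {'subject': 'Hello'}
```

The log keeps every intermediate step in order; the set discards that ordering in exchange for sending less.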
The Fastmail offline cache uses a hybrid of these approaches to try to get the best of both worlds:
- A log stores (in order) the [data type, account id, id] of any changes, along with what type of change this is (create/update/destroy).
- The record itself stores the last known server state if it’s been updated, stored efficiently as a patch to get back to the server state from the updated state.
- If the record is updated a second time, it:
  - stays in its current position in the log if it’s not yet present on the server (this is a create); or
  - moves to the end of the log (removing the old entry and adding a new one) if it already exists on the server (this is an update/destroy); or
  - is removed from the log entirely if the change reverted it to the last-known server state.
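A minimal Python sketch of the per-record half of this scheme, assuming records are flat dictionaries whose properties all already exist (the CachedRecord class and its methods are hypothetical). The record keeps its current data plus a reverse patch holding only the server values of the properties that changed; applying the reverse patch recovers the server copy, and an empty reverse patch means the record has been reverted and can leave the log.

```python
from dataclasses import dataclass, field


@dataclass
class CachedRecord:
    """Hypothetical cache entry: current local data plus a reverse patch."""
    data: dict
    reverse: dict = field(default_factory=dict)  # property -> last-known server value

    def apply_local_update(self, patch):
        for prop, value in patch.items():
            # Remember the server value only the first time a property changes
            # (assumes the property already exists on the record).
            self.reverse.setdefault(prop, self.data[prop])
            self.data[prop] = value
            if self.reverse[prop] == value:
                del self.reverse[prop]  # property is back to its server value

    def server_state(self):
        # Applying the reverse patch to the current data recovers the server copy.
        return {**self.data, **self.reverse}

    def is_dirty(self):
        return bool(self.reverse)


email = CachedRecord({"subject": "Hello", "isRead": False, "isFlagged": False})
email.apply_local_update({"isRead": True})
email.apply_local_update({"isFlagged": True})
print(email.reverse)         # {'isRead': False, 'isFlagged': False}
print(email.server_state())  # {'subject': 'Hello', 'isRead': False, 'isFlagged': False}

# Undoing both edits empties the reverse patch: nothing is left to sync.
email.apply_local_update({"isRead": False, "isFlagged": False})
print(email.is_dirty())      # False
```

Storing only the changed properties keeps the cache small for the common case where a large object (such as an email) has one or two fields edited.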
If we’re updating a record that’s not yet been created on the server, we may have to do an update as well as a create, due to an ordering problem. For example, suppose you do the following:
- (a) Create Mailbox X
- (b) Create Emails A & B in Mailbox X
- (c) Create Mailbox Y
- (d) Move Mailbox X to be a child of Y
- (e) Move Email A to be in Mailbox Y
You can’t move (a) later because (b) depends on it. You can’t move (d) earlier because it depends on (c). So if we update a record that’s not yet been created on the server, and we set a property that includes a local id (i.e., it references another object that’s been created locally but not yet synced to the server), we store that change as a patch and send it as a separate update later, rather than folding it into the create.
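The following Python sketch applies that rule to steps (a) to (e). It assumes locally created ids are strings starting with "#" (the same prefix JMAP uses to refer to creation ids), uses JMAP-style property names (parentId, mailboxIds) purely for illustration, and only handles records that have not yet reached the server. The PendingCreates class and the references_local_id helper are hypothetical.

```python
def references_local_id(value):
    """True if a property value mentions a not-yet-synced id (assumed "#" prefix)."""
    if isinstance(value, str):
        return value.startswith("#")
    if isinstance(value, dict):
        return any(references_local_id(k) or references_local_id(v) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return any(references_local_id(v) for v in value)
    return False


class PendingCreates:
    """Hypothetical tracker; only for records still waiting to be created on the server."""

    def __init__(self):
        self.log = []       # ordered (change type, record id) pairs
        self.created = {}   # record id -> properties to send in the create
        self.deferred = {}  # record id -> patch to send as a later update

    def create(self, rid, props):
        self.created[rid] = dict(props)
        self.log.append(("CREATE", rid))

    def update(self, rid, patch):
        late = self.deferred.get(rid, {})
        added_late = False
        for prop, value in patch.items():
            # Local-id references (and any property already deferred, so an older
            # deferred value can't overwrite a newer plain one) go into the later
            # update; everything else is folded into the create.
            if references_local_id(value) or prop in late:
                late[prop] = value
                added_late = True
            else:
                self.created[rid][prop] = value
        if added_late:
            self.deferred[rid] = late
            # Place the deferred update at the end of the log, after the create
            # of any local record it now references.
            if ("UPDATE", rid) in self.log:
                self.log.remove(("UPDATE", rid))
            self.log.append(("UPDATE", rid))

    def plan(self):
        """The requests to send, in order."""
        return [(kind, rid, self.created[rid] if kind == "CREATE" else self.deferred[rid])
                for kind, rid in self.log]


changes = PendingCreates()
changes.create("#X", {"name": "X", "parentId": None})               # (a)
changes.create("#A", {"subject": "A", "mailboxIds": {"#X": True}})  # (b)
changes.create("#B", {"subject": "B", "mailboxIds": {"#X": True}})  # (b)
changes.create("#Y", {"name": "Y", "parentId": None})               # (c)
changes.update("#X", {"parentId": "#Y"})                            # (d)
changes.update("#A", {"mailboxIds": {"#Y": True}})                  # (e)

for step in changes.plan():
    print(step)
```

The four creates go out in their original order, followed by the two deferred updates; an edit that doesn't reference a local id (such as a rename) is simply merged into the pending create.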
When loading data from the server, we do not need to look for a matching entry in the log of changes still to sync; we can just update the server state stored in the record. If that makes the local change inert (the server already matches it), we’ll delete it from the log when we go to sync it.
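A minimal Python sketch of this lazy approach, with module-level dictionaries standing in for the cache (all names are hypothetical, and only updates are handled). Loading from the server touches only the stored server values; the sync step works out the patch to send and drops the entry when that patch comes out empty.

```python
from collections import OrderedDict

# Hypothetical cache contents: email B was marked read while offline.
records = {"B": {"subject": "Hi", "isRead": True}}  # current local data
reverse = {"B": {"isRead": False}}                  # last-known server values
log = OrderedDict({"B": "UPDATE"})                  # changes still to sync


def load_from_server(key, server_value):
    """Refresh a record from the server without looking at the change log."""
    rev = reverse.get(key, {})
    for prop, value in server_value.items():
        if prop in rev:
            rev[prop] = value           # changed locally: update the server copy only
        else:
            records[key][prop] = value  # untouched locally: take the server value


def sync(send):
    """Send pending updates, silently dropping any that have become inert."""
    for key in list(log):
        rev = reverse.pop(key, {})
        patch = {p: records[key][p] for p, old in rev.items() if records[key][p] != old}
        del log[key]  # simplified: cleared without waiting for the server to confirm
        if patch:
            send(key, patch)


# Before we reconnect, another client also marks B as read.
load_from_server("B", {"subject": "Hi", "isRead": True})
sent = []
sync(lambda key, patch: sent.append((key, patch)))
print(sent)  # []: the server already matches, so nothing is sent
```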
For example, suppose we have a mailbox, id 1, with two messages in it, ids A & B, and the user does the following (contrived) actions:
- Creates a new mailbox: 2
- Creates a new child mailbox of that: 3
- Moves A and B into mailbox 3
- Marks B as read
- Moves A back to its original mailbox
- Renames mailbox 2
Our log will end up looking like this:
[Mailbox, "#2", CREATE]
[Mailbox, "#3", CREATE]
[Email, "B", UPDATE]
Because A is back to its original state, we’ve eliminated it from the log entirely and do not need to send anything to the server. Because #2 was a create in the log, we did not move it when it was renamed at the end, which is good: otherwise the other two changes in the log would fail, as they both depend on it. Despite making two changes to B, we only have to send a single update to the server for it.
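To check the walkthrough, here is a condensed, self-contained Python sketch of the log rules replaying the six actions. It covers creates and updates only (local-id deferral and destroys are left out because this example doesn't need them). The ChangeLog class, the account id "u1", the mailbox names, and the JMAP-style properties (mailboxIds, and keywords with $seen for "read") are all hypothetical choices for illustration.

```python
from collections import OrderedDict

CREATE, UPDATE = "CREATE", "UPDATE"


class ChangeLog:
    """Hypothetical, condensed version of the hybrid log (creates and updates only)."""

    def __init__(self, server_records):
        self.data = {key: dict(props) for key, props in server_records.items()}
        self.reverse = {}         # key -> {property: last-known server value}
        self.log = OrderedDict()  # key -> change type, in the order to sync

    def create(self, key, props):
        self.data[key] = dict(props)
        self.log[key] = CREATE

    def update(self, key, patch):
        if self.log.get(key) == CREATE:
            # Not on the server yet: edit the pending create in place so it
            # keeps its position in the log.
            self.data[key].update(patch)
            return
        rev = self.reverse.setdefault(key, {})
        for prop, value in patch.items():
            rev.setdefault(prop, self.data[key][prop])  # first change keeps server value
            self.data[key][prop] = value
            if rev[prop] == value:
                del rev[prop]  # this property is back to its server value
        self.log.pop(key, None)
        if rev:
            self.log[key] = UPDATE  # (re)append at the end of the log
        else:
            del self.reverse[key]   # fully reverted: nothing to send


ACCOUNT = "u1"  # hypothetical account id


def mailbox(rid):
    return ("Mailbox", ACCOUNT, rid)


def email(rid):
    return ("Email", ACCOUNT, rid)


changes = ChangeLog({
    mailbox("1"): {"name": "Inbox", "parentId": None},
    email("A"): {"mailboxIds": {"1": True}, "keywords": {}},
    email("B"): {"mailboxIds": {"1": True}, "keywords": {}},
})
changes.create(mailbox("#2"), {"name": "Projects", "parentId": None})
changes.create(mailbox("#3"), {"name": "2024", "parentId": "#2"})
changes.update(email("A"), {"mailboxIds": {"#3": True}})
changes.update(email("B"), {"mailboxIds": {"#3": True}})
changes.update(email("B"), {"keywords": {"$seen": True}})  # mark B as read
changes.update(email("A"), {"mailboxIds": {"1": True}})    # A back to mailbox 1
changes.update(mailbox("#2"), {"name": "Archive"})         # rename #2

for (data_type, _, rid), change in changes.log.items():
    print([data_type, rid, change])
```

This prints the same three entries as the log above: the two mailbox creates, then a single update for B.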
Conflicts
Suppose you have a shared contact; let’s call him Joe Bloggs. While offline, you edit it to add his phone number. Meanwhile, a colleague updates his email address. This means that when your client comes back online and synchronises the changes, the object it is updating has already changed on the server. This is called a conflict.
For the data types we have to handle, we believe automatic resolution (rather than presenting the conflict to the user and asking them to choose what should happen) is the right way to go. We follow these simple rules:
- Last write wins.
- All updates are patches.
This means that if the same object is updated by two different people, whichever client writes second will overwrite the data of the one that wrote first. (The first client will then sync this change back, so you get a consistent state.) However, since all updates are patches, the changes are merged unless they apply to the same property on the object. So in the case above, although there were two writes to the same contact, they updated different properties: one user changed the “emails” property, the other the “phones” property. So in this case, both changes are preserved.
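A small Python sketch of why patches make last-write-wins workable, assuming a contact is a flat dictionary and a patch simply replaces the properties it names. Real JMAP patches can do more than this, and the contact data below is made up; the apply_patch helper is hypothetical.

```python
def apply_patch(obj, patch):
    """Replace only the properties named in the patch (last write wins per property)."""
    merged = dict(obj)
    merged.update(patch)
    return merged


joe = {"name": "Joe Bloggs", "emails": ["joe@example.com"], "phones": []}

# Both clients started from the same copy of the contact.
colleague = {"emails": ["joe.bloggs@example.com"]}  # written first
you = {"phones": ["+1 555 0100"]}                    # written second, on reconnect

server = apply_patch(apply_patch(joe, colleague), you)
print(server["emails"], server["phones"])  # both changes survive

# Writing whole objects instead would let the second write erase the first.
your_full_copy = apply_patch(joe, you)
overwritten = apply_patch(apply_patch(joe, colleague), your_full_copy)
print(overwritten["emails"])  # ['joe@example.com']: the colleague's edit is lost

# If both patches touch the same property, the later one simply wins.
clash = apply_patch(apply_patch(joe, {"phones": ["+1 555 0101"]}), you)
print(clash["phones"])
```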
Next up, mail storage
In this post we looked at how we store changes you make offline so we can accurately and efficiently sync them back to the server when you come online. Like our discussion of data storage yesterday, everything here applies generically to all data types.
Tomorrow, we’ll discuss why email is special, and what else we do to make this super fast in our offline store.