October 29, 2013

Apple mail “bug” turns out to be user script after all

Historical

Bron Gondwana

CEO

This is a technical blog post which gives updates on the previous post. There is no need to read this unless you’re particularly interested in IMAP protocol issues and/or gossip.

Of course it’s hard to give updates quite the same level of press that the original post got. The internet does outrage far better than it does corrections.

The OS X Mail team have been really helpful in tracking this down. I put them in direct contact with the user to do further debugging. A very embarrassed user discovered an AppleScript he wrote years ago to move mail from OS X Mail’s “semantic junk” folder to the real Junk folder at FastMail, where our Bayes trainer could learn from it.

What has changed is that OS X Mail now correctly detects the \Junk special-use on the folder at our server, and sets the semantic junk folder to be that folder - meaning his script was moving messages from that folder to the same folder. Which raises another interesting question:

Could we stop this happening again?

There are three main places I can see that it could have been stopped:

1. OS X Mail detecting a “move” from a folder to the same folder and not performing it
2. FastMail detecting a copy to the same folder and rejecting it
3. FastMail detecting that the same message had occurred “too many times” in the past and rejecting the copy

Interestingly, none of these would have stopped the infinite loopiness. The script was pretty basic, and would have found messages in “Junk Mail” every time it ran.

There’s no way to avoid infinite loops in a sufficiently powerful language. You solve the “halting problem” when the user’s computer catches fire due to excess CPU usage, but otherwise it will just sit there eating power and making the computer feel slow.

With option (1) the reject would have been local, and minimal cost.
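As a rough sketch of what option (1) could look like on the client side, here is a small Python example. The function names are hypothetical and this is not how OS X Mail is implemented. The only IMAP detail it relies on is that the name INBOX is case-insensitive, while other mailbox names are compared as-is.

```python
def same_mailbox(a, b):
    """Return True if two mailbox names refer to the same folder.

    Assumption: both names are already in the form the server uses.
    INBOX is the one name that IMAP treats case-insensitively.
    """
    if a.upper() == "INBOX" and b.upper() == "INBOX":
        return True
    return a == b


def plan_move(source, destination, uids):
    """Return the UIDs that actually need moving (hypothetical helper).

    A move onto the same folder is a no-op, so nothing is sent to the
    server and the reject costs no network round-trip.
    """
    if same_mailbox(source, destination):
        return []
    return list(uids)
```

Note that this only saves the network traffic. A script that keeps finding messages in the folder will still keep asking.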

Option (2) costs more - because it causes a network round-trip each time, and it breaks theoretically useful usage (duplicating an email… not entirely sure how that’s useful in the case where you KNOW that it’s the same message - but the standard doesn’t say to reject it).

Option (3) is less breakage than option (2), because it only blocks the copy when it detects multiple past copies - but it’s expensive for the server - Cyrus doesn’t keep an index by GUID on the mailbox, so it would have to detect “copy to self” and do a full pass keeping a counter for each message and seeing if there were already “too many copies”. It’s more code and more complexity. It does bring a level of robustness against weird behaviour though.
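To make the cost of option (3) concrete, here is a minimal Python sketch of the “full pass with a counter” idea. It is not Cyrus code: the function name, the list-of-GUIDs representation, and the threshold of 5 are all assumptions made for illustration.

```python
from collections import Counter

MAX_COPIES = 5  # hypothetical threshold for "too many copies"


def allow_copy_to_self(mailbox_guids, guids_to_copy, max_copies=MAX_COPIES):
    """Decide whether a copy into the same mailbox should go ahead.

    mailbox_guids: GUID of every message currently in the mailbox.
    guids_to_copy: GUIDs of the messages the client asked to copy.
    With no index by GUID, every check is a full pass over the mailbox.
    """
    counts = Counter(mailbox_guids)  # full pass, one counter per message
    return all(counts[guid] < max_copies for guid in guids_to_copy)
```

A single duplicate still goes through, so the theoretically useful case keeps working, but the fifth copy of the same message is rejected.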

Who should fix it?

Obviously, I think that client authors should be careful in what they send. They have the most information about the users’ intent, and the most ability to give useful feedback to the user about why the action is being rejected.

But I know it’s tilting at windmills. The OS X Mail team have been fantastic now that I’m in contact with them, but they are one of many clients, and it only takes one.

For now, I’ve put in a syslog statement that tells me how often clients are copying/moving messages to the same folder. The answer is - quite a lot. Some are OS X Mail (including older versions). Some don’t identify themselves, but I’ve found at least Outlook 14 in the “X-Mailer” fields of Sent Messages from the same IP address… so that’s a clue.

I’m guessing I really have to implement some variant of (3) if I want to not break any existing users. Hopefully the OS X Mail team will implement (1).

An aside on “detecting what’s happening at the server”

OS X Mail isn’t using the recently standardised “MOVE” command, it’s using the “COPY” command followed by “EXPUNGE” on the source messages - which is part of the original standard, and works on every server.

If we answered success to the COPY but didn’t actually create new copies of the message, it would mean that the messages would be silently deleted a moment later. Handy in this case, but hardly good behaviour in general! A user could accidentally wipe their entire INBOX.
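A toy model makes the danger easier to see. The Python sketch below is an in-memory stand-in, not a real IMAP server or client: the class, its methods, and the `really_copy` switch are all made up for illustration. It models a “move” as COPY, then flagging the source messages as deleted, then EXPUNGE.

```python
class ToyMailbox:
    """A toy in-memory folder mapping UID -> message body."""

    def __init__(self, messages):
        self.messages = dict(messages)
        self.next_uid = max(self.messages, default=0) + 1
        self.deleted = set()

    def copy_from(self, source, uids, really_copy=True):
        """Model COPY. Always reports success, as in the scenario above."""
        if really_copy:
            for uid in uids:
                # Each real copy gets a fresh UID in the destination.
                self.messages[self.next_uid] = source.messages[uid]
                self.next_uid += 1
        return "OK"

    def mark_deleted(self, uids):
        """Model STORE +FLAGS (\\Deleted)."""
        self.deleted.update(uids)

    def expunge(self):
        """Model EXPUNGE: drop everything flagged as deleted."""
        for uid in self.deleted:
            self.messages.pop(uid, None)
        self.deleted.clear()


def move_via_copy(source, destination, uids, really_copy=True):
    """Client-side "move" built from COPY, then delete and EXPUNGE."""
    if destination.copy_from(source, uids, really_copy) == "OK":
        source.mark_deleted(uids)
        source.expunge()
```

With `really_copy=False` and the same folder as source and destination, the folder ends up empty. With real copies, the messages survive under new UIDs, which is exactly why the script found them again on its next run.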

This is a general problem that I have with IMAP. It’s a very imperative language. Rather than describing what you want done, you send explicit instructions against a very rigid model. Not only that, but there have been tons of extensions. You can use them if they exist - but a server can advertise itself as “IMAP” and support none of them, not even really useful basic things like UIDPLUS. Check this out:

http://imapwiki.org/Specs

Which means that clients either have to have two codepaths, one for servers with the extension and one for servers without, or just use the “without” codepath for every server.

If clients do have two different codepaths, the chance of bugs is more than doubled, because they will interact in interesting ways. Another problem I debugged this week was Thunderbird 24.0.1 not noticing that messages had been deleted from the server. The server was sending the right data, but Thunderbird wasn’t recognising it. Turning off support for CONDSTORE fixed it, which is why it “worked” on other servers which didn’t have CONDSTORE support, or didn’t have it switched on.
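Here is roughly what the two-codepath problem looks like for something as simple as a move, sketched in Python. The function is hypothetical and only builds the command text (without tags). It assumes the server’s advertised capabilities have already been fetched, and the mailbox quoting is simplified.

```python
def move_commands(capabilities, uid_set, destination):
    """Return the IMAP commands (without tags) for moving messages.

    capabilities: the capability names the server advertised.
    Quoting is simplified; a real client must escape names properly.
    """
    caps = {c.upper() for c in capabilities}
    if "MOVE" in caps:
        # Codepath one: the server supports the MOVE extension.
        return [f'UID MOVE {uid_set} "{destination}"']
    # Codepath two: the original standard, which works on every server.
    commands = [
        f'UID COPY {uid_set} "{destination}"',
        f"UID STORE {uid_set} +FLAGS.SILENT (\\Deleted)",
    ]
    # UID EXPUNGE needs UIDPLUS; plain EXPUNGE also removes any other
    # messages already flagged as deleted in the folder.
    commands.append(f"UID EXPUNGE {uid_set}" if "UIDPLUS" in caps else "EXPUNGE")
    return commands
```

Even in this tiny case there are already three variants to test, depending on whether the server has MOVE, UIDPLUS, or neither.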

So the clients try to convert the intentions of the user into changes on the server, and they have to do so through a set of very small commands which each do one thing. There’s no way for the client to say “I need all the following information”, or “here’s the transformation to the data model that the user requested”. And then the server, if it wants to optimise at all, has to guess the intent from the commands as they come in. Some servers try to read ahead to see if the client has pipelined (made multiple requests before waiting for the answers). Our server doesn’t - it just takes advantage of the fact that at least the files are “hot” in memory, so accessing them will be faster next time.

Some examples:

Unread counts? Have to ask for them separately for every folder. If you want to display all the folders with unread messages, you need to first get the full listing of folders, then for each one request the unread count in a separate command.
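In Python that looks something like the sketch below. It assumes `conn` behaves like an `imaplib.IMAP4` connection (only its `status(mailbox, names)` method is used) and that the folder names came from an earlier LIST command. The function name is made up, and the quoting and response parsing are deliberately simplified.

```python
import re

UNSEEN_RE = re.compile(rb"UNSEEN (\d+)")


def unread_counts(conn, folders):
    """Ask for the unread count of each folder, one STATUS per folder.

    conn: an object with an imaplib.IMAP4-style status(mailbox, names).
    folders: folder names, assumed to come from an earlier LIST command.
    """
    counts = {}
    for name in folders:
        # One command per folder; the name is quoted by hand here.
        typ, data = conn.status(f'"{name}"', "(UNSEEN)")
        if typ != "OK" or not data or not data[0]:
            continue  # skip folders the server would not report on
        match = UNSEEN_RE.search(data[0])
        if match:
            counts[name] = int(match.group(1))
    return counts
```

For an account with a few hundred folders, that is a few hundred commands just to draw the folder list.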

Want the “first text part of every message”? Good luck - you need to fetch the bodystructure for each, and either combine those for which the text part is the same number into separate batches, or just do one fetch per message.
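The batching variant can be sketched like this in Python. It assumes the client has already fetched each message’s BODYSTRUCTURE and worked out the part number of its first text part, which is the fiddly bit and is not shown. The function name is hypothetical and it only builds the command text (without tags).

```python
from collections import defaultdict


def batch_text_fetches(first_text_part):
    """Group UIDs by the part number of their first text part.

    first_text_part: dict of UID -> part number string such as "1" or
    "1.1", assumed to be derived from each message's BODYSTRUCTURE.
    Returns one UID FETCH command per distinct part number.
    """
    batches = defaultdict(list)
    for uid, part in first_text_part.items():
        batches[part].append(uid)
    return [
        f"UID FETCH {','.join(str(u) for u in sorted(uids))} (BODY.PEEK[{part}])"
        for part, uids in sorted(batches.items())
    ]
```

So four messages with three different structures still cost three fetches, on top of the BODYSTRUCTURE fetch needed to find that out.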

So it’s not surprising that IMAP clients have bugs. There is some necessary complexity to the problem of accessing email remotely - but plenty of the complexity has been caused by standards which have the client doing too much telling the server “how to do it” rather than “what needs to be done”.