IMAP is a great way of getting at email. All your email stays on the server, and you can access it from multiple devices, all of which see the same email. This is in contrast to POP, where email is generally deleted from the server once it has been downloaded, and it’s all but impossible to get a synchronised view of your email across different clients. IMAP also lets you create folders on the mail server. Ever since I set up a Unix-based (FreeBSD) mail server in September 2001, we’ve used the de facto default, uw-imap (the University of Washington IMAP server).

uw-imap supports what’s known as “mbox” mail format. That’s where a mail folder (such as your Inbox) is actually just one file, with all your emails in it sequentially. It’s difficult to have subfolders with mbox: you have to create actual directories on the mail server (which you can’t do with most email clients) into which to put further mbox-format files. mbox is very simple, but also very flawed:

  • When an email client syncs to the mail server, uw-imap must trawl through potentially vast files in order to generate a list of messages.
  • Incremental backups become very difficult, because even just marking a message as “read” in a mail folder means the file has changed. So a one-byte change means a 100MB file must be backed up again in full.
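
Both problems are easy to reproduce with Python’s standard mailbox module. This is only a rough sketch of the mbox behaviour described above, not anything uw-imap itself runs; the addresses, function names and file paths are made up for illustration.

```python
import hashlib
import mailbox
import os
import tempfile

# Hypothetical test message; the addresses are placeholders.
RAW_MESSAGE = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Message {n}\n"
    "\n"
    "Body of message {n}\n"
)


def build_mbox(path, count):
    """Create an mbox file at `path` holding `count` small messages."""
    box = mailbox.mbox(path)
    try:
        for n in range(count):
            box.add(RAW_MESSAGE.format(n=n))
    finally:
        box.close()


def count_by_scanning(path):
    """Count messages the slow way: read every line of the single file,
    looking for the "From " separator that starts each message."""
    with open(path, "rb") as fh:
        return sum(1 for line in fh if line.startswith(b"From "))


def file_digest(path):
    """Checksum of the whole folder file, as a backup tool might compute."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def mark_first_message_read(path):
    """Set the 'R' (read) flag on the first message and write it back."""
    box = mailbox.mbox(path)
    try:
        key = box.keys()[0]
        message = box[key]
        message.set_flags("R")
        box[key] = message
        box.flush()  # rewrites the mbox file on disk
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inbox = os.path.join(tmp, "inbox")
        build_mbox(inbox, 5)
        print(count_by_scanning(inbox), "messages found by scanning")
        before = file_digest(inbox)
        mark_first_message_read(inbox)
        print("file changed:", before != file_digest(inbox))
```

Counting the messages means reading the whole file, and flagging a single message as read changes the checksum of the entire folder, which is exactly what upsets an incremental backup.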

I found a new way to back up mail folders earlier this year (using xdelta to create binary diffs of tar’d home folders). This meant backups used a fraction of the space, and hence I could remove the somewhat arbitrary 50MB mail quota per user. Unfortunately, everyone’s home directory is now so large that disk activity is incredibly high: 40 people all logged in and checking 100MB inboxes is enough to thrash the disks, even on a fast RAID 10 setup.

I first heard about Dovecot earlier this year. For all of uw-imap’s failings, I was always too scared of the other IMAP servers (Courier and Cyrus, basically) to switch. The lack of documentation and weird mailbox format put me off Cyrus, and I couldn’t quite get my head around Courier. Pathetic, I know, but I’d never hit the limits of uw-imap so didn’t think we needed to change.

Dovecot, however, is excellent. Even sticking with mbox-format mailboxes, it will vastly improve matters by creating an index of messages, which means there’s no need to look at the actual mailbox file when syncing (except to grab new messages). But there’s a mailbox format much better than mbox, and it’s what Courier uses. Luckily, Dovecot supports it too: Maildir.

A maildir is actually a directory containing three other directories: new, cur and tmp. I’ll leave explaining the precise mechanics to Wikipedia. Suffice to say that with Maildir:

  • Every email is a single file, and the suffix of its filename represents the current state of the email (read, deleted, etc.)
  • There is a separate UIDlist and index of messages, speeding up the work of delivery and syncing greatly over mbox
  • You can create hierarchical mail folders from email clients: if “Work” is a mail folder, then “Work.bobby” is presented to the mail client as “mail folder Work, with mail folder bobby as a subfolder”.
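
To make the layout concrete, here is a small sketch using Python’s standard mailbox module, which can read and write Maildirs. It has nothing to do with how Dovecot is implemented; the message text, the function names and the folder names “Work” and “Work.bobby” (borrowed from the example above) are purely illustrative.

```python
import mailbox
import os
import tempfile

# Hypothetical test message; the addresses are placeholders.
RAW_MESSAGE = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Hello\n"
    "\n"
    "Just one message.\n"
)


def list_message_files(maildir_path):
    """Return 'subdir/filename' for every message file in the maildir."""
    files = []
    for sub in ("new", "cur", "tmp"):
        for name in sorted(os.listdir(os.path.join(maildir_path, sub))):
            files.append(sub + "/" + name)
    return files


def deliver(maildir_path, raw_message):
    """Add a message to the maildir (created if missing); it lands in new/."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(raw_message)
    finally:
        box.close()


def mark_seen(maildir_path, key):
    """Move a message to cur/ and set its 'S' (seen) flag.
    Only the filename changes; no other message is touched."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        message = box[key]
        message.set_subdir("cur")
        message.add_flag("S")
        box[key] = message
    finally:
        box.close()


def make_folders(maildir_path, names):
    """Create folders; each becomes a dot-prefixed directory on disk."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        for name in names:
            box.add_folder(name)
        return sorted(box.list_folders())
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "Maildir")
        key = deliver(root, RAW_MESSAGE)
        print(list_message_files(root))   # one file under new/
        mark_seen(root, key)
        print(list_message_files(root))   # same message, now under cur/ with a flag suffix
        print(make_folders(root, ["Work", "Work.bobby"]))
```

On disk the folders are the sibling directories “.Work” and “.Work.bobby”; it is the mail client that presents the dot as a hierarchy separator.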

There is an argument that says that, for usability purposes, you shouldn’t be able to have a folder that contains both mail and other mail folders. I don’t think it’s going to confuse anyone here, though.

So, on Sunday 2nd Dec, I will be switching off our email for a bit and converting over 7GB of email from mbox to Maildir format. This is no mean feat, since I haven’t found a decent mbox-to-Maildir converter. The closest I’ve come is mb2md, but whilst it claims to read the “.mailboxlist” files to see which of a user’s files to convert, I haven’t got that to work yet. So I’m writing my own, using maildirmake (which comes free with Courier) and mb2md, wrapped up in a load of Perl. I’ll post it when it’s finished.
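
For anyone curious about the general shape of such a conversion, here is a minimal sketch in Python using only the standard mailbox module. It is not the Perl script mentioned above and it doesn’t use mb2md or maildirmake. The function names are hypothetical, and the mapping of folder names to mbox files is passed in by hand rather than read from “.mailboxlist”.

```python
import mailbox
import os
import tempfile


def copy_messages(mbox_path, dest):
    """Copy every message from one mbox file into a Maildir (or Maildir folder).

    Wrapping each message in MaildirMessage lets the mailbox module translate
    mbox status flags into Maildir ones (e.g. read becomes 'S', and messages
    already seen by a client go into cur/ rather than new/).
    """
    source = mailbox.mbox(mbox_path, create=False)
    copied = 0
    try:
        for message in source:
            dest.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        source.close()
    return copied


def convert_user(inbox_mbox, folder_mboxes, maildir_path):
    """Convert one user's inbox plus a {folder name: mbox path} mapping.

    Returns a dict of message counts per folder. The original mbox files are
    only read, never modified, so the conversion can be re-run from scratch.
    """
    root = mailbox.Maildir(maildir_path, create=True)
    counts = {"INBOX": copy_messages(inbox_mbox, root)}
    for folder_name, mbox_path in folder_mboxes.items():
        # add_folder creates the dot-prefixed subfolder directory on disk.
        counts[folder_name] = copy_messages(mbox_path, root.add_folder(folder_name))
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inbox = os.path.join(tmp, "inbox")
        box = mailbox.mbox(inbox)
        box.add("From: alice@example.org\nSubject: test\n\nhello\n")
        box.close()
        print(convert_user(inbox, {}, os.path.join(tmp, "Maildir")))
```

A real migration would also need to lock each mbox while reading it, set ownership and permissions on the new directories, and cope with malformed messages, none of which this sketch attempts.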

The other fantastic thing I’ve found is maildrop. This is like procmail, but vastly better: you get proper logic, it has a few basic functions built in (e.g. “does any ‘To’ or ‘Cc’ line have this address?”), and it just feels nicer to program in. We don’t have too many complicated procmail scripts, so it’s been easy to rewrite them in maildrop format.

Once Dovecot is up and running, and everyone’s converted to maildirs, perhaps then it won’t take 2 minutes to check for new mail, as it sometimes does now. Poor, poor disks.