December 1, 2017

Why Object Storage is the Future of Storage for Email for Service Providers (featuring Deutsche Telekom)

The 6th of November saw the start of the OpenStack Summit in Sydney and with it a great talk entitled Email Storage with Ceph by Deutsche Telekom about what they’ve been doing in the email object storage world. But first, perhaps a little background…

For years, the biggest issue and expense faced by anyone running their own email system has been storage. As more and more users move to IMAP, mail stays on the server, so the amount of server-side storage needed is forever increasing. This translates into running large storage arrays and, depending on your setup, having to deal with Network File System (NFS) and other utilities as well. This can be expensive, not just in terms of money, but time as well.

So how do we solve this issue of the need for ever increasing storage?

In 1996, now 21 years ago, Carnegie Mellon University was looking into exactly that. The idea for object storage initially came about in 1995. The concept was simple(ish): split less common operations, such as namespace manipulations, from common operations, such as reads and writes, to optimise the performance and scale of both. The other notable concept was abstracting the reads and writes of data into more flexible data containers (objects).
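To make that split a little more concrete, here is a minimal sketch in Python. It is purely illustrative: `ObjectStore` and `Namespace` are hypothetical names invented for this example, and everything lives in memory rather than in a real storage cluster. The point is only that reads and writes go straight to objects by ID, while naming and renaming happen somewhere else entirely.

```python
import uuid


class ObjectStore:
    """Data path (hypothetical): flat put/get on opaque objects addressed by ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Each object gets its own identifier; no directories are involved.
        oid = uuid.uuid4().hex
        self._objects[oid] = data
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]


class Namespace:
    """Metadata path (hypothetical): maps human-readable names to object IDs."""

    def __init__(self):
        self._names = {}

    def link(self, name: str, oid: str) -> None:
        self._names[name] = oid

    def rename(self, old: str, new: str) -> None:
        # A rename only touches the name table; the object data never moves.
        self._names[new] = self._names.pop(old)

    def lookup(self, name: str) -> str:
        return self._names[name]


store = ObjectStore()
names = Namespace()

oid = store.put(b"a log line we need to keep for years\n")
names.link("logs/summit.log", oid)
names.rename("logs/summit.log", "archive/summit.log")

data = store.get(names.lookup("archive/summit.log"))
```

Because the rename never touches the stored bytes, the two halves can be scaled and tuned separately, which is the idea described above.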

Object storage is great for files (such as log files) that you may need to keep for years, and it’s now being offered by multiple companies. (We’ve even recently released an object storage plug-in to tap into this growing need.)

However, this is just part of the puzzle. How do we use object storage to store our emails?

 


 

In order to store emails within object storage, we need to tell our LDA (Dovecot) where to find them once they have been placed there, which requires a fair bit of work modifying the Dovecot source code. The way that we can change where the emails are actually stored comes down to our Dovecot index files, which tell Dovecot where to look to grab the requested data.

Now that we’ve gone through a little background, we can get back to why it was such a great (albeit brief) presentation by Deutsche Telekom and how this ties into OpenStack and atmail.

Inside the Dovecot index file, each email’s location is stored as an Object Identifier (OID), which is a Universally Unique Identifier (UUID) in hex. The OID is stored as an extension record within the Dovecot index. This lets Dovecot know where to fetch the email from when a user opens it in their email client. To add to this, the Dovecot metadata and index files are themselves kept within CephFS, which is kind of like turning Ceph into an NFS mount on your system. Any immutable attributes such as GUID, POP3 UIDL, POP3 order, mail UID and received and saved date are all stored in RADOS as extended attributes (xattrs). Experts suggest building your CephFS pool on SSDs to deal with the many writes being performed on the Dovecot index files.
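If it helps to picture the lookup, here is a rough sketch in Python. To be clear, this is not the project’s code and it does not use the real Ceph or Dovecot APIs: `FakePool`, `MailIndex`, `deliver` and `fetch` are all hypothetical, in-memory stand-ins, and the attribute names are made up for illustration. It simply models an index that maps a mail UID to an OID, with the mail body and its immutable attributes stored against that OID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    xattrs: dict = field(default_factory=dict)


class FakePool:
    """In-memory stand-in for a RADOS pool (hypothetical, not librados)."""

    def __init__(self):
        self._objects = {}

    def write(self, oid: str, data: bytes, xattrs: dict) -> None:
        self._objects[oid] = StoredObject(data, dict(xattrs))

    def read(self, oid: str) -> bytes:
        return self._objects[oid].data

    def get_xattr(self, oid: str, key: str) -> str:
        return self._objects[oid].xattrs[key]


class MailIndex:
    """Stand-in for a Dovecot index file: mail UID -> OID extension record."""

    def __init__(self):
        self._oid_by_uid = {}
        self._next_uid = 1

    def append(self, oid: str) -> int:
        uid = self._next_uid
        self._next_uid += 1
        self._oid_by_uid[uid] = oid
        return uid

    def oid_for(self, uid: int) -> str:
        return self._oid_by_uid[uid]


def deliver(pool: FakePool, index: MailIndex, raw_mail: bytes, received: str) -> int:
    """Store a mail as one object and record its OID in the index."""
    oid = uuid.uuid4().hex  # UUID in hex, used as the object identifier
    uid = index.append(oid)
    # Immutable attributes travel with the object (names are illustrative).
    pool.write(oid, raw_mail, {"guid": uuid.uuid4().hex, "uid": str(uid), "received": received})
    return uid


def fetch(pool: FakePool, index: MailIndex, uid: int) -> bytes:
    """Resolve the UID to an OID via the index, then read the object."""
    return pool.read(index.oid_for(uid))


pool = FakePool()
index = MailIndex()
uid = deliver(pool, index, b"Subject: hello\r\n\r\nHi there\r\n", "2017-11-06T09:00:00Z")
body = fetch(pool, index, uid)
```

The index stays small and busy (hence the SSD suggestion for the CephFS pool), while the bulky mail bodies sit in the object store.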

Having to keep up with the expanding storage demand of users is something that has plagued administrators for years. So, the ability to replace NFS with CephFS is a huge bonus, especially if you’re just an end user of OpenStack. It means that you’re able to place data within CephFS without having to worry about disk space at all!

One of the better things about the entire project that Deutsche Telekom has been working on, along with help from Wido den Hollander (42on) and Tallence for initial development, is that it’s open source. This means that atmail, Deutsche Telekom and the wider open-source community will be able to continue to develop and add to the project, making it better for everyone involved.

This project has been under development for at least a year at this point. Although the code is in a tested state, it is still classed as being in “heavy development” and is apparently not entirely production-ready yet. The good news is that this doesn’t stop you, me or anyone else from running up their own experimental, non-production implementations to help out the project. This can be helpful in a number of ways: for example, you may run into a bug that, for some reason, hadn’t been identified before. The more people who take the time to test and check this out, the more stable everyone within the community can try to make it.

This is exactly what we plan to do at atmail, in an effort to give back to an open-source community that we all rely on so much, without even realising it at times!

 

 
