Message History

I am convinced that message history is a very useful feature of a Jabber client.  It is invaluable to be able to check what someone told you when they are offline, or simply without having to ask them again; to check the date stamp on something you told someone; to look up a URL that a friend sent you; to forward a message you received to a third party; or just to jog your memory about who someone is and what you have told them.  The fact that most Jabber clients have some form of message history database indicates that there is demand for it.

However, storing a message history database on the user's local hard drive goes against the whole Jabber design.  Things like the contact list data and vCard information are all stored on the server, which means that you can move to different machines or even different platforms and still seamlessly use your Jabber account in the same way.  Doesn't it make sense to have access to your message history in the same way?  If you go to check a URL that your friend sent you, should you have to think:

"Where was I when I read that message?  On my computer at home?  Or here at work?  Does the Jabber client on my mobile phone even save message history?" 

Since there is currently no spec for storing message history on the Jabber server, I would encourage all client authors to leave this feature out for the present.  Rather than just designing your own history format, consider whether your time would be better spent designing a server-side history format and an XML interface that all clients can use to query a history database on the server, independent of format, platform or the storage capacity of their current device.

Server-side Message History Reasoning

How would it be better on the server?

For a start, by the design of Jabber there is only one machine on the Internet that is guaranteed to see every message that is sent to or from my account, and that is the server. This means that if the messages were to be archived somewhere, the server (or a machine locally attached to it) is the ideal place to store them from a bandwidth point of view. From an availability point of view, the server again wins hands down.

There are also great advantages to the user. No matter what client you have logged in from, you would have access to all your message history. At work, at home, from your WAP phone, from a friend's computer, from an internet cafe: it makes no difference. In the same way, you would be free to start using a different client without having to worry whether the new client will be able to read the message history files written by your old client. I think this fits in nicely with Jabber's design philosophy.

Also, if you are using someone else's PC, you don't leave a message log on their hard drive for anyone to look through later. When you get home, however, you can still check your message history for anything you said while you were using it!  Servers are generally backed up too, so this solves the problem of losing your entire message history if your local hard drive dies.

There are other benefits as well. It will become much easier to support message history in a client if the client doesn't have to contain any database code to store and retrieve messages. Code to request this information from the server can be added to JabberCOM or some other library and will then be available for all clients to use. The hard design and coding work is done once on the server, and not once in each client. So clients become lighter (they use less RAM) and easier to develop.
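To make the idea of a shared client-side request library concrete, here is a minimal sketch in Python of what building a history query and reading the reply might look like. No such spec exists, so everything protocol-specific here is an assumption invented for illustration: the namespace `example:iq:history`, the `with` and `max` elements, and the `from` and `stamp` attributes on returned messages. Only the general shape (an XML request from the client, an XML result from the server) comes from the argument above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace: no server-side history spec is defined yet.
HISTORY_NS = "example:iq:history"


def build_history_query(request_id, contact_jid, max_messages):
    """Build an <iq/> request asking for recent messages exchanged with one contact."""
    iq = ET.Element("iq", {"type": "get", "id": request_id})
    query = ET.SubElement(iq, "query", {"xmlns": HISTORY_NS})
    # The element names "with" and "max" are invented for this sketch.
    ET.SubElement(query, "with").text = contact_jid
    ET.SubElement(query, "max").text = str(max_messages)
    return ET.tostring(iq, encoding="unicode")


def parse_history_result(xml_text):
    """Return (sender, timestamp, body) tuples from a hypothetical server reply."""
    root = ET.fromstring(xml_text)
    messages = []
    for msg in root.iter("{%s}message" % HISTORY_NS):
        body = msg.findtext("{%s}body" % HISTORY_NS, default="")
        messages.append((msg.get("from"), msg.get("stamp"), body))
    return messages
```

A client would send the string returned by `build_history_query("h1", "friend@example.org", 20)` over its existing connection and hand the reply to `parse_history_result`. The client itself holds no database code; the storage format stays entirely on the server.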

Won't it take too much disk space on the server?

Maybe. Let's look at it:

I am what I would consider a very heavy user of ICQ (I have about 100 people on my list). I have used ICQ every day since 1998, and I have never cleared my history. The .dat file that stores my message history is currently 16MB. Being text, it is very compressible: zipping it brings it down to under 4MB.

So best case for a moderate/heavy user is, say, 6MB uncompressed per year, or 1.5MB of real space if it is stored on a compressed volume. Of course, add some space for indexing (let's double it at least). Say 5MB compressed per active user, per year.
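The arithmetic behind that estimate can be written out as a short Python sketch. The per-user figures are the ones quoted above. The 4:1 compression ratio is derived from 16MB zipping to "under 4MB", so it is a conservative floor, and the function names are simply illustrative.

```python
# Figures from the estimate above, all in megabytes.
UNCOMPRESSED_MB_PER_YEAR = 6.0   # moderate/heavy user, uncompressed
COMPRESSION_RATIO = 16.0 / 4.0   # 16MB of history zips to under 4MB
INDEX_OVERHEAD_FACTOR = 2.0      # "double it" to allow for indexing
BUDGET_MB_PER_USER_YEAR = 5.0    # rounded-up planning figure from the text


def compressed_mb_per_year(uncompressed_mb, ratio=COMPRESSION_RATIO):
    """Space used on a compressed volume."""
    return uncompressed_mb / ratio


def with_index_overhead(mb, factor=INDEX_OVERHEAD_FACTOR):
    """Add room for indexes by scaling the raw figure."""
    return mb * factor


def server_budget_gb(active_users, mb_per_user=BUDGET_MB_PER_USER_YEAR):
    """Yearly disk budget in gigabytes (1GB = 1024MB) for a number of active users."""
    return active_users * mb_per_user / 1024.0
```

With these numbers, 6MB compresses to 1.5MB and doubles to 3MB with indexing, so the 5MB planning figure leaves some headroom. A server admin can plug in their own count of active users to `server_budget_gb` to see what a year of history would cost them.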

We also have to factor in that:
- a large percentage of IM accounts are not active, so their space usage will not be increasing
- many people may not opt for message archiving for some or all contacts
- most people won't want to keep more than a year's worth of history

I can't really see disk space being a big issue, especially with disk space becoming cheaper each year. Can anyone come up with any better estimates than this?

What about security? 

Security is always tricky. Some people aren't going to be happy with all their private messages stored on someone else's server. But think about it this way - you already have to trust your Jabber admin to not be reading or logging your conversations. Assuming you trust your admin, the additional risk comes in if security on the server is compromised by a third party. Rather than just being able to read what you are writing in real time, the hacker may be able to grab your history file and read everything you have ever said from that account.

Really it comes down to how much you trust your Jabber admin. People trust their money to banks, they upload their data to and they trust their ISPs or Hotmail to store their private emails. This is really no different. Obviously, you should be able to turn off logging for any or all accounts if you do not want to take the risk.

Personally I won't be happy from a security perspective until each message sent through Jabber is encrypted. The messages can then be stored in the logs in encrypted form.

As a Jabber admin I can't afford to supply extra disk space for message history. 

That's fine. It should be an optional feature that can be enabled or disabled at the server. The amount of disk space per user should also be configurable. You could switch it off entirely, or you could (for example) limit a free account to 5MB but give all the paying users a 50MB limit. As users reach their limit, the oldest messages will drop off and be replaced by new messages. This could be used as a value-added service to encourage free account users to become subscribers.
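The drop-off behaviour is simple to implement. Below is a minimal in-memory sketch in Python, assuming the quota is counted in bytes of UTF-8 message text. The class name `QuotaHistory` and its methods are hypothetical, and a real server would of course persist messages to disk rather than hold them in a deque.

```python
from collections import deque

# Example limits from the text above, in bytes.
FREE_QUOTA_BYTES = 5 * 1024 * 1024
PAID_QUOTA_BYTES = 50 * 1024 * 1024


class QuotaHistory:
    """Per-user message log that discards the oldest entries once over quota."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self._messages = deque()  # oldest message sits on the left

    def append(self, text):
        """Store a message; return False if it alone exceeds the quota."""
        size = len(text.encode("utf-8"))
        if size > self.quota_bytes:
            return False
        self._messages.append(text)
        self.used_bytes += size
        # Evict from the oldest end until usage is back within the quota.
        while self.used_bytes > self.quota_bytes:
            dropped = self._messages.popleft()
            self.used_bytes -= len(dropped.encode("utf-8"))
        return True

    def messages(self):
        """Return the retained messages, oldest first."""
        return list(self._messages)
```

A free account would be created with `QuotaHistory(FREE_QUOTA_BYTES)` and a subscriber with `QuotaHistory(PAID_QUOTA_BYTES)`. Setting the quota to zero effectively switches archiving off, since every non-empty message is then rejected.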

What are the other disadvantages?

Obviously there are a few on top of disk usage and security. Reading the message log is going to be slower, as each message has to be retrieved from the server. The server will be under a bit more load, in both disk access and bandwidth, as it retrieves archived messages and sends them to the client. These problems will become less noticeable each year, however, as bandwidth and server performance increase. A poor design will not fix itself in a similar way.
