Message History
I am convinced that message history is a very useful
feature of a Jabber client. It is invaluable to be able to check what
someone has told you when they are offline, or without having to ask
them again; to check the date stamp of when you told someone
something; to look up a URL that a friend sent you; to forward a message
you received to a third party; or just to look back and jog your
memory about who someone is and what you have told them.
The fact that most Jabber clients have some form of message history
database indicates that there is a demand.
However, storing a message history database on
the user's local hard drive goes against the whole Jabber design.
Things like contact list data and vCard information are all
stored on the server, which means that you can move to different
machines or even different platforms and still seamlessly use your
Jabber account in the same way. Doesn't it make sense to have
access to your message history in the same way? If you go
to check a URL that your friend sent you, should you have to think:
"Where was I when I read that message?
On my computer at home? Or here at work? Does the Jabber
client on my mobile phone even save message history?"
Since
there is no current spec defined to store message history
on the Jabber server, I would encourage all client authors
to leave this feature out at the present time. Rather
than just designing your own history format, consider if your
time would be better spent on designing a server-side history
format, and an XML interface that all clients can use to query
a history database on the server, independent of formats,
platforms or storage capacity on their current device.
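To make the idea of an XML query interface a little more concrete, here is a rough sketch of what a client-side request might look like. Since no spec exists, everything in it is an assumption of mine: the namespace "example:iq:history", the child elements (with, start, end, max) and the function name are all invented for illustration, and a real design could look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace: no server-side history spec exists, so this is invented.
HISTORY_NS = "example:iq:history"


def build_history_query(request_id, with_jid, start, end, max_items=50):
    """Build an <iq type='get'> stanza asking the server for archived messages.

    All element names below are illustrative assumptions, not part of any spec.
    """
    iq = ET.Element("iq", {"type": "get", "id": request_id})
    query = ET.SubElement(iq, "query", {"xmlns": HISTORY_NS})
    ET.SubElement(query, "with").text = with_jid      # contact whose messages we want
    ET.SubElement(query, "start").text = start        # earliest timestamp of interest
    ET.SubElement(query, "end").text = end            # latest timestamp of interest
    ET.SubElement(query, "max").text = str(max_items)  # cap on returned messages
    return ET.tostring(iq, encoding="unicode")


if __name__ == "__main__":
    print(build_history_query(
        "hist1", "friend@example.org",
        "2001-01-01T00:00:00", "2001-02-01T00:00:00",
    ))
```

The point of the sketch is only that the client asks for a slice of history (by contact, date range and size) and never needs to know how the server stores it.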
Server-side Message History Reasoning
How would it be better on the server?
For a start, from the design of Jabber there is
only one machine on the Internet that is guaranteed to see every
message that is sent to or from my account, and that is the server.
This means that if the messages were to be archived somewhere, the
server (or a machine locally attached to it) is the ideal place
to store them from a bandwidth point of view. From an availability
point of view the server again wins hands down.
There are also great advantages to the user. No
matter what client you have logged in from, you would have access
to all your message history. At work, at home, from your WAP phone,
from a friend's computer, from an internet cafe - it makes no difference.
In the same way, you would be free to start using a different client
without having to worry if the new client will be able to read the
message history files written by your old client. I think this fits in
nicely with Jabber's design philosophy.
Also, if you are using someone else's PC, you don't
leave a message log on their hard drive for anyone to look through
later, yet when you get home you can still check your message
history for anything you said while you were using it! Servers
are generally backed up too, so this solves the problem of losing
your entire message history if your local hard drive dies.
There are other benefits as well. It will become
much easier to support message history in a client if the client
doesn't have to contain any database code to store and retrieve
messages. Code to request this information from the server can be
added to JabberCOM or some other library and will be available to
all clients to use. The hard design and coding work is done once
on the server, and not once in each client. So clients become lighter
(they use less RAM) and easier to develop.
Won't it take too much disk space on the server?
Maybe. Let's look at it:
I am what I would consider a very heavy user of ICQ (I have about 100
people on my list). I have used ICQ every day since 1998, and
I have never cleared my history. The .dat file that stores my message
history is currently 16MB. Being text it is very compressible:
zipping it brings it down to under 4MB.
So a reasonable estimate for a moderate/heavy user is, say, 6MB uncompressed
per year, or 1.5MB of real space if it is stored on a compressed volume.
Of course, add some space for indexing (let's double it
at least). Say 5MB compressed per active user, per year.
We also have to factor in:
- a large percentage of IM accounts are not active, so their space usage
will not be increasing
- many people may not opt for message archiving for some or all contacts
- most people won't want to keep more than a year's worth of history
I can't really see disk space being a big issue, especially with
disk space becoming cheaper each year. Can anyone come up with any
better estimates than this?
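For anyone who wants to plug in their own numbers, the back-of-envelope estimate above can be written out as a few lines of Python. The per-user inputs (6MB per year, roughly 4:1 compression, doubling for indexes) come from my figures above; the count of 3,000 active users is purely a made-up example, and the function names are just for illustration.

```python
def yearly_storage_mb(raw_mb_per_year, compression_ratio, index_overhead_factor):
    """Estimate compressed, indexed storage per active user per year, in MB."""
    compressed = raw_mb_per_year / compression_ratio
    return compressed * index_overhead_factor


def server_storage_mb(active_users, mb_per_user_per_year):
    """Total yearly storage for all active users, in MB."""
    return active_users * mb_per_user_per_year


if __name__ == "__main__":
    # 6MB raw, 16MB -> 4MB zipped (ratio of about 4), indexes double the space.
    per_user = yearly_storage_mb(6.0, 4.0, 2.0)
    print(per_user)  # 3.0, which I round up to 5MB to leave some margin

    # Assumption: 3,000 active users, budgeted at the padded 5MB each.
    print(server_storage_mb(3000, 5.0))  # 15000.0 MB per year
```

Even with the padded 5MB figure, a server with a few thousand active users would need somewhere around 15GB a year under these assumptions.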
What about security?
Security is always tricky. Some people aren't going
to be happy with all their private messages stored on someone else's
server. But think about it this way - you already have to trust
your Jabber admin to not be reading or logging your conversations.
Assuming you trust your admin, the additional risk comes in if security
on the server is compromised by a third party. Rather than just
being able to read what you are writing in real time, the hacker
may be able to grab your history file and read everything you have
ever said from that account.
Really it comes down to how much you trust your
Jabber admin. People trust their money to banks, they upload their
data to xdrive.com and they trust their ISPs or Hotmail to store
their private emails. This is really no different. Obviously, you
should be able to turn off logging for any or all accounts if you
do not want to take the risk.
Personally I won't be happy from a security perspective
until each message sent through Jabber is encrypted. The messages
can then be stored in the logs in encrypted form.
As a Jabber admin I can't afford to supply
extra disk space for message history.
That's fine. It should be an optional feature that
can be enabled or disabled at the server. Also the amount of disk
space per user should be configurable. You could switch it off entirely,
or you could (for example) limit a free account to 5MB, but give
all the paying users a 50MB limit. As users reach their limit, the
older messages will drop off and be replaced by new messages. This
could be used as a value added service to encourage free account
users to become subscribers.
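The "older messages drop off" behaviour is simple to implement. Here is a minimal in-memory sketch, assuming the quota is counted in bytes of UTF-8 message text; the QuotaHistory class and its method names are hypothetical, and a real server would of course keep the log on disk rather than in a Python deque.

```python
from collections import deque


class QuotaHistory:
    """Per-user message log that drops the oldest messages once over quota."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self.messages = deque()  # oldest message sits on the left

    def append(self, text):
        size = len(text.encode("utf-8"))
        if size > self.quota_bytes:
            raise ValueError("message is larger than the whole quota")
        self.messages.append(text)
        self.used_bytes += size
        # Evict the oldest messages until usage is back within the quota.
        while self.used_bytes > self.quota_bytes:
            oldest = self.messages.popleft()
            self.used_bytes -= len(oldest.encode("utf-8"))


if __name__ == "__main__":
    free_account = QuotaHistory(5 * 1024 * 1024)     # 5MB limit
    paying_account = QuotaHistory(50 * 1024 * 1024)  # 50MB limit
    free_account.append("Hello, did you get that URL I sent?")
    print(free_account.used_bytes)
```

The only difference between a free and a paying account here is the number passed to the constructor, which is what makes it easy to offer as a configurable server option.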
What are the other disadvantages?
Obviously there are a few on top of disk usage
and security. Reading from the message log is going to
be slower, as each message has to be retrieved from the server. The
server will be under a bit more load both with disk access and bandwidth
as it retrieves archived messages and sends them to the client.
These problems will become less noticeable each year, however, as
bandwidth and server performance increase. A poor design will not
fix itself in a similar way.