Daily Archives: July 10, 2008

Mail Archive Tool – Improvements for 2.6 (SAK-13584)

Hot on the heels of my MailArchive Performance Improvements for 2.5 to handle large mail sites without chewing up memory and processor, I have pretty much completed an Alpha version of the next round of MailArchive performance improvements based on a database conversion extracting fields from the XML. I was hacking on this in Paris, Barcelona, Cambridge, on the plane to the US and on the bus to Lansing.
Here are the relevant JIRAs:
http://jira.sakaiproject.org/jira/browse/SAK-13584 (2.6 improvements)
http://jira.sakaiproject.org/jira/browse/SAK-11544 (2.5 improvements)
I will check it into a set of branches soon for some more extensive testing across more databases.
Here is a summary of the changes:
– Fully normalized the MAILARCHIVE_MESSAGE table – all fields are in columns now. The XML must stay for a while (more below) but it is not used for any search or selection.
– Added Aaron’s GenericDAO concepts to org.sakaiproject.javax – with some improvements. Order and Restrictions are unchanged. Search has become Query. I added a searchString property to Query and added a few setters which were missing.
– Changed the BaseDbDouble ORM code to support the new GenericDAO pattern when it sees a fully normalized table. When it sees a search, it uses a where clause instead of scanning all the data in Java. It also understands the notion of search fields as well as order-by fields. And as of the 2.5 modification, paging is implicit throughout as well. Now BaseDbDouble can get either a count of messages or a list of messages using Search, Order, and Paging with a single DB query that returns *exactly* the right row set and nothing more.
– Changed the MailArchiveTool to use as little session state as possible. I moved some information into REST-based URLs. Also, any heavyweight information is put into Context, never in session. This leaves the MailArchiveTool with a small session state of a few strings and integers – and everything left in Session should be trivially serializable.
This restores all functionality of the MailArchive tool regardless of the size of the message corpus. Each screen display is exactly one SQL statement to pull back *exactly* the required data searched and sorted in the database.
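To make the "one SQL statement per screen" idea concrete, here is a rough sketch of how a query object carrying search, order, and paging might be turned into a count statement and a list statement. This is not the actual BaseDbDouble or GenericDAO code: the class names, the column names other than the MAILARCHIVE_MESSAGE table itself, and the LIMIT/OFFSET paging syntax (which varies by database) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical query holder; field names are illustrative, not the Sakai API.
class MessageQuery {
    String channelId;                 // which archive channel to read
    String searchString;              // optional free-text search, may be null
    String orderBy = "MESSAGE_DATE";  // assumed column name
    boolean ascending = false;
    int first = 0;                    // zero-based offset of the first row
    int pageSize = 100;
}

class SqlSketch {
    // Assumed column names; the real table layout may differ.
    static final String[] SEARCH_FIELDS = {"SUBJECT", "BODY"};
    static final List<String> ORDER_FIELDS =
            Arrays.asList("MESSAGE_DATE", "SUBJECT", "OWNER");

    // Builds the WHERE clause and appends bind values to params.
    static String where(MessageQuery q, List<Object> params) {
        StringBuilder sb = new StringBuilder(" WHERE CHANNEL_ID = ?");
        params.add(q.channelId);
        if (q.searchString != null && !q.searchString.isEmpty()) {
            sb.append(" AND (");
            for (int i = 0; i < SEARCH_FIELDS.length; i++) {
                if (i > 0) sb.append(" OR ");
                sb.append(SEARCH_FIELDS[i]).append(" LIKE ?");
                params.add("%" + q.searchString + "%");
            }
            sb.append(")");
        }
        return sb.toString();
    }

    // One statement for the total, used to draw the pager.
    static String countSql(MessageQuery q, List<Object> params) {
        return "SELECT COUNT(*) FROM MAILARCHIVE_MESSAGE" + where(q, params);
    }

    // One statement for exactly the rows on the current screen.
    static String listSql(MessageQuery q, List<Object> params) {
        // Order fields come from a whitelist because they cannot be bound.
        if (!ORDER_FIELDS.contains(q.orderBy)) {
            throw new IllegalArgumentException("Cannot order by " + q.orderBy);
        }
        String sql = "SELECT * FROM MAILARCHIVE_MESSAGE" + where(q, params)
                + " ORDER BY " + q.orderBy + (q.ascending ? " ASC" : " DESC")
                + " LIMIT ? OFFSET ?";   // assumed MySQL/PostgreSQL-style paging
        params.add(q.pageSize);
        params.add(q.first);
        return sql;
    }
}
```

The point of the sketch is only that search, sort, and paging all end up in the SQL text and its bind values, so nothing has to be filtered or sorted in Java after the rows come back.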
This makes it *REALLY* easy to come up with a really efficient webservice or sdata feed for Mail Archive. Since you can search, order, and scope data above the API, you can produce *exactly* the JSON you need to implement a sweet dynamically scrolling view of MailArchive where, as you scroll off the bottom, the next 100 or so messages come from the server.
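As a small illustration of the scrolling idea, the sketch below computes the paging envelope a feed might return alongside each batch of messages, so the client knows where to ask for the next batch. The class name and the JSON field names (`first`, `count`, `total`, `next`) are invented for this example and are not part of any Sakai or sdata format.

```java
// Hypothetical helper; not part of Sakai or sdata.
class ScrollWindow {
    // Number of rows actually available in the window starting at first.
    static int count(int first, int pageSize, int total) {
        return Math.max(0, Math.min(pageSize, total - first));
    }

    // Offset of the next window, or -1 when the client has reached the end.
    static int next(int first, int pageSize, int total) {
        int candidate = first + pageSize;
        return candidate < total ? candidate : -1;
    }

    // Paging metadata as a tiny JSON object (numbers only, so no escaping).
    static String envelope(int first, int pageSize, int total) {
        int next = next(first, pageSize, total);
        return "{\"first\":" + first
                + ",\"count\":" + count(first, pageSize, total)
                + ",\"total\":" + total
                + ",\"next\":" + (next < 0 ? "null" : String.valueOf(next))
                + "}";
    }
}
```

A scrolling client would keep requesting the offset given in `next` until it comes back as null.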
This also allows very efficient retrieval of messages for something like the SOLO offline Sakai client that Psybergate has produced with UNISA and NWU or other Google Gears views of a site’s MailArchive Corpus. When SOLO meets SDATA at some future point in time – it might be a shot “heard round the world”.
In general, I am pretty excited about the possibility of moving a legacy tool from being XML-based to being very efficient and REST friendly. It was not that hard. Once I fixed the DB layer to understand Query, I was mostly removing code from the Controller/View code (MailboxAction.java is much smaller now).
I really wanted to get the session to nearly zero and have RESTful URLs for everything. But interestingly, because the portlet model has this notion of Action and View being separated (often by a redirect on post), it is harder to put all of your state into the URL :(. So the compromise is a bit of session (as little as possible and all serializable) plus REST URLs wherever possible to let the back button work a little better. At least when you go back and click on a different button in a list view, you get the right message :). I need to do more study on this. JSR-168 has some features that make the back button more friendly than the old Jetspeed pattern.
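For readers wondering what "state in the URL" looks like in practice, here is a minimal sketch that round-trips a list view's state (search text, sort field, first row) through a query string instead of the session. The class and the parameter names are made up for the example; this is not how MailboxAction does it, just the general shape of the technique.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; parameter names like "search" and "first" are invented.
class ViewStateUrl {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize the view state into a query string, preserving map order.
    static String toQueryString(Map<String, String> state) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : state.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // Rebuild the view state from the query string of an incoming request.
    static Map<String, String> fromQueryString(String qs) {
        Map<String, String> state = new LinkedHashMap<String, String>();
        if (qs == null || qs.isEmpty()) return state;
        for (String pair : qs.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                state.put(dec(pair), "");
            } else {
                state.put(dec(pair.substring(0, eq)), dec(pair.substring(eq + 1)));
            }
        }
        return state;
    }
}
```

Because every link in the list carries its own state, going back and clicking a different row does not depend on whatever the session happened to remember last.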
The problem with the back button is really traced more to the Jetspeed 1.0 portlet pattern – it is not something inherent in using Velocity. And interestingly, this is another moment in time where my respect for the thought that went into JSR-168 grows.
I will begin checking this stuff in soon. I need to take a brain break and think about something else for a day or so and then review it before I make all my branches so folks can take a look.
