Going for the Gusto in Sakai’s Storage Layer SAK-12387

As I have been blasting my way through improving the performance of the MailTool, it has become pretty clear that we need to improve our APIs for things like Messages so that tool writers (and Web Service consumers) can Filter, Search, and Page on pretty much any call that returns a List.
Otherwise we just won’t scale and we won’t be cool at all. The Mail Archive tool keeling over just happens to be the first place our lack of rich search, filter, and paging options in our APIs makes us look very uncool and unscalable.
So I whipped up a JIRA to talk about it. Actually, I am already about 25% done with this, because I had to knock the cobwebs out of BaseMessage and the Storage APIs to make MailArchive work well. Now I want to do it everywhere. I don't plan to make every implementation as super-fast as Mail will be (actually, Mail will be super-fast most of the time and tolerably slow on searches). Most importantly, Mail Archive will no longer crash servers and tick off impatient users.
Here is the JIRA:
http://bugs.sakaiproject.org/jira/browse/SAK-12837
I include the text of the JIRA below. The people who care most about this are Ian, Beth, Glenn, Zhen, Jim, and me: the folks who own tools that use the old DbDouble and DbFlat series of self-built ORMs from the CHEF era.


The basic idea here is to improve the performance of the Message series of APIs and the Storage series of implementations. Here is a high-level view of the work:
– Add methods to the Message APIs that expose the ability to Page, Filter, Sort, and Search through messages
– Change the Storage API to allow these structures to be passed all the way down to the Storage layer.
– Make a Search filter that is a special case of an org.sakaiproject.javax.Filter, so that Storage layers can optimize search separately if they choose.
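To make the list a little more concrete, here is a minimal Java sketch of the kinds of types involved. Everything in it is hypothetical: `Filter` is a local stand-in that assumes org.sakaiproject.javax.Filter boils down to a single `accept(Object)` method, and `SearchFilter`, `PageRange`, `SubstringSearch`, and `PagedMessageSource` are illustrative names, not the signatures that will actually land in the JIRA.

```java
import java.util.List;
import java.util.Locale;

// Local stand-in for org.sakaiproject.javax.Filter; we ASSUME a single accept(Object) method.
interface Filter {
    boolean accept(Object o);
}

// Hypothetical: a SearchFilter is still a Filter, so any Storage can fall back to accept(),
// but it also exposes the raw search string so a smarter Storage can push it down.
interface SearchFilter extends Filter {
    String getSearchString();
}

// Hypothetical 1-based, inclusive page window, e.g. messages 15 through 21.
final class PageRange {
    final int first;
    final int last;

    PageRange(int first, int last) {
        if (first < 1 || last < first) {
            throw new IllegalArgumentException("bad range " + first + "-" + last);
        }
        this.first = first;
        this.last = last;
    }

    int size() {
        return last - first + 1;
    }
}

// Hypothetical SearchFilter whose fallback accept() is a case-insensitive substring match.
final class SubstringSearch implements SearchFilter {
    private final String raw;
    private final String needle;

    SubstringSearch(String raw) {
        this.raw = raw;
        this.needle = raw.toLowerCase(Locale.ROOT);
    }

    public String getSearchString() {
        return raw;
    }

    public boolean accept(Object o) {
        return o != null && o.toString().toLowerCase(Locale.ROOT).contains(needle);
    }
}

// Hypothetical shape of the new Message API calls: filter (or search), sort order, and page together.
interface PagedMessageSource<M> {
    List<M> getMessages(Filter filter, boolean ascending, PageRange page);

    int countMessages(Filter filter);
}
```

A paging UI would typically call `countMessages` once to size its page links and then `getMessages` for each page it actually shows.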
A search is subtly different from a filter. Depending on the implementation, a search might be optimized to use a LIKE clause in SQL or even consult a Lucene-like index to search entities. A filter is a very precise mechanism: the storage retrieves and parses every entity, presents it to the filter, and lets the filter decide whether the message is included. When searching, the sort order may not be by date; it might be by relevance, depending on how that particular Storage is implemented. It is also possible that some implementations will simply treat search as a special case of filter, still retrieving all messages and running the search on the parsed messages. Keeping these methods separate means that it is *possible* to optimize the two operations independently.
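One way a Storage layer might exploit the search/filter split: when it is handed a SearchFilter it can push the search string into SQL as a LIKE clause, and for any other Filter it falls back to selecting every row so the caller can parse each message and call `accept()`. This sketch only builds the SQL text and bind parameters. The table name `MAIL_ARCHIVE_MESSAGE`, the `XML` and `SUBJECT` columns, and the class names are all hypothetical, and `Filter`/`SearchFilter` are local stand-ins (an assumed single-method Filter plus a `getSearchString()` accessor). It uses `!` as the LIKE escape character because MySQL treats a backslash inside a string literal specially by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Local stand-ins: we ASSUME Filter has a single accept(Object) method; SearchFilter is hypothetical.
interface Filter {
    boolean accept(Object o);
}

interface SearchFilter extends Filter {
    String getSearchString();
}

// SQL text plus bind parameters, i.e. what a real Storage would hand to a JDBC PreparedStatement.
final class Query {
    final String sql;
    final List<String> params;

    Query(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }
}

final class SearchPushdown {
    // Hypothetical table and column names, for illustration only.
    private static final String BASE =
        "SELECT XML FROM MAIL_ARCHIVE_MESSAGE WHERE CHANNEL_ID = ?";

    // Escape LIKE wildcards with '!' so a user typing "50%" matches a literal percent sign.
    static String escapeLike(String s) {
        return s.replace("!", "!!").replace("%", "!%").replace("_", "!_");
    }

    // A SearchFilter becomes a LIKE clause; any other Filter selects every row in the channel,
    // and the caller must parse each message and call accept() on it.
    static Query build(String channelId, Filter filter) {
        List<String> params = new ArrayList<>();
        params.add(channelId);
        if (filter instanceof SearchFilter) {
            String term = ((SearchFilter) filter).getSearchString().toLowerCase(Locale.ROOT);
            params.add("%" + escapeLike(term) + "%");
            return new Query(BASE + " AND LOWER(SUBJECT) LIKE ? ESCAPE '!'", params);
        }
        return new Query(BASE, params);
    }
}
```

Because SearchFilter still extends Filter, a Storage that ignores the pushdown and just calls `accept()` on every parsed message stays correct, only slower.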
Once this mostly-interface work is done, we have many ways to improve the performance of large structures kept in Storage-backed stores, and as the stores improve their implementations, we can teach the tools to call the more efficient methods over time.
Even if Storage layers do only simple optimization, such as reading all the messages and using search/filter to discard them in the storage layer instead of in the tool, this will save tons of memory when large message sets are returned just so the user can page through messages 15-21. We can gain a lot in memory footprint with very simple changes to the Storage layer.
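Here is roughly what that kind of simple change could look like: the Storage still reads every message, but it filters, sorts, and trims to the requested window before anything is returned, so the tool only ever holds messages 15-21. This is a minimal Java sketch; `StoragePaging` and its parameters are hypothetical, messages are plain objects rather than Sakai Message instances, `Filter` is a local stand-in assuming a single `accept(Object)` method, and the page window is assumed to be 1-based and inclusive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Local stand-in; we ASSUME org.sakaiproject.javax.Filter reduces to a single accept(Object) method.
interface Filter {
    boolean accept(Object o);
}

final class StoragePaging {
    // Hypothetical storage-side helper. It still reads every message, but filters, sorts, and
    // trims to the 1-based inclusive window [first, last] before returning anything to the tool.
    static <M> List<M> page(List<M> all, Filter filter, Comparator<M> order,
                            boolean ascending, int first, int last) {
        if (first < 1 || last < first) {
            throw new IllegalArgumentException("bad range " + first + "-" + last);
        }
        List<M> kept = new ArrayList<>();
        for (M m : all) {
            if (filter == null || filter.accept(m)) {
                kept.add(m);
            }
        }
        kept.sort(ascending ? order : order.reversed());
        int from = Math.min(first - 1, kept.size());
        int to = Math.min(last, kept.size());
        // Copy just the window; the full list becomes garbage as soon as we return.
        return new ArrayList<>(kept.subList(from, to));
    }
}
```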
While this sounds complex and invasive, it actually is not so bad. For Message, we add new methods and supply inefficient but completely functional implementations, so anything that extends BaseMessage is fine, and code that calls the new methods works no worse than it did before.
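The "inefficient but completely functional" default can be as simple as implementing the new paged call on top of the existing one. In this sketch `BaseChannelSketch` and `LegacyChannel` are hypothetical, messages are Strings, `Filter` is a local stand-in, and the two-argument `getMessages(Filter, boolean)` is only assumed to resemble the existing call. The point is that a subclass written before the new method existed gets paging without changing a line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Local stand-in; we ASSUME org.sakaiproject.javax.Filter reduces to a single accept(Object) method.
interface Filter {
    boolean accept(Object o);
}

// Hypothetical stand-in for a BaseMessage-style class; names are illustrative, not Sakai's.
abstract class BaseChannelSketch {
    // Existing-style call: every matching message, in date order.
    abstract List<String> getMessages(Filter filter, boolean ascending);

    // New paged call with an inefficient but functional default: fetch everything through
    // the old method, then trim to the 1-based inclusive window. Faster subclasses override it.
    List<String> getMessages(Filter filter, boolean ascending, int first, int last) {
        List<String> all = getMessages(filter, ascending);
        int from = Math.max(0, Math.min(first - 1, all.size()));
        int to = Math.max(from, Math.min(last, all.size()));
        return new ArrayList<>(all.subList(from, to));
    }
}

// An "old" subclass that only implements the original method, yet still supports paging.
final class LegacyChannel extends BaseChannelSketch {
    private final List<String> messages = Arrays.asList("a", "b", "c", "d", "e");

    List<String> getMessages(Filter filter, boolean ascending) {
        List<String> out = new ArrayList<>();
        for (String m : messages) {
            if (filter == null || filter.accept(m)) {
                out.add(m);
            }
        }
        if (!ascending) {
            Collections.reverse(out);
        }
        return out;
    }
}
```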
When we add methods to Storage, we will have to stub out the newly required methods in all of the implementations. We can provide functional, inefficient implementations of each of these new methods. Another alternative is simply to throw a run-time error in some of the implementations; then, when a tool writer starts calling the new high-level methods, which in turn call the new Storage methods, they will encounter the exceptions and add/debug the storage code. We will provide implementations for all of the utility classes in the db directory (DbFlat, DbDouble, etc.).
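For the throw-a-run-time-error option, a stub can fail loudly with a message that names the missing method, so the first tool writer who reaches it knows exactly which Storage code to write. The `MessageStorageSketch` interface, its two method signatures, and `StubStorage` are hypothetical, and `Filter` is a local stand-in; `UnsupportedOperationException` is simply the standard Java exception for an operation an implementation does not support.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in; we ASSUME org.sakaiproject.javax.Filter reduces to a single accept(Object) method.
interface Filter {
    boolean accept(Object o);
}

// Hypothetical slice of a Storage interface after a new paged/filtered method has been added.
interface MessageStorageSketch {
    // Existing-style bulk read (hypothetical name).
    List<String> getAllResources(String container);

    // Newly added method every implementation must now provide (hypothetical signature).
    List<String> getResources(String container, Filter filter, int first, int last);
}

// An implementation that has not been upgraded yet: the old call works, the new one fails loudly.
final class StubStorage implements MessageStorageSketch {
    public List<String> getAllResources(String container) {
        return new ArrayList<>();
    }

    public List<String> getResources(String container, Filter filter, int first, int last) {
        // Name the class and method so whoever hits this knows exactly what to implement.
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + ".getResources(container, filter, first, last) is not implemented yet");
    }
}
```

The trade-off versus a functional fallback is that nothing silently runs slowly; the cost is that a tool that adopts the new call early will break until someone writes the storage code.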
This allows us to start paging and searching in SQL and perhaps even add Lucene to the mix for searching Entities. Ultimately this might pave the way to a general implementation of the Message API using JSR-170. By adding these methods and evolving tools to call the efficient methods, we pre-tune tool code to anticipate JSR-170-based Message APIs.
This primarily affects tools that use the DbFlat and DbDouble series of Storage Mechanisms (directory db) and tools that use the generic Message API (directory message).
A non-exhaustive list of affected tools includes Announcements, Mail Archive, MOTD, deprecated Chat, deprecated Discussion, and others. Resources may be affected, but the storage implementation of Resources has already begun to move away from the DbDouble patterns and uses an increasingly independent storage layer as it moves toward talking to JSR-170 directly. The only likely impact on Resources may be a few more method signatures in the Storage class.
This will need careful QA testing, and all the developers of these tools will need to keep an eye out for issues that might arise from these changes.