working around java class/memory leak in GATE 3.0

I’ve developed some patches to GATE to allow an application to work around memory issues resulting from using JAPE in a long-lived VM.

The problem stems from the use of the GateClassLoader and the dynamic creation of classes when loading JAPE grammars. Generally, classes are loaded into the Permanent Generation section of the heap, and are eligible for garbage collection only when the classloader that loaded them is itself unreachable. As the GateClassLoader is never reset or reinitialized, the classes accumulate over time, fill the Permanent Generation, and cause OutOfMemoryErrors in the VM. While a first attempt to resolve this would be to cache the loaded JAPE objects, there are still classes created on each use of that grammar, which will slowly cause the same problem.

The caller needs a facility to reset the class loader, as provided by the patches to src/gate/ (and the compiler classes). The caller is expected to invoke Gate.resetClassLoader() periodically, either at a fixed interval or after a set number of uses of the library; our application resets every 15 minutes.
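As a minimal sketch of the "after a set number of uses" policy: the helper below runs a reset action once every N recorded uses. UseCountingResetter and its members are hypothetical names, not part of GATE, and the reset action is passed in as a Runnable so the sketch does not depend on the patched Gate.resetClassLoader().

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not part of GATE): runs a reset action once every N recorded uses. */
final class UseCountingResetter {
    private final long usesPerReset;
    private final Runnable resetAction;
    private final AtomicLong uses = new AtomicLong();

    UseCountingResetter(long usesPerReset, Runnable resetAction) {
        if (usesPerReset <= 0) {
            throw new IllegalArgumentException("usesPerReset must be positive");
        }
        this.usesPerReset = usesPerReset;
        this.resetAction = resetAction;
    }

    /** Call once per use of the library; returns true if this call triggered a reset. */
    boolean recordUse() {
        long n = uses.incrementAndGet();
        if (n % usesPerReset == 0) {
            // With the patches applied, this is where Gate.resetClassLoader() would run.
            resetAction.run();
            return true;
        }
        return false;
    }
}
```

With the patches in place, the action would presumably be something like () -> Gate.resetClassLoader(). For the fixed-interval variant, the same Runnable could instead be handed to a ScheduledExecutorService via scheduleAtFixedRate.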

One complication is the difference in time between the generation/loading of the classes for a particular JAPE grammar and the instantiation of objects of those classes. Obviously, the objects need to be instantiated from a valid class, and in particular through the classloader that loaded that class; if the classloader is reset in the middle, instantiation exceptions will be thrown when a different classloader is used. A change was thus made to src/gate/jape/ to retain a reference to the relevant GateClassLoader. Note that there’s still a race condition present, but the window of time is significantly smaller than before.
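The idea of retaining the classloader can be sketched as follows. This is an illustration only, not the actual patch: LoaderRegistry stands in for the role gate.Gate plays, PinnedAction stands in for a JAPE right-hand side, and both names are hypothetical. The stand-in loader is an empty URLClassLoader that merely delegates to its parent, so no classes are actually generated here.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical stand-in for a global, resettable classloader holder. */
final class LoaderRegistry {
    private static volatile ClassLoader current = newLoader();

    private static ClassLoader newLoader() {
        // Fresh, empty loader that delegates to the loader of this class.
        return new URLClassLoader(new URL[0], LoaderRegistry.class.getClassLoader());
    }

    static ClassLoader current() { return current; }

    /** Drops the global reference to the old loader by installing a new one. */
    static void reset() { current = newLoader(); }
}

/** Hypothetical stand-in for a compiled action that pins the loader it was built with. */
final class PinnedAction {
    private final ClassLoader loader;
    private final String className;

    PinnedAction(String className) {
        this.loader = LoaderRegistry.current(); // captured at "compile" time
        this.className = className;
    }

    ClassLoader loader() { return loader; }

    /** Instantiates through the pinned loader, never through LoaderRegistry.current(). */
    Object instantiate() {
        try {
            return Class.forName(className, true, loader)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate " + className, e);
        }
    }
}
```

Because the action holds its own reference, a reset between class generation and instantiation does not invalidate it; the old loader becomes collectable only once every such action is itself unreachable.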

Our (application-level) reset operation both resets the Gate classloader and invalidates our JAPE cache: current threads will finish using their JAPE objects, references to the classloader are eventually lost (from gate.Gate via the reset, and from the jape RightHandSide objects as they become unreachable), and the garbage collector is allowed to collect the whole shebang.
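A rough sketch of such an application-level reset, under stated assumptions: ResettableCache is a hypothetical class, not our actual code, and the classloader reset is again passed in as a Runnable. Swapping in a new map, rather than clearing the old one, lets threads that already hold a cached object finish with it undisturbed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical cache whose reset also triggers a classloader reset. */
final class ResettableCache<K, V> {
    private final AtomicReference<ConcurrentHashMap<K, V>> entries =
            new AtomicReference<>(new ConcurrentHashMap<>());
    private final Runnable loaderReset;

    ResettableCache(Runnable loaderReset) {
        this.loaderReset = loaderReset;
    }

    /** Returns the cached value for key, building it with the given function on a miss. */
    V get(K key, Function<K, V> builder) {
        return entries.get().computeIfAbsent(key, builder);
    }

    /**
     * Resets the loader first, then drops the old cache generation.
     * Anything built against the old loader lives only in the old map,
     * which becomes garbage once in-flight users release their objects.
     */
    void reset() {
        loaderReset.run();
        entries.set(new ConcurrentHashMap<>());
    }
}
```

This ordering does not close the race window mentioned earlier; it only ensures that nothing built with the old loader is stored in the new cache generation.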

This patch is much more of a workaround than a fix for the underlying problems. I care less about whether these patches are applied than about their usefulness in demonstrating the problem, so that it can be fixed more appropriately. Really, though, JAPE shouldn’t need to create classes per use.

Also, while I’m at it, here are patches to:

Update, 2006-07-10: I was pressed for detail about my claim that JAPE creates classes on use, and found that I couldn’t substantiate it. Here’s my reply to gate-users:

I went to reproduce this, and found I can’t. Now I believe I misspoke.

The classes I remember being created were…

japeactionclasses.postprocessNewLineActionClass[#]
japeactionclasses.postprocessVBnegActionClass[#]
japeactionclasses.postprocesssimpleJoinActionClass[#]

…which are from the tokenizer’s grammar. Thinking back to the period I was debugging the issue, I don’t believe I was caching the tokenizers, but I had instituted the other JAPE caching. As well, I didn’t remember that the tokenizer was implemented in terms of JAPE. As such, I tricked myself into believing that the simple use of JAPE grammars from a different processing step was causing the loading of the aforementioned classes, when really it was just not caching the tokenizers.